Search | arXiv e-print repository

Thermodynamic & pattern-matching efficiency in heterogeneous networks

Authors: Davide Cipollini, Lambert Schomaker

Abstract: Why complex structures emerge in a real world environment is a fascinating question which has not found any fully satisfactory answer at the point of today. At the border between statistical physics, machine learning and network theory, we provide some computational empirical evidence indicating that the emergence of complex structures in the outer world is the macroscopic thermodynamic result of… ▽ More Why complex structures emerge in a real world environment is a fascinating question which has not found any fully satisfactory answer at the point of today. At the border between statistical physics, machine learning and network theory, we provide some computational empirical evidence indicating that the emergence of complex structures in the outer world is the macroscopic thermodynamic result of the optimal trade-off between the ability to encode information about the environment in the very features of the network and the ability to transmit information across it. While the former is quantified by the von Neumann network entropy and the latter by the Helmholtz free energy, the two are combined into a single quantity tantamount to the thermodynamic efficiency. We take as a case study an algorithm originally proposed for creating and searching trees of patterns under the assumption that, akin to living system, the algorithm seeks the best internal representation of the outer world, namely a real world dataset of patterns. Remarkably, the optimal thermodynamic efficiency overlaps with the optimal pattern-matching efficiency, defined in terms of accuracy and computational cost, whether the former is measured at the resolution scale where the network transits from exhibiting a specific heat with a single peak to a scale-invariant region. Moreover the critical hypothesis is investigated and empirical evidence suggests that the system benefits from the vicinity to the critical point whether it is faced with higher-complexity environments, while it does not when the environment is simpler. △ Less

Submitted 10 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

arXiv:2402.04581 [pdf, other]

Boosting Reinforcement Learning Algorithms in Continuous Robotic Reaching Tasks using Adaptive Potential Functions

Authors: Yifei Chen, Lambert Schomaker, Francisco Cruz

Abstract: In reinforcement learning, reward sha** is an efficient way to guide the learning process of an agent, as the reward can indicate the optimal policy of the task. The potential-based reward sha** framework was proposed to guarantee policy invariance after reward sha**, where a potential function is used to calculate the sha** reward. In former work, we proposed a novel adaptive potential fu… ▽ More In reinforcement learning, reward sha** is an efficient way to guide the learning process of an agent, as the reward can indicate the optimal policy of the task. The potential-based reward sha** framework was proposed to guarantee policy invariance after reward sha**, where a potential function is used to calculate the sha** reward. In former work, we proposed a novel adaptive potential function (APF) method to learn the potential function concurrently with training the agent based on information collected by the agent during the training process, and examined the APF method in discrete action space scenarios. This paper investigates the feasibility of using APF in solving continuous-reaching tasks in a real-world robotic scenario with continuous action space. We combine the Deep Deterministic Policy Gradient (DDPG) algorithm and our proposed method to form a new algorithm called APF-DDPG. To compare APF-DDPG with DDPG, we designed a task where the agent learns to control Baxter's right arm to reach a goal position. The experimental results show that the APF-DDPG algorithm outperforms the DDPG algorithm on both learning speed and robustness. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures

arXiv:2311.11198 [pdf, other]

Self-Supervised Versus Supervised Training for Segmentation of Organoid Images

Authors: Asmaa Haja, Eric Brouwer, Lambert Schomaker

Abstract: The process of annotating relevant data in the field of digital microscopy can be both time-consuming and especially expensive due to the required technical skills and human-expert knowledge. Consequently, large amounts of microscopic image data sets remain unlabeled, preventing their effective exploitation using deep-learning algorithms. In recent years it has been shown that a lot of relevant in… ▽ More The process of annotating relevant data in the field of digital microscopy can be both time-consuming and especially expensive due to the required technical skills and human-expert knowledge. Consequently, large amounts of microscopic image data sets remain unlabeled, preventing their effective exploitation using deep-learning algorithms. In recent years it has been shown that a lot of relevant information can be drawn from unlabeled data. Self-supervised learning (SSL) is a promising solution based on learning intrinsic features under a pretext task that is similar to the main task without requiring labels. The trained result is transferred to the main task - image segmentation in our case. A ResNet50 U-Net was first trained to restore images of liver progenitor organoids from augmented images using the Structural Similarity Index Metric (SSIM), alone, and using SSIM combined with L1 loss. Both the encoder and decoder were trained in tandem. The weights were transferred to another U-Net model designed for segmentation with frozen encoder weights, using Binary Cross Entropy, Dice, and Intersection over Union (IoU) losses. For comparison, we used the same U-Net architecture to train two supervised models, one utilizing the ResNet50 encoder as well as a simple CNN. Results showed that self-supervised learning models using a 25\% pixel drop or image blurring augmentation performed better than the other augmentation techniques using the IoU loss. When trained on only 114 images for the main task, the self-supervised learning approach outperforms the supervised method achieving an F1-score of 0.85, with higher stability, in contrast to an F1=0.78 scored by the supervised method. Furthermore, when trained with larger data sets (1,000 images), self-supervised learning is still able to perform better, achieving an F1-score of 0.92, contrasting to a score of 0.85 for the supervised method. △ Less

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2308.07971 [pdf, other]

MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction

Authors: Gideon Maillette de Buy Wenniger, Thomas van Dongen, Lambert Schomaker

Abstract: Automatic assessment of the quality of scholarly documents is a difficult task with high potential impact. Multimodality, in particular the addition of visual information next to text, has been shown to improve the performance on scholarly document quality prediction (SDQP) tasks. We propose the multimodal predictive model MultiSChuBERT. It combines a textual model based on chunking full paper tex… ▽ More Automatic assessment of the quality of scholarly documents is a difficult task with high potential impact. Multimodality, in particular the addition of visual information next to text, has been shown to improve the performance on scholarly document quality prediction (SDQP) tasks. We propose the multimodal predictive model MultiSChuBERT. It combines a textual model based on chunking full paper text and aggregating computed BERT chunk-encodings (SChuBERT), with a visual model based on Inception V3.Our work contributes to the current state-of-the-art in SDQP in three ways. First, we show that the method of combining visual and textual embeddings can substantially influence the results. Second, we demonstrate that gradual-unfreezing of the weights of the visual sub-model, reduces its tendency to ovefit the data, improving results. Third, we show the retained benefit of multimodality when replacing standard BERT$_{\textrm{BASE}}$ embeddings with more recent state-of-the-art text embedding models. Using BERT$_{\textrm{BASE}}$ embeddings, on the (log) number of citations prediction task with the ACL-BiblioMetry dataset, our MultiSChuBERT (text+visual) model obtains an $R^{2}$ score of 0.454 compared to 0.432 for the SChuBERT (text only) model. Similar improvements are obtained on the PeerRead accept/reject prediction task. In our experiments using SciBERT, scincl, SPECTER and SPECTER2.0 embeddings, we show that each of these tailored embeddings adds further improvements over the standard BERT$_{\textrm{BASE}}$ embeddings, with the SPECTER2.0 embeddings performing best. △ Less

Submitted 15 August, 2023; originally announced August 2023.

ACM Class: I.2.7

arXiv:2307.15071 [pdf, other]

Writer adaptation for offline text recognition: An exploration of neural network-based methods

Authors: Tobias van der Werff, Maruf A. Dhali, Lambert Schomaker

Abstract: Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well-equipped to deal with shifting data distributions. In the field of handwritten text recognition (HTR), this shows itself in poor recognition accuracy for writers that are not similar to those seen during training. An ideal HTR model shou… ▽ More Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well-equipped to deal with shifting data distributions. In the field of handwritten text recognition (HTR), this shows itself in poor recognition accuracy for writers that are not similar to those seen during training. An ideal HTR model should be adaptive to new writing styles in order to handle the vast amount of possible writing styles. In this paper, we explore how HTR models can be made writer adaptive by using only a handful of examples from a new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used as base models, using a ResNet backbone along with either an LSTM or Transformer sequence decoder. Using these base models, two methods are considered to make them writer adaptive: 1) model-agnostic meta-learning (MAML), an algorithm commonly used for tasks such as few-shot classification, and 2) writer codes, an idea originating from automatic speech recognition. Results show that an HTR-specific version of MAML known as MetaHTR improves performance compared to the baseline with a 1.4 to 2.0 improvement in word error rate (WER). The improvement due to writer adaptation is between 0.2 and 0.7 WER, where a deeper model seems to lend itself better to adaptation using MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models or sentence-level HTR may become prohibitive due to its high computational and memory requirements. Lastly, writer codes based on learned features or Hinge statistical features did not lead to improved recognition performance. △ Less

Submitted 11 July, 2023; originally announced July 2023.

Comments: 21 pages including appendices, 6 figures, 10 tables

arXiv:2306.06754 [pdf, other]

Reinforcement Learning in Robotic Motion Planning by Combined Experience-based Planning and Self-Imitation Learning

Authors: Sha Luo, Lambert Schomaker

Abstract: High-quality and representative data is essential for both Imitation Learning (IL)- and Reinforcement Learning (RL)-based motion planning tasks. For real robots, it is challenging to collect enough qualified data either as demonstrations for IL or experiences for RL due to safety considerations in environments with obstacles. We target this challenge by proposing the self-imitation learning by pla… ▽ More High-quality and representative data is essential for both Imitation Learning (IL)- and Reinforcement Learning (RL)-based motion planning tasks. For real robots, it is challenging to collect enough qualified data either as demonstrations for IL or experiences for RL due to safety considerations in environments with obstacles. We target this challenge by proposing the self-imitation learning by planning plus (SILP+) algorithm, which efficiently embeds experience-based planning into the learning architecture to mitigate the data-collection problem. The planner generates demonstrations based on successfully visited states from the current RL policy, and the policy improves by learning from these demonstrations. In this way, we relieve the demand for human expert operators to collect demonstrations required by IL and improve the RL performance as well. Various experimental results show that SILP+ achieves better training efficiency higher and more stable success rate in complex motion planning tasks compared to several other methods. Extensive tests on physical robots illustrate the effectiveness of SILP+ in a physical setting. △ Less

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2305.10126 [pdf, other]

Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation

Authors: Zhenxing Zhang, Lambert Schomaker

Abstract: The goal of a speech-to-image transform is to produce a photo-realistic picture directly from a speech signal. Recently, various studies have focused on this task and have achieved promising performance. However, current speech-to-image approaches are based on a stacked modular framework that suffers from three vital issues: 1) Training separate networks is time-consuming as well as inefficient an… ▽ More The goal of a speech-to-image transform is to produce a photo-realistic picture directly from a speech signal. Recently, various studies have focused on this task and have achieved promising performance. However, current speech-to-image approaches are based on a stacked modular framework that suffers from three vital issues: 1) Training separate networks is time-consuming as well as inefficient and the convergence of the final generative model strongly depends on the previous generators; 2) The quality of precursor images is ignored by this architecture; 3) Multiple discriminator networks are required to be trained. To this end, we propose an efficient and effective single-stage framework called Fusion-S2iGan to yield perceptually plausible and semantically consistent image samples on the basis of given spoken descriptions. Fusion-S2iGan introduces a visual+speech fusion module (VSFM), constructed with a pixel-attention module (PAM), a speech-modulation module (SMM) and a weighted-fusion module (WFM), to inject the speech embedding from a speech encoder into the generator while improving the quality of synthesized pictures. Fusion-S2iGan spreads the bimodal information over all layers of the generator network to reinforce the visual feature maps at various hierarchical levels in the architecture. We conduct a series of experiments on four benchmark data sets, i.e., CUB birds, Oxford-102, Flickr8k and Places-subset. The experimental results demonstrate the superiority of the presented Fusion-S2iGan compared to the state-of-the-art models with a multi-stage architecture and a performance level that is close to traditional text-to-image approaches. △ Less

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2212.07923 [pdf, other]

doi 10.5220/0011699500003411

The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts

Authors: Lisa Koopmans, Maruf A. Dhali, Lambert Schomaker

Abstract: Identifying the production dates of historical manuscripts is one of the main goals for paleographers when studying ancient documents. Automatized methods can provide paleographers with objective tools to estimate dates more accurately. Previously, statistical features have been used to date digitized historical manuscripts based on the hypothesis that handwriting styles change over periods. Howev… ▽ More Identifying the production dates of historical manuscripts is one of the main goals for paleographers when studying ancient documents. Automatized methods can provide paleographers with objective tools to estimate dates more accurately. Previously, statistical features have been used to date digitized historical manuscripts based on the hypothesis that handwriting styles change over periods. However, the sparse availability of such documents poses a challenge in obtaining robust systems. Hence, the research of this article explores the influence of data augmentation on the dating of historical manuscripts. Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts of different collections, including the Medieval Paleographical Scale, early Aramaic manuscripts, and the Dead Sea Scrolls. Results show that training models with augmented data improve the performance of historical manuscripts dating by 1% - 3% in cumulative scores. Additionally, this indicates further enhancement possibilities by considering models specific to the features and the documents' scripts. △ Less

Submitted 15 December, 2022; originally announced December 2022.

Comments: Accepted after the peer-review process for ICPRAM 2023; scheduled to be presented on 22 February 2023

Report number: ISBN 978-989-758-626-2

Journal ref: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM, 124-135, 2023

arXiv:2207.02688 [pdf, other]

Ferroelastic domain walls in BiFeO$_3$ as memristive networks

Authors: Jan Rieck, Davide Cipollini, Mart Salverda, Cynthia P. Quinteros, Lambert R. B. Schomaker, Beatriz Noheda

Abstract: Electronic conduction along individual domain walls (DWs) has been reported in BiFeO$_3$ (BFO) and other nominally insulating ferroelectrics. DWs in these materials separate regions of differently oriented electrical polarization (domains) and are just a few atoms wide, providing self-assembled nanometric conduction paths. In this work, it is shown that electronic transport is possible also from w… ▽ More Electronic conduction along individual domain walls (DWs) has been reported in BiFeO$_3$ (BFO) and other nominally insulating ferroelectrics. DWs in these materials separate regions of differently oriented electrical polarization (domains) and are just a few atoms wide, providing self-assembled nanometric conduction paths. In this work, it is shown that electronic transport is possible also from wall to wall through the dense network of as-grown DWs in BFO thin films. Electric field cycling at different points of the network, performed locally by conducting atomic force microscope (cAFM), induces resistive switching selectively at the DWs, both for vertical (single wall) and lateral (wall-to-wall) conduction. These findings are the first step towards investigating DWs as memristive networks for information processing and in-materio computing. △ Less

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2204.12678 [pdf, other]

Optimized latent-code selection for explainable conditional text-to-image GANs

Authors: Zhenxing Zhang, Lambert Schomaker

Abstract: The task of text-to-image generation has achieved remarkable progress due to the advances in the conditional generative adversarial networks (GANs). However, existing conditional text-to-image GANs approaches mostly concentrate on improving both image quality and semantic relevance but ignore the explainability of the model which plays a vital role in real-world applications. In this paper, we pre… ▽ More The task of text-to-image generation has achieved remarkable progress due to the advances in the conditional generative adversarial networks (GANs). However, existing conditional text-to-image GANs approaches mostly concentrate on improving both image quality and semantic relevance but ignore the explainability of the model which plays a vital role in real-world applications. In this paper, we present a variety of techniques to take a deep look into the latent space and semantic space of the conditional text-to-image GANs model. We introduce pairwise linear interpolation of latent codes and `linguistic' linear interpolation to study what the model has learned within the latent space and `linguistic' embeddings. Subsequently, we extend linear interpolation to triangular interpolation conditioned on three corners to further analyze the model. After that, we build a Good/Bad data set containing unsuccessfully and successfully synthetic samples and corresponding latent codes for the image-quality research. Based on this data set, we propose a framework for finding good latent codes by utilizing a linear SVM. Experimental results on the recent DiverGAN generator trained on two benchmark data sets qualitatively prove the effectiveness of our presented techniques, with a better than 94\% accuracy in predicting ${Good}$/${Bad}$ classes for latent vectors. The Good/Bad data set is publicly available at https://zenodo.org/record/5850224#.YeGMwP7MKUk. △ Less

Submitted 26 April, 2022; originally announced April 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.12929

arXiv:2203.01042 [pdf, other]

doi 10.5220/0011743700003411

Image-based material analysis of ancient historical documents

Authors: Thomas Reynolds, Maruf A. Dhali, Lambert Schomaker

Abstract: Researchers continually perform corroborative tests to classify ancient historical documents based on the physical materials of their writing surfaces. However, these tests, often performed on-site, requires actual access to the manuscript objects. The procedures involve a considerable amount of time and cost, and can damage the manuscripts. Develo** a technique to classify such documents using… ▽ More Researchers continually perform corroborative tests to classify ancient historical documents based on the physical materials of their writing surfaces. However, these tests, often performed on-site, requires actual access to the manuscript objects. The procedures involve a considerable amount of time and cost, and can damage the manuscripts. Develo** a technique to classify such documents using only digital images can be very useful and efficient. In order to tackle this problem, this study uses images of a famous historical collection, the Dead Sea Scrolls, to propose a novel method to classify the materials of the manuscripts. The proposed classifier uses the two-dimensional Fourier Transform to identify patterns within the manuscript surfaces. Combining a binary classification system employing the transform with a majority voting process is shown to be effective for this classification task. This pilot study shows a successful classification percentage of up to 97% for a confined amount of manuscripts produced from either parchment or papyrus material. Feature vectors based on Fourier-space grid representation outperformed a concentric Fourier-space format. △ Less

Submitted 2 March, 2022; originally announced March 2022.

Comments: 8 pages, 11 figures including supplementary documents; Submitted to ICPR 2022

Report number: ISBN 978-989-758-626-2

Journal ref: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM, 697-706, 2023

arXiv:2202.12929 [pdf, other]

OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs

Authors: Zhenxing Zhang, Lambert Schomaker

Abstract: Text-to-image generation intends to automatically produce a photo-realistic image, conditioned on a textual description. It can be potentially employed in the field of art creation, data augmentation, photo-editing, etc. Although many efforts have been dedicated to this task, it remains particularly challenging to generate believable, natural scenes. To facilitate the real-world applications of te… ▽ More Text-to-image generation intends to automatically produce a photo-realistic image, conditioned on a textual description. It can be potentially employed in the field of art creation, data augmentation, photo-editing, etc. Although many efforts have been dedicated to this task, it remains particularly challenging to generate believable, natural scenes. To facilitate the real-world applications of text-to-image synthesis, we focus on studying the following three issues: 1) How to ensure that generated samples are believable, realistic or natural? 2) How to exploit the latent space of the generator to edit a synthesized image? 3) How to improve the explainability of a text-to-image generation framework? In this work, we constructed two novel data sets (i.e., the Good & Bad bird and face data sets) consisting of successful as well as unsuccessful generated samples, according to strict criteria. To effectively and efficiently acquire high-quality images by increasing the probability of generating Good latent codes, we use a dedicated Good/Bad classifier for generated images. It is based on a pre-trained front end and fine-tuned on the basis of the proposed Good & Bad data set. After that, we present a novel algorithm which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture by performing independent component analysis on the pre-trained weight values of the generator. Furthermore, we develop a background-flattening loss (BFL), to improve the background appearance in the edited image. Subsequently, we introduce linear interpolation analysis between pairs of keywords. This is extended into a similar triangular `linguistic' interpolation in order to take a deep look into what a text-to-image synthesis model has learned within the linguistic embeddings. Our data set is available at https://zenodo.org/record/6283798#.YhkN_ujMI2w. △ Less

Submitted 25 February, 2022; originally announced February 2022.

Comments: 18 pages

arXiv:2111.09267 [pdf, other]

doi 10.1016/j.neucom.2021.12.005

DiverGAN: An Efficient and Effective Single-Stage Framework for Diverse Text-to-Image Generation

Authors: Zhenxing Zhang, Lambert Schomaker

Abstract: In this paper, we present an efficient and effective single-stage framework (DiverGAN) to generate diverse, plausible and semantically consistent images according to a natural-language description. DiverGAN adopts two novel word-level attention modules, i.e., a channel-attention module (CAM) and a pixel-attention module (PAM), which model the importance of each word in the given sentence while all… ▽ More In this paper, we present an efficient and effective single-stage framework (DiverGAN) to generate diverse, plausible and semantically consistent images according to a natural-language description. DiverGAN adopts two novel word-level attention modules, i.e., a channel-attention module (CAM) and a pixel-attention module (PAM), which model the importance of each word in the given sentence while allowing the network to assign larger weights to the significant channels and pixels semantically aligning with the salient words. After that, Conditional Adaptive Instance-Layer Normalization (CAdaILN) is introduced to enable the linguistic cues from the sentence embedding to flexibly manipulate the amount of change in shape and texture, further improving visual-semantic representation and hel** stabilize the training. Also, a dual-residual structure is developed to preserve more original visual features while allowing for deeper networks, resulting in faster convergence speed and more vivid details. Furthermore, we propose to plug a fully-connected layer into the pipeline to address the lack-of-diversity problem, since we observe that a dense layer will remarkably enhance the generative capability of the network, balancing the trade-off between a low-dimensional random latent code contributing to variants and modulation modules that use high-dimensional and textual contexts to strength feature maps. Inserting a linear layer after the second residual block achieves the best variety and quality. Both qualitative and quantitative results on benchmark data sets demonstrate the superiority of our DiverGAN for realizing diversity, without harming quality and semantic consistency. △ Less

Submitted 17 November, 2021; originally announced November 2021.

Journal ref: Neurocomputing 2022

arXiv:2109.04847 [pdf, other]

Active learning for reducing labeling effort in text classification tasks

Authors: Pieter Floris Jacobs, Gideon Maillette de Buy Wenniger, Marco Wiering, Lambert Schomaker

Abstract: Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the used model deems most informative. Little research has been done on AL in a text classification setting and next to no… ▽ More Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the used model deems most informative. Little research has been done on AL in a text classification setting and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the used classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL; namely, that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Whereas it was found that the proposed heuristics for AL did not improve performance of AL; our results show that using uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger. △ Less

Submitted 3 November, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: Accepted as a conference paper at the joint 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021). This camera-ready version submitted to BNAIC/BENELEARN, adds several improvements including a more thorough discussion of related work plus an extended discussion section. 28 pages including references and appendices

ACM Class: I.2.7

arXiv:2104.05036 [pdf, other]

GR-RNN: Global-Context Residual Recurrent Neural Networks for Writer Identification

Authors: Sheng He, Lambert Schomaker

Abstract: This paper presents an end-to-end neural network system to identify writers through handwritten word images, which jointly integrates global-context information and a sequence of local fragment-based features. The global-context information is extracted from the tail of the neural network by a global average pooling step. The sequence of local and fragment-based features is extracted from a low-le… ▽ More This paper presents an end-to-end neural network system to identify writers through handwritten word images, which jointly integrates global-context information and a sequence of local fragment-based features. The global-context information is extracted from the tail of the neural network by a global average pooling step. The sequence of local and fragment-based features is extracted from a low-level deep feature map which contains subtle information about the handwriting style. The spatial relationship between the sequence of fragments is modeled by the recurrent neural network (RNN) to strengthen the discriminative ability of the local fragment features. We leverage the complementary information between the global-context and local fragments, resulting in the proposed global-context residual recurrent neural network (GR-RNN) method. The proposed method is evaluated on four public data sets and experimental results demonstrate that it can provide state-of-the-art performance. In addition, the neural networks trained on gray-scale images provide better results than neural networks trained on binarized and contour images, indicating that texture information plays an important role for writer identification. The source code will be available: \url{https://github.com/shengfly/writer-identification}. △ Less

Submitted 11 April, 2021; originally announced April 2021.

Comments: To appear: Pattern Recognition

arXiv:2104.02793 [pdf, other]

A fully automated end-to-end process for fluorescence microscopy images of yeast cells: From segmentation to detection and classification

Authors: Asmaa Haja, Lambert R. B. Schomaker

Abstract: In recent years, an enormous amount of fluorescence microscopy images were collected in high-throughput lab settings. Analyzing and extracting relevant information from all images in a short time is almost impossible. Detecting tiny individual cell compartments is one of many challenges faced by biologists. This paper aims at solving this problem by building an end-to-end process that employs meth… ▽ More In recent years, an enormous amount of fluorescence microscopy images were collected in high-throughput lab settings. Analyzing and extracting relevant information from all images in a short time is almost impossible. Detecting tiny individual cell compartments is one of many challenges faced by biologists. This paper aims at solving this problem by building an end-to-end process that employs methods from the deep learning field to automatically segment, detect and classify cell compartments of fluorescence microscopy images of yeast cells. With this intention we used Mask R-CNN to automatically segment and label a large amount of yeast cell data, and YOLOv4 to automatically detect and classify individual yeast cell compartments from these images. This fully automated end-to-end process is intended to be integrated into an interactive e-Science server in the PerICo1 project, which can be used by biologists with minimized human effort in training and operation to complete their various classification tasks. In addition, we evaluated the detection and classification performance of state-of-the-art YOLOv4 on data from the NOP1pr-GFP-SWAT yeast-cell data library. Experimental results show that by dividing original images into 4 quadrants YOLOv4 outputs good detection and classification results with an F1-score of 98% in terms of accuracy and speed, which is optimally suited for the native resolution of the microscope and current GPU memory sizes. Although the application domain is optical microscopy in yeast cells, the method is also applicable to multiple-cell images in medical applications △ Less

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2103.13834 [pdf, other]

Self-Imitation Learning by Planning

Authors: Sha Luo, Hamidreza Kasaei, Lambert Schomaker

Abstract: Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, which is widely adopted in reinforcement learning (RL) to initialize exploration. However, in long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data such that these methods can generalize effectively. I… ▽ More Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, which is widely adopted in reinforcement learning (RL) to initialize exploration. However, in long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data such that these methods can generalize effectively. In this work, we solve this problem using our proposed approach called {self-imitation learning by planning (SILP)}, where demonstration data are collected automatically by planning on the visited states from the current policy. SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search based motion planner, so we can plan and relabel robot's own trials as demonstrations for policy learning. Due to these self-generated demonstrations, we relieve the human operator from the laborious data preparation process required by IL and RL methods in solving complex motion planning tasks. The evaluation results show that our SILP method achieves higher success rates and enhances sample efficiency compared to selected baselines, and the policy learned in simulation performs well in a real-world placement task with changing goals and obstacles. △ Less

Submitted 26 March, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2012.11740 [pdf, other]

doi 10.18653/v1/2020.sdp-1.17

SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction

Authors: Thomas van Dongen, Gideon Maillette de Buy Wenniger, Lambert Schomaker

Abstract: Predicting the number of citations of scholarly documents is an upcoming task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality which has the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with number of citations prediction with relatively small traini… ▽ More Predicting the number of citations of scholarly documents is an upcoming task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality which has the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with number of citations prediction with relatively small training data sets, or larger datasets but with short, incomplete input text. In this work we leverage the open access ACL Anthology collection in combination with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a large margin. We also show the merit of using more training data and longer input for number of citations prediction. △ Less

Submitted 21 December, 2020; originally announced December 2020.

Comments: Published at the First Workshop on Scholarly Document Processing, at EMNLP 2020. Minor corrections were made to the workshop version, including addition of color to Figures 1,2

ACM Class: I.2.7

Journal ref: Proceedings of the First Workshop on Scholarly Document Processing. Association for Computational Linguistics. (2020) 158-167. EMNLP|SDP 2020 https://www.aclweb.org/anthology/2020.sdp-1.17/

arXiv:2011.02709 [pdf, other]

doi 10.1109/IJCNN52387.2021.9533527

DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation

Authors: Zhenxing Zhang, Lambert Schomaker

Abstract: Most existing text-to-image generation methods adopt a multi-stage modular architecture which has three significant problems: 1) Training multiple networks increases the run time and affects the convergence and stability of the generative model; 2) These approaches ignore the quality of early-stage generator images; 3) Many discriminators need to be trained. To this end, we propose the Dual Attent… ▽ More Most existing text-to-image generation methods adopt a multi-stage modular architecture which has three significant problems: 1) Training multiple networks increases the run time and affects the convergence and stability of the generative model; 2) These approaches ignore the quality of early-stage generator images; 3) Many discriminators need to be trained. To this end, we propose the Dual Attention Generative Adversarial Network (DTGAN) which can synthesize high-quality and semantically consistent images only employing a single generator/discriminator pair. The proposed model introduces channel-aware and pixel-aware attention modules that can guide the generator to focus on text-relevant channels and pixels based on the global sentence vector and to fine-tune original feature maps using attention weights. Also, Conditional Adaptive Instance-Layer Normalization (CAdaILN) is presented to help our attention modules flexibly control the amount of change in shape and texture by the input natural-language description. Furthermore, a new type of visual loss is utilized to enhance the image resolution by ensuring vivid shape and perceptually uniform color distributions of generated images. Experimental results on benchmark datasets demonstrate the superiority of our proposed method compared to the state-of-the-art models with a multi-stage framework. Visualization of the attention maps shows that the channel-aware attention module is able to localize the discriminative regions, while the pixel-aware attention module has the ability to capture the globally visual contents for the generation of an image. △ Less

Submitted 21 November, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

Journal ref: 2021 International Joint Conference on Neural Networks (IJCNN) Proceedings

arXiv:2010.14476 [pdf, other]

doi 10.1371/journal.pone.0249769

Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa)

Authors: Mladen Popović, Maruf A. Dhali, Lambert Schomaker

Abstract: The Dead Sea Scrolls are tangible evidence of the Bible's ancient scribal culture. Palaeography - the study of ancient handwriting - can provide access to this scribal culture. However, one of the problems of traditional palaeography is to determine writer identity when the writing style is near uniform. This is exemplified by the Great Isaiah Scroll (1QIsaa). To this end, we used pattern recognit… ▽ More The Dead Sea Scrolls are tangible evidence of the Bible's ancient scribal culture. Palaeography - the study of ancient handwriting - can provide access to this scribal culture. However, one of the problems of traditional palaeography is to determine writer identity when the writing style is near uniform. This is exemplified by the Great Isaiah Scroll (1QIsaa). To this end, we used pattern recognition and artificial intelligence techniques to innovate the palaeography of the scrolls regarding writer identification and to pioneer the microlevel of individual scribes to open access to the Bible's ancient scribal culture. Although many scholars believe that 1QIsaa was written by one scribe, we report new evidence for a breaking point in the series of columns in this scroll. Without prior assumption of writer identity, based on point clouds of the reduced-dimensionality feature-space, we found that columns from the first and second halves of the manuscript ended up in two distinct zones of such scatter plots, notably for a range of digital palaeography tools, each addressing very different featural aspects of the script samples. In a secondary, independent, analysis, now assuming writer difference and using yet another independent feature method and several different types of statistical testing, a switching point was found in the column series. A clear phase transition is apparent around column 27. Given the statistically significant differences between the two halves, a tertiary, post-hoc analysis was performed. Demonstrating that two main scribes were responsible for the Great Isaiah Scroll, this study sheds new light on the Bible's ancient scribal culture by providing new, tangible evidence that ancient biblical texts were not copied by a single scribe only but that multiple scribes could closely collaborate on one particular manuscript. △ Less

Submitted 27 October, 2020; originally announced October 2020.

Comments: 23 pages, 15 pages of supplementary materials, submitted to PLOS ONE on 19 October 2019

Report number: PLoS ONE 16(4): e0249769

Journal ref: PLoS ONE 2021

arXiv:2005.00129 [pdf, other]

doi 10.18653/v1/2020.sdp-1.18

Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction

Authors: Gideon Maillette de Buy Wenniger, Thomas van Dongen, Eleri Aedmaa, Herbert Teun Kruitbosch, Edwin A. Valentijn, Lambert Schomaker

Abstract: Training recurrent neural networks on long texts, in particular scholarly documents, causes problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To tackle these problems, we propose the use of HANs combined with structure-tags which mark the role of sentences in the docume… ▽ More Training recurrent neural networks on long texts, in particular scholarly documents, causes problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To tackle these problems, we propose the use of HANs combined with structure-tags which mark the role of sentences in the document. Adding tags to sentences, marking them as corresponding to title, abstract or main body text, yields improvements over the state-of-the-art for scholarly document quality prediction. The proposed system is applied to the task of accept/reject prediction on the PeerRead dataset and compared against a recent BiLSTM-based model and joint textual+visual model as well as against plain HANs. Compared to plain HANs, accuracy increases on all three domains. On the computation and language domain our new model works best overall, and increases accuracy 4.7% over the best literature result. We also obtain improvements when introducing the tags for prediction of the number of citations for 88k scientific publications that we compiled from the Allen AI S2ORC dataset. For our HAN-system with structure-tags we reach 28.5% explained variance, an improvement of 1.8% over our reimplementation of the BiLSTM-based model as well as 1.0% improvement over plain HANs. △ Less

Submitted 17 December, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

Comments: This new version of the paper brings the paper up-to-date with the improved paper, published at the First Workshop on Scholarly Document Processing, at EMNLP 2020. .Additionally, minor corrections were made including addition of color to Figures 1,2. The changes in comparison to the first arXiv version are substantial, including various additional results, and substantial improvements to the text

ACM Class: I.2.7

Journal ref: Proceedings of the First Workshop on Scholarly Document Processing. Association for Computational Linguistics. (2020) 158-167. EMNLP|SDP 2020 https://www.aclweb.org/anthology/2020.sdp-1.18

arXiv:2003.08771 [pdf, other]

"Who is Driving around Me?" Unique Vehicle Instance Classification using Deep Neural Features

Authors: Tim Oosterhuis, Lambert Schomaker

Abstract: Being aware of other traffic is a prerequisite for self-driving cars to operate in the real world. In this paper, we show how the intrinsic feature maps of an object detection CNN can be used to uniquely identify vehicles from a dash-cam feed. Feature maps of a pretrained `YOLO' network are used to create 700 deep integrated feature signatures (DIFS) from 20 different images of 35 vehicles from a… ▽ More Being aware of other traffic is a prerequisite for self-driving cars to operate in the real world. In this paper, we show how the intrinsic feature maps of an object detection CNN can be used to uniquely identify vehicles from a dash-cam feed. Feature maps of a pretrained `YOLO' network are used to create 700 deep integrated feature signatures (DIFS) from 20 different images of 35 vehicles from a high resolution dataset and 340 signatures from 20 different images of 17 vehicles of a lower resolution tracking benchmark dataset. The YOLO network was trained to classify general object categories, e.g. classify a detected object as a `car' or `truck'. 5-Fold nearest neighbor (1NN) classification was used on DIFS created from feature maps in the middle layers of the network to correctly identify unique vehicles at a rate of 96.7\% for the high resolution data and with a rate of 86.8\% for the lower resolution data. We conclude that a deep neural detection network trained to distinguish between different classes can be successfully used to identify different instances belonging to the same class, through the creation of deep integrated feature signatures (DIFS). △ Less

Submitted 29 February, 2020; originally announced March 2020.

Comments: 6 pages, 3 figures

ACM Class: I.2.6

arXiv:2003.07212 [pdf, other]

doi 10.1109/TIFS.2020.2981236

FragNet: Writer Identification using Deep Fragment Networks

Authors: Sheng He, Lambert Schomaker

Abstract: Writer identification based on a small amount of text is a challenging problem. In this paper, we propose a new benchmark study for writer identification based on word or text block images which approximately contain one word. In order to extract powerful features on these word images, a deep neural network, named FragNet, is proposed. The FragNet has two pathways: feature pyramid which is used to… ▽ More Writer identification based on a small amount of text is a challenging problem. In this paper, we propose a new benchmark study for writer identification based on word or text block images which approximately contain one word. In order to extract powerful features on these word images, a deep neural network, named FragNet, is proposed. The FragNet has two pathways: feature pyramid which is used to extract feature maps and fragment pathway which is trained to predict the writer identity based on fragments extracted from the input image and the feature maps on the feature pyramid. We conduct experiments on four benchmark datasets, which show that our proposed method can generate efficient and robust deep representations for writer identification based on both word and page images. △ Less

Submitted 24 March, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

Journal ref: IEEE Trans. on Information Forensic and Security, 2020

arXiv:2002.03892 [pdf, other]

Learning to Grasp 3D Objects using Deep Residual U-Nets

Authors: Yikun Li, Lambert Schomaker, S. Hamidreza Kasaei

Abstract: Grasp synthesis is one of the challenging tasks for any robot object manipulation task. In this paper, we present a new deep learning-based grasp synthesis approach for 3D objects. In particular, we propose an end-to-end 3D Convolutional Neural Network to predict the objects' graspable areas. We named our approach Res-U-Net since the architecture of the network is designed based on U-Net structure… ▽ More Grasp synthesis is one of the challenging tasks for any robot object manipulation task. In this paper, we present a new deep learning-based grasp synthesis approach for 3D objects. In particular, we propose an end-to-end 3D Convolutional Neural Network to predict the objects' graspable areas. We named our approach Res-U-Net since the architecture of the network is designed based on U-Net structure and residual network-styled blocks. It devised to plan 6-DOF grasps for any desired object, be efficient to compute and use, and be robust against varying point cloud density and Gaussian noise. We have performed extensive experiments to assess the performance of the proposed approach concerning graspable part detection, grasp success rate, and robustness to varying point cloud density and Gaussian noise. Experiments validate the promising performance of the proposed architecture in all aspects. A video showing the performance of our approach in the simulation environment can be found at: http://youtu.be/5_yAJCc8owo △ Less

Submitted 12 September, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

arXiv:2002.02697 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207427

Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning

Authors: Sha Luo, Hamidreza Kasaei, Lambert Schomaker

Abstract: Reinforcement learning has shown great promise in the training of robot behavior due to the sequential decision making characteristics. However, the required enormous amount of interactive and informative training data provides the major stumbling block for progress. In this study, we focus on accelerating reinforcement learning (RL) training and improving the performance of multi-goal reaching ta… ▽ More Reinforcement learning has shown great promise in the training of robot behavior due to the sequential decision making characteristics. However, the required enormous amount of interactive and informative training data provides the major stumbling block for progress. In this study, we focus on accelerating reinforcement learning (RL) training and improving the performance of multi-goal reaching tasks. Specifically, we propose a precision-based continuous curriculum learning (PCCL) method in which the requirements are gradually adjusted during the training process, instead of fixing the parameter in a static schedule. To this end, we explore various continuous curriculum strategies for controlling a training process. This approach is tested using a Universal Robot 5e in both simulation and real-world multi-goal reach experiments. Experimental results support the hypothesis that a static training schedule is suboptimal, and using an appropriate decay function for curriculum learning provides superior results in a faster way. △ Less

Submitted 21 December, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

arXiv:1912.05156 [pdf, other]

Lifelong learning for text retrieval and recognition in historical handwritten document collections

Authors: Lambert Schomaker

Abstract: This chapter provides an overview of the problems that need to be dealt with when constructing a lifelong-learning retrieval, recognition and indexing engine for large historical document collections in multiple scripts and languages, the Monk system. This application is highly variable over time, since the continuous labeling by end users changes the concept of what a 'ground truth' constitutes.… ▽ More This chapter provides an overview of the problems that need to be dealt with when constructing a lifelong-learning retrieval, recognition and indexing engine for large historical document collections in multiple scripts and languages, the Monk system. This application is highly variable over time, since the continuous labeling by end users changes the concept of what a 'ground truth' constitutes. Although current advances in deep learning provide a huge potential in this application domain, the scale of the problem, i.e., more than 520 hugely diverse books, documents and manuscripts precludes the current meticulous and painstaking human effort which is required in designing and develo** successful deep-learning systems. The ball-park principle is introduced, which describes the evolution from the sparsely-labeled stage that can only be addressed by traditional methods or nearest-neighbor methods on embedded vectors of pre-trained neural networks, up to the other end of the spectrum where massive labeling allows reliable training of deep-learning methods. Contents: Introduction, Expectation management, Deep learning, The ball-park principle, Technical realization, Work flow, Quality and quantity of material, Industrialization and scalability, Human effort, Algorithms, Object of recognition, Processing pipeline, Performance,Compositionality, Conclusion. △ Less

Submitted 11 December, 2019; originally announced December 2019.

Comments: To appear as chapter in book: Handwritten Historical Document Analysis, Recognition, and Retrieval -- State of the Art and Future Trends, in the book series: Series in Machine Perception and Artificial Intelligence World Scientific, ISSN (print): 1793-0839 Original version deposited at Zenodo: https://zenodo.org/record/2346885#.XfCfsq5ytpg on December 17, 2018

arXiv:1912.03223 [pdf, other]

doi 10.1007/s00521-020-05612-0

A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification

Authors: Mahya Ameryan, Lambert Schomaker

Abstract: In recent years, long short-term memory neural networks (LSTMs) have been applied quite successfully to problems in handwritten text recognition. However, their strength is more located in handling sequences of variable length than in handling geometric variability of the image patterns. Furthermore, the best results for LSTMs are often based on large-scale training of an ensemble of network insta… ▽ More In recent years, long short-term memory neural networks (LSTMs) have been applied quite successfully to problems in handwritten text recognition. However, their strength is more located in handling sequences of variable length than in handling geometric variability of the image patterns. Furthermore, the best results for LSTMs are often based on large-scale training of an ensemble of network instances. In this paper, an end-to-end convolutional LSTM Neural Network is used to handle both geometric variation and sequence variability. We show that high performances can be reached on a common benchmark set by using proper data augmentation for just five such networks using a proper coding scheme and a proper voting scheme. The networks have similar architectures (Convolutional Neural Network (CNN): five layers, bidirectional LSTM (BiLSTM): three layers followed by a connectionist temporal classification (CTC) processing step). The approach assumes differently-scaled input images and different feature map sizes. Two datasets are used for evaluation of the performance of our algorithm: A standard benchmark RIMES dataset (French), and a historical handwritten dataset KdK (Dutch). Final performance obtained for the word-recognition test of RIMES was 96.6%, a clear improvement over other state-of-the-art approaches. On the KdK dataset, our approach also shows good results. The proposed approach is deployed in the Monk search engine for historical-handwriting collections. △ Less

Submitted 6 December, 2019; originally announced December 2019.

Journal ref: Neural Computing and Applications(2021)

arXiv:1911.07930 [pdf, other]

BiNet: Degraded-Manuscript Binarization in Diverse Document Textures and Layouts using Deep Encoder-Decoder Networks

Authors: Maruf A. Dhali, Jan Willem de Wit, Lambert Schomaker

Abstract: Handwritten document-image binarization is a semantic segmentation process to differentiate ink pixels from background pixels. It is one of the essential steps towards character recognition, writer identification, and script-style evolution analysis. The binarization task itself is challenging due to the vast diversity of writing styles, inks, and paper materials. It is even more difficult for his… ▽ More Handwritten document-image binarization is a semantic segmentation process to differentiate ink pixels from background pixels. It is one of the essential steps towards character recognition, writer identification, and script-style evolution analysis. The binarization task itself is challenging due to the vast diversity of writing styles, inks, and paper materials. It is even more difficult for historical manuscripts due to the aging and degradation of the documents over time. One of such manuscripts is the Dead Sea Scrolls (DSS) image collection, which poses extreme challenges for the existing binarization techniques. This article proposes a new binarization technique for the DSS images using the deep encoder-decoder networks. Although the artificial neural network proposed here is primarily designed to binarize the DSS images, it can be trained on different manuscript collections as well. Additionally, the use of transfer learning makes the network already utilizable for a wide range of handwritten documents, making it a unique multi-purpose tool for binarization. Qualitative results and several quantitative comparisons using both historical manuscripts and datasets from handwritten document image binarization competition (H-DIBCO and DIBCO) exhibit the robustness and the effectiveness of the system. The best performing network architecture proposed here is a variant of the U-Net encoder-decoders. △ Less

Submitted 13 November, 2019; originally announced November 2019.

Comments: 26 pages, 15 figures, 11 tables

arXiv:1904.08421 [pdf, ps, other]

A large-scale field test on word-image classification in large historical document collections using a traditional and two deep-learning methods

Authors: Lambert Schomaker

Abstract: This technical report describes a practical field test on word-image classification in a very large collection of more than 300 diverse handwritten historical manuscripts, with 1.6 million unique labeled images and more than 11 million images used in testing. Results indicate that several deep-learning tests completely failed (mean accuracy 83%). In the tests with more than 1000 output units (lexi… ▽ More This technical report describes a practical field test on word-image classification in a very large collection of more than 300 diverse handwritten historical manuscripts, with 1.6 million unique labeled images and more than 11 million images used in testing. Results indicate that several deep-learning tests completely failed (mean accuracy 83%). In the tests with more than 1000 output units (lexical words) in one-hot encoding for classification, performance steeply drops to almost zero percent accuracy, even with a modest size of the pre-final (i.e., penultimate) layer (150 units). A traditional feature method (BOVW) displays a consistent performance over numbers of classes and numbers of training examples (mean accuracy 87%). Additional tests using nearest mean on the output of the pre-final layer of an Inception V3 network, for each book, only yielded mediocre results (mean accuracy 49\%), but was not sensitive to high numbers of classes. Notably, this experiment was only possible on the basis of labels that were harvested on the basis of a traditional method which already works starting from a single labeled image per class. It is expected that the performance of the failed deep learning tests can be repaired, but only on the basis of human handcrafting (sic) of network architecture and hyperparameters. When the failed problematic books are not considered, end-to-end CNN training yields about 95% accuracy. This average is dominated by a large subset of Chinese characters, performances for other script styles being lower. △ Less

Submitted 17 April, 2019; originally announced April 2019.

Comments: Field test of a large operational image search system, comparing BOVW and end-to-end CNN

arXiv:1902.11208 [pdf, other]

doi 10.1109/ICDAR.2019.00064

No Padding Please: Efficient Neural Handwriting Recognition

Authors: Gideon Maillette de Buy Wenniger, Lambert Schomaker, Andy Way

Abstract: Neural handwriting recognition (NHR) is the recognition of handwritten text with deep learning models, such as multi-dimensional long short-term memory (MDLSTM) recurrent neural networks. Models with MDLSTM layers have achieved state-of-the art results on handwritten text recognition tasks. While multi-directional MDLSTM-layers have an unbeaten ability to capture the complete context in all direct… ▽ More Neural handwriting recognition (NHR) is the recognition of handwritten text with deep learning models, such as multi-dimensional long short-term memory (MDLSTM) recurrent neural networks. Models with MDLSTM layers have achieved state-of-the art results on handwritten text recognition tasks. While multi-directional MDLSTM-layers have an unbeaten ability to capture the complete context in all directions, this strength limits the possibilities for parallelization, and therefore comes at a high computational cost. In this work we develop methods to create efficient MDLSTM-based models for NHR, particularly a method aimed at eliminating computation waste that results from padding. This proposed method, called example-packing, replaces wasteful stacking of padded examples with efficient tiling in a 2-dimensional grid. For word-based NHR this yields a speed improvement of factor 6.6 over an already efficient baseline of minimal padding for each batch separately. For line-based NHR the savings are more modest, but still significant. In addition to example-packing, we propose: 1) a technique to optimize parallelization for dynamic graph definition frameworks including PyTorch, using convolutions with grou**, 2) a method for parallelization across GPUs for variable-length example batches. All our techniques are thoroughly tested on our own PyTorch re-implementation of MDLSTM-based NHR models. A thorough evaluation on the IAM dataset shows that our models are performing similar to earlier implementations of state-of-the-art models. Our efficient NHR model and some of the reusable techniques discussed with it offer ways to realize relatively efficient models for the omnipresent scenario of variable-length inputs in deep learning. △ Less

Submitted 28 February, 2019; originally announced February 2019.

arXiv:1901.06081 [pdf, other]

doi 10.1016/j.patcog.2019.01.025

DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning

Authors: Sheng He, Lambert Schomaker

Abstract: This paper presents a novel iterative deep learning framework and apply it for document enhancement and binarization. Unlike the traditional methods which predict the binary label of each pixel on the input image, we train the neural network to learn the degradations in document images and produce the uniform images of the degraded input images, which allows the network to refine the output iterat… ▽ More This paper presents a novel iterative deep learning framework and apply it for document enhancement and binarization. Unlike the traditional methods which predict the binary label of each pixel on the input image, we train the neural network to learn the degradations in document images and produce the uniform images of the degraded input images, which allows the network to refine the output iteratively. Two different iterative methods have been studied in this paper: recurrent refinement (RR) which uses the same trained neural network in each iteration for document enhancement and stacked refinement (SR) which uses a stack of different neural networks for iterative output refinement. Given the learned uniform and enhanced image, the binarization map can be easy to obtain by a global or local threshold. The experimental results on several public benchmark data sets show that our proposed methods provide a new clean version of the degraded image which is suitable for visualization and promising results of binarization using the global Otsu's threshold based on the enhanced images learned iteratively by the neural network. △ Less

Submitted 17 January, 2019; originally announced January 2019.

Comments: Accepted by Pattern Recognition

arXiv:1809.10954 [pdf, other]

doi 10.1016/j.patcog.2018.11.003

Deep Adaptive Learning for Writer Identification based on Single Handwritten Word Images

Authors: Sheng He, Lambert Schomaker

Abstract: There are two types of information in each handwritten word image: explicit information which can be easily read or derived directly, such as lexical content or word length, and implicit attributes such as the author's identity. Whether features learned by a neural network for one task can be used for another task remains an open question. In this paper, we present a deep adaptive learning method… ▽ More There are two types of information in each handwritten word image: explicit information which can be easily read or derived directly, such as lexical content or word length, and implicit attributes such as the author's identity. Whether features learned by a neural network for one task can be used for another task remains an open question. In this paper, we present a deep adaptive learning method for writer identification based on single-word images using multi-task learning. An auxiliary task is added to the training process to enforce the emergence of reusable features. Our proposed method transfers the benefits of the learned features of a convolutional neural network from an auxiliary task such as explicit content recognition to the main task of writer identification in a single procedure. Specifically, we propose a new adaptive convolutional layer to exploit the learned deep features. A multi-task neural network with one or several adaptive convolutional layers is trained end-to-end, to exploit robust generic features for a specific main task, i.e., writer identification. Three auxiliary tasks, corresponding to three explicit attributes of handwritten word images (lexical content, word length and character attributes), are evaluated. Experimental results on two benchmark datasets show that the proposed deep adaptive learning method can improve the performance of writer identification based on single-word images, compared to non-adaptive and simple linear-adaptive approaches. △ Less

Submitted 28 September, 2018; originally announced September 2018.

Comments: Under view of Pattern Recognition

arXiv:1808.08993 [pdf, other]

Open Set Chinese Character Recognition using Multi-typed Attributes

Authors: Sheng He, Lambert Schomaker

Abstract: Recognition of Off-line Chinese characters is still a challenging problem, especially in historical documents, not only in the number of classes extremely large in comparison to contemporary image retrieval methods, but also new unseen classes can be expected under open learning conditions (even for CNN). Chinese character recognition with zero or a few training samples is a difficult problem and… ▽ More Recognition of Off-line Chinese characters is still a challenging problem, especially in historical documents, not only in the number of classes extremely large in comparison to contemporary image retrieval methods, but also new unseen classes can be expected under open learning conditions (even for CNN). Chinese character recognition with zero or a few training samples is a difficult problem and has not been studied yet. In this paper, we propose a new Chinese character recognition method by multi-type attributes, which are based on pronunciation, structure and radicals of Chinese characters, applied to character recognition in historical books. This intermediate attribute code has a strong advantage over the common `one-hot' class representation because it allows for understanding complex and unseen patterns symbolically using attributes. First, each character is represented by four groups of attribute types to cover a wide range of character possibilities: Pinyin label, layout structure, number of strokes, three different input methods such as Cangjie, Zhengma and Wubi, as well as a four-corner encoding method. A convolutional neural network (CNN) is trained to learn these attributes. Subsequently, characters can be easily recognized by these attributes using a distance metric and a complete lexicon that is encoded in attribute space. We evaluate the proposed method on two open data sets: printed Chinese character recognition for zero-shot learning, historical characters for few-shot learning and a closed set: handwritten Chinese characters. Experimental results show a good general classification of seen classes but also a very promising generalization ability to unseen characters. △ Less

Submitted 27 August, 2018; originally announced August 2018.

Comments: 29 pages, submitted to Pattern Recognition

arXiv:1612.05996 [pdf, other]

doi 10.1017/S1743921317000254

Target and (Astro-)WISE technologies - Data federations and its applications

Authors: E. A. Valentijn, K. Begeman, A. Belikov, D. R. Boxhoorn, J. Brinchmann, J. McFarland, H. Holties, K. H. Kuijken, G. Verdoes Kleijn, W-J. Vriend, O. R. Williams, J. B. T. M. Roerdink, L. R. B. Schomaker, M. A. Swertz, A. Tsyganov, G. J. W. van Dijk

Abstract: After its first implementation in 2003 the Astro-WISE technology has been rolled out in several European countries and is used for the production of the KiDS survey data. In the multi-disciplinary Target initiative this technology, nicknamed WISE technology, has been further applied to a large number of projects. Here, we highlight the data handling of other astronomical applications, such as VLT-… ▽ More After its first implementation in 2003 the Astro-WISE technology has been rolled out in several European countries and is used for the production of the KiDS survey data. In the multi-disciplinary Target initiative this technology, nicknamed WISE technology, has been further applied to a large number of projects. Here, we highlight the data handling of other astronomical applications, such as VLT-MUSE and LOFAR, together with some non-astronomical applications such as the medical projects Lifelines and GLIMPS, the MONK handwritten text recognition system, and business applications, by amongst others, the Target Holding. We describe some of the most important lessons learned and describe the application of the data-centric WISE type of approach to the Science Ground Segment of the Euclid satellite. △ Less

Submitted 18 December, 2016; originally announced December 2016.

Comments: 9 pages, 5 figures, Proceedngs IAU Symposium No 325 Astroinformatics 2017

arXiv:1608.05277 [pdf, other]

Caveats on Bayesian and hidden-Markov models (v2.8)

Authors: Lambert Schomaker

Abstract: This paper describes a number of fundamental and practical problems in the application of hidden-Markov models and Bayes when applied to cursive-script recognition. Several problems, however, will have an effect in other application areas. The most fundamental problem is the propagation of error in the product of probabilities. This is a common and pervasive problem which deserves more attention.… ▽ More This paper describes a number of fundamental and practical problems in the application of hidden-Markov models and Bayes when applied to cursive-script recognition. Several problems, however, will have an effect in other application areas. The most fundamental problem is the propagation of error in the product of probabilities. This is a common and pervasive problem which deserves more attention. On the basis of Monte Carlo modeling, tables for the expected relative error are given. It seems that it is distributed according to a continuous Poisson distribution over log probabilities. A second essential problem is related to the appropriateness of the Markov assumption. Basic tests will reveal whether a problem requires modeling of the stochastics of seriality, at all. Examples are given of lexical encodings which cover 95-99% classification accuracy of a lexicon, with removed sequence information, for several European languages. Finally, a summary of results on a non- Bayes, non-Markov method in handwriting recognition are presented, with very acceptable results and minimal modeling or training requirements using nearest-mean classification. △ Less

Submitted 3 January, 2017; v1 submitted 18 August, 2016; originally announced August 2016.

Comments: Difference of v2.8 with v2.7: a) Final empirical (simulation) table for word-trie experiment with epsilon-in-the-probabilities; b) Some small text changes; c) used ispell

ACM Class: I.2.6

Showing 1–35 of 35 results for author: Schomaker, L