-
Best Practices for a Handwritten Text Recognition System
Authors:
George Retsinas,
Giorgos Sfikas,
Basilis Gatos,
Christophoros Nikou
Abstract:
Handwritten text recognition has developed rapidly in recent years, following the rise of deep learning and its applications. Though deep learning methods provide a notable boost in text recognition performance, non-trivial deviations in performance can be detected even when small pre-processing or architectural/optimization elements are changed. This work follows a "best practice" rationale: we highlight simple yet effective empirical practices that can further help training and provide well-performing handwritten text recognition systems. Specifically, we consider three basic aspects of a deep HTR system and propose simple yet effective solutions: 1) retain the aspect ratio of the images in the preprocessing step, 2) use max-pooling to convert the 3D feature map of the CNN output into a sequence of features and 3) assist the training procedure via an additional CTC loss which acts as a shortcut on the max-pooled sequential features. Using these simple modifications, one can attain close to state-of-the-art results with a basic convolutional-recurrent (CNN+LSTM) architecture on both the IAM and RIMES datasets. Code is available at https://github.com/georgeretsi/HTR-best-practices/.
Submitted 17 April, 2024;
originally announced April 2024.
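Illustration (not taken from the authors' repository): practice 2 above, converting a 3D CNN feature map into a sequence by max-pooling, can be sketched in plain Python as follows. The function name `max_pool_columns` and the toy feature map are hypothetical; the sketch assumes the pooling is taken over the height axis so that each image column yields one feature vector.

```python
def max_pool_columns(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a sequence of
    W vectors of length C by taking the maximum over the height axis.

    Assumption for illustration only: pooling runs over the height axis.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    sequence = []
    for x in range(width):
        # One feature vector per column: the max over all rows, per channel.
        step = [
            max(feature_map[c][y][x] for y in range(height))
            for c in range(channels)
        ]
        sequence.append(step)
    return sequence


# Toy feature map: 2 channels, 3 rows, 4 columns (made-up values).
toy = [
    [[1, 0, 2, 0], [0, 5, 1, 0], [3, 1, 0, 7]],
    [[0, 0, 0, 1], [2, 2, 2, 2], [1, 3, 0, 0]],
]
seq = max_pool_columns(toy)
print(seq)  # 4 time steps, each with 2 features
```

The resulting sequence has one step per image column, which is the shape a recurrent layer or a CTC loss would consume.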
-
Keyword Spotting Simplified: A Segmentation-Free Approach using Character Counting and CTC re-scoring
Authors:
George Retsinas,
Giorgos Sfikas,
Christophoros Nikou
Abstract:
Recent advances in segmentation-free keyword spotting treat the problem within an object detection paradigm and borrow from state-of-the-art detection systems to simultaneously provide a word bounding box proposal mechanism and compute a corresponding representation. Contrary to the norm of such methods, which rely on complex and large DNN models, we propose a novel segmentation-free system that efficiently scans a document image to find rectangular areas that include the query information. The underlying model is simple and compact, predicting character occurrences over rectangular areas through an implicitly learned scale map, trained on word-level annotated images. The proposed document scanning is then performed using this character counting in a cost-effective manner via integral images and binary search. Finally, the retrieval similarity by character counting is refined by a pyramidal representation and a CTC-based re-scoring algorithm, fully utilizing the trained CNN model. Experimental validation on two widely-used datasets shows that our method achieves state-of-the-art results, outperforming the more complex alternatives despite the simplicity of the underlying model.
Submitted 7 August, 2023;
originally announced August 2023.
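Illustration (a generic sketch, not the authors' implementation): the abstract above mentions integral images as the tool that makes counting over rectangular areas cheap. The idea is that, once a summed-area table is built, the sum of any rectangle is obtained from four lookups. The names `integral_image` and `box_sum`, and the toy count map, are hypothetical; the binary search and CTC re-scoring stages are not shown.

```python
def integral_image(grid):
    """Build a summed-area table with one extra row and column of zeros,
    so that ii[y][x] is the sum of grid[0:y][0:x]."""
    h, w = len(grid), len(grid[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += grid[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of grid rows top..bottom-1 and columns left..right-1,
    computed from four table lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy per-cell count map (made-up values standing in for predicted
# occurrences of a single character).
counts = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(counts)
total = box_sum(ii, 0, 0, 3, 3)         # whole map
lower_right = box_sum(ii, 1, 1, 3, 3)   # rows 1-2, columns 1-2
upper_right = box_sum(ii, 0, 1, 2, 3)   # rows 0-1, columns 1-2
print(total, lower_right, upper_right)
```

Because each query costs a constant number of lookups regardless of rectangle size, many candidate boxes can be scored without re-summing the underlying map.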
-
Automatic Video Colorization using 3D Conditional Generative Adversarial Networks
Authors:
Panagiotis Kouzouglidis,
Giorgos Sfikas,
Christophoros Nikou
Abstract:
In this work, we present a method for automatic colorization of grayscale videos. The core of the method is a Generative Adversarial Network that is trained and tested on sequences of frames in a sliding window manner. The network's convolutional and deconvolutional layers are three-dimensional, with frame height, width and time as the dimensions taken into account. Multiple chrominance estimates per frame are aggregated and combined with available luminance information to recreate a colored sequence. Colorization trials are run successfully on a dataset of old black-and-white films. The usefulness of our method is also validated with numerical results, computed with a newly proposed metric that measures colorization consistency over a frame sequence.
Submitted 8 May, 2019;
originally announced May 2019.
-
Curriculum Learning of Visual Attribute Clusters for Multi-Task Classification
Authors:
Nikolaos Sarafianos,
Theodore Giannakopoulos,
Christophoros Nikou,
Ioannis A. Kakadiaris
Abstract:
Visual attributes, from simple objects (e.g., backpacks, hats) to soft-biometrics (e.g., gender, height, clothing), have proven to be a powerful representational approach for many applications such as image description and human identification. In this paper, we introduce a novel method to combine the advantages of both multi-task and curriculum learning in a visual attribute classification framework. Individual tasks are grouped after performing hierarchical clustering based on their correlation. The clusters of tasks are learned in a curriculum learning setup by transferring knowledge between clusters. The learning process within each cluster is performed in a multi-task classification setup. By leveraging the acquired knowledge, we speed up the process and improve performance. We demonstrate the effectiveness of our method via ablation studies and a detailed analysis of the covariates, on a variety of publicly available datasets of humans standing with their full body visible. Extensive experimentation has proven that the proposed approach boosts the performance by 4% to 10%.
Submitted 9 July, 2018; v1 submitted 19 September, 2017;
originally announced September 2017.
-
Human Activity Recognition Using Robust Adaptive Privileged Probabilistic Learning
Authors:
Michalis Vrigkas,
Evangelos Kazakos,
Christophoros Nikou,
Ioannis A. Kakadiaris
Abstract:
In this work, we propose a novel method for recognizing complex human activities, based on the learning using privileged information (LUPI) paradigm, that handles missing information during testing. We present a supervised probabilistic approach that integrates LUPI into a hidden conditional random field (HCRF) model. The proposed model is called HCRF+ and may be trained using both maximum likelihood and maximum margin approaches. It employs a self-training technique for automatic estimation of the regularization parameters of the objective functions. Moreover, the method provides robustness to outliers (such as noise or missing data) by modeling the conditional distribution of the privileged information by a Student's t-density function, which is naturally integrated into the HCRF+ framework. Different forms of privileged information were investigated. The proposed method was evaluated using four challenging publicly available datasets, and the experimental results demonstrate its effectiveness with respect to the state of the art in the LUPI framework using both hand-crafted features and features extracted from a convolutional neural network.
Submitted 19 September, 2017;
originally announced September 2017.
-
Inferring Human Activities Using Robust Privileged Probabilistic Learning
Authors:
Michalis Vrigkas,
Evangelos Kazakos,
Christophoros Nikou,
Ioannis A. Kakadiaris
Abstract:
Classification models may often suffer from "structure imbalance" between training and testing data that may occur due to a deficient data collection process. This imbalance can be represented by the learning using privileged information (LUPI) paradigm. In this paper, we present a supervised probabilistic classification approach that integrates LUPI into a hidden conditional random field (HCRF) model. The proposed model is called LUPI-HCRF and is able to cope with additional information that is only available during training. Moreover, the proposed method employs Student's t-distribution to provide robustness to outliers by modeling the conditional distribution of the privileged information. Experimental results on three publicly available datasets demonstrate the effectiveness of the proposed approach and improve the state of the art in the LUPI framework for recognizing human activities.
Submitted 31 August, 2017;
originally announced August 2017.
-
Curriculum Learning for Multi-Task Classification of Visual Attributes
Authors:
Nikolaos Sarafianos,
Theodore Giannakopoulos,
Christophoros Nikou,
Ioannis A. Kakadiaris
Abstract:
Visual attributes, from simple objects (e.g., backpacks, hats) to soft-biometrics (e.g., gender, height, clothing), have proven to be a powerful representational approach for many applications such as image description and human identification. In this paper, we introduce a novel method to combine the advantages of both multi-task and curriculum learning in a visual attribute classification framework. Individual tasks are grouped based on their correlation so that two groups of strongly and weakly correlated tasks are formed. The two groups of tasks are learned in a curriculum learning setup by transferring the acquired knowledge from the strongly to the weakly correlated group. The learning process within each group, though, is performed in a multi-task classification setup. The proposed method learns better and converges faster than learning all the tasks in a typical multi-task learning paradigm. We demonstrate the effectiveness of our approach on the publicly available SoBiR, VIPeR and PETA datasets and report state-of-the-art results across the board.
Submitted 29 August, 2017;
originally announced August 2017.
-
Predicting Privileged Information for Height Estimation
Authors:
Nikolaos Sarafianos,
Christophoros Nikou,
Ioannis A. Kakadiaris
Abstract:
In this paper, we propose a novel regression-based method for employing privileged information to estimate height using human metrology. The actual values of the anthropometric measurements are difficult to estimate accurately using state-of-the-art computer vision algorithms. Hence, we use ratios of anthropometric measurements as features. Since many anthropometric measurements are not available at test time in real-life scenarios, we employ a learning using privileged information (LUPI) framework in a regression setup. Instead of using the LUPI paradigm for regression in its original form (i.e., ε-SVR+), we train regression models that predict the privileged information at test time. The predictions are then used, along with observable features, to perform height estimation. Once the height is estimated, a mapping to classes is performed. We demonstrate that the proposed approach can estimate the height better and faster than the ε-SVR+ algorithm and report results for different genders and quartiles of humans.
Submitted 9 February, 2017;
originally announced February 2017.