-
Unfolder: Fast localization and image rectification of a document with a crease from folding in half
Authors:
A. M. Ershov,
D. V. Tropin,
E. E. Limonova,
D. P. Nikolaev,
V. V. Arlazarov
Abstract:
Presentation of folded documents is not an uncommon case in modern society. Digitizing such documents by capturing them with a smartphone camera can be tricky since a crease can divide the document contents into separate planes. To unfold the document, one could hold it by the edges, potentially obscuring it in the captured image. While there are many geometrical rectification methods, they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach, Unfolder, developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of a crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy of folded (2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured with the document placed on a table and held in hand. The Unfolder algorithm allowed for a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime for Unfolder was only 0.25 s/image on an iPhone XR.
Submitted 1 December, 2023;
originally announced December 2023.
-
Advanced Hough-based method for on-device document localization
Authors:
D. V. Tropin,
A. M. Ershov,
D. P. Nikolaev,
V. V. Arlazarov
Abstract:
The demand for on-device document recognition systems increases in conjunction with the emergence of stricter privacy and security requirements. In such systems, there is no data transfer from the end device to third-party information processing servers. The response time is vital to the user experience of on-device document recognition. Combined with the unavailability of discrete GPUs, powerful CPUs, or a large RAM capacity on consumer-grade end devices such as smartphones, the time limitations put significant constraints on the computational complexity of the applied algorithms for on-device execution.
In this work, we consider document location in an image without prior knowledge of the document content or its internal structure. According to published works, at least 5 systems offer solutions for on-device document location. All these systems use a location method which can be considered Hough-based. The precision of such systems seems to be lower than that of the state-of-the-art solutions which were not designed to account for the limited computational resources.
We propose an advanced Hough-based method. In contrast with other approaches, it accounts for the geometric invariants of the central projection model and combines both edge and color features for document boundary detection. The proposed method achieved the second-best result on the SmartDoc dataset in terms of precision, surpassed only by a U-Net-like neural network. When evaluated on the more challenging MIDV-500 dataset, the proposed algorithm achieved the best precision compared to published methods. Our method retained the applicability to on-device computations.
Submitted 18 June, 2021;
originally announced June 2021.
-
Approach for Document Detection by Contours and Contrasts
Authors:
Daniil V. Tropin,
Sergey A. Ilyuhin,
Dmitry P. Nikolaev,
Vladimir V. Arlazarov
Abstract:
This paper considers arbitrary document detection performed on a mobile device. The classical contour-based approach often fails in cases featuring occlusion, complex background, or blur. The region-based approach, which relies on the contrast between object and background, has no such application limitations; however, its known implementations are highly resource-consuming. We propose a modification of the contour-based method, in which the competing contour location hypotheses are ranked according to the contrast between the areas inside and outside the border. In the experiments, this modification reduces errors in the ordering of alternatives by 40% and overall detection errors by 10%. The proposed method provides unmatched state-of-the-art performance on the open MIDV-500 dataset, and it demonstrates results comparable with state-of-the-art performance on the SmartDoc dataset.
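The ranking idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the contrast measure (absolute difference of mean intensity inside the quadrilateral versus the rest of the image), the convexity assumption, and the names `inside_convex`, `contrast_score`, and `rank_hypotheses` are all hypothetical choices made for illustration.

```python
def inside_convex(quad, x, y):
    """True if (x, y) is inside or on the border of a convex quadrilateral.

    quad is a list of four (x, y) vertices in consistent winding order.
    """
    signs = []
    for i in range(4):
        (x1, y1), (x2, y2) = quad[i], quad[(i + 1) % 4]
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        if cross != 0:  # points lying exactly on an edge line are neutral
            signs.append(cross > 0)
    return all(signs) or not any(signs)


def contrast_score(image, quad):
    """Assumed contrast measure: |mean(inside) - mean(outside)|."""
    inside, outside = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            (inside if inside_convex(quad, x, y) else outside).append(value)
    if not inside or not outside:
        return 0.0
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside))


def rank_hypotheses(image, quads):
    """Order competing contour hypotheses from highest to lowest contrast."""
    return sorted(quads, key=lambda q: contrast_score(image, q), reverse=True)


# Toy grayscale image: a bright 4x4 "document" on a dark 6x6 background.
image = [[200 if 1 <= x <= 4 and 1 <= y <= 4 else 20 for x in range(6)]
         for y in range(6)]
good = [(1, 1), (4, 1), (4, 4), (1, 4)]   # matches the bright region
bad = [(0, 0), (2, 0), (2, 2), (0, 2)]    # straddles the border
ranked = rank_hypotheses(image, [bad, good])
```

In this toy case the hypothesis that matches the bright region is ranked first. A practical system would compare regions near the border rather than the whole image and would avoid the per-pixel loop.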
Submitted 19 October, 2020; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Dynamic Programming Approach to Template-based OCR
Authors:
M. A. Povolotskiy,
D. V. Tropin
Abstract:
In this paper we propose a dynamic programming solution to the template-based recognition task in OCR. We formulate a problem of optimal position search for complex objects consisting of parts forming a sequence. We limit the distance between every two adjacent elements with predefined upper and lower thresholds. We choose the sum of penalties for each part in a given position as the function to be minimized. We show that such a choice of restrictions allows a faster algorithm to be used than the one for the general form of deformation penalties. We named this algorithm Dynamic Squeezeboxes Packing (DSP) and applied it to solve two OCR problems: text field extraction from an image of a document's Visual Inspection Zone (VIZ) and license plate segmentation. The quality and the performance of the resulting solutions were experimentally shown to meet the requirements of state-of-the-art industrial recognition systems.
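The problem stated in this abstract (minimize the sum of per-part penalties while keeping each gap between adjacent parts within lower and upper thresholds) can be sketched in Python as follows. This is an illustration under assumptions, not the DSP algorithm from the paper: positions are taken to be integers on a 1D axis, `0 <= lo <= hi` is assumed, and the speed-up shown is a standard sliding-window minimum, which may differ from the authors' method. The function name `place_parts` and the `penalty` layout are hypothetical.

```python
from collections import deque

INF = float("inf")


def place_parts(penalty, lo, hi):
    """Place parts in sequence so that lo <= x[k+1] - x[k] <= hi.

    penalty[k][x] is the cost of putting part k at integer position x.
    Returns (total_cost, positions), or None if no placement is feasible.
    Runs in O(parts * positions) using a sliding-window minimum.
    """
    n_pos = len(penalty[0])
    best = list(penalty[0])      # best[x]: min cost with current part at x
    back = []                    # back[k-1][x]: chosen position of part k-1
    for k in range(1, len(penalty)):
        new_best = [INF] * n_pos
        prev = [-1] * n_pos
        window = deque()         # candidate previous positions, costs increasing
        for x in range(n_pos):
            p_new = x - lo       # newest previous position allowed for x
            if p_new >= 0:
                while window and best[window[-1]] >= best[p_new]:
                    window.pop()
                window.append(p_new)
            while window and window[0] < x - hi:
                window.popleft()  # too far to the left for this x
            if window and best[window[0]] < INF:
                new_best[x] = best[window[0]] + penalty[k][x]
                prev[x] = window[0]
        back.append(prev)
        best = new_best
    end = min(range(n_pos), key=best.__getitem__)
    if best[end] == INF:
        return None
    positions = [end]
    for prev in reversed(back):  # walk back-pointers from the last part
        positions.append(prev[positions[-1]])
    positions.reverse()
    return best[end], positions


# Three parts on eight positions, adjacent gaps limited to 2..3.
penalty = [
    [4, 0, 4, 4, 4, 4, 4, 4],
    [4, 4, 4, 1, 0, 4, 4, 4],
    [4, 4, 4, 4, 4, 0, 3, 4],
]
result = place_parts(penalty, lo=2, hi=3)  # (1, [1, 3, 5])
```

The locally cheapest spot for the second part (position 4) is rejected because it would force the third part to a costlier position, which is the kind of trade-off the joint minimization resolves.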
Submitted 19 December, 2018;
originally announced December 2018.