
Unfolder: Fast localization and image rectification of a document with a crease from folding in half

Authors: A. M. Ershov, D. V. Tropin, E. E. Limonova, D. P. Nikolaev, V. V. Arlazarov

Abstract: Presentation of folded documents is not an uncommon case in modern society. Digitizing such documents by capturing them with a smartphone camera can be tricky, since a crease can divide the document contents into separate planes. To unfold the document, one could hold its edges, potentially obscuring it in the captured image. While there are many geometrical rectification methods, they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach, Unfolder, developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of a crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy of folded (2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured with the document placed on a table and held in hand. The Unfolder algorithm allowed for a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime for Unfolder was only 0.25 s/image on an iPhone XR.

Submitted 1 December, 2023; originally announced December 2023.

Comments: This is a preprint of the article accepted for publication in the journal "Computer Optics"

arXiv:2106.09987 [pdf]

doi 10.18287/2412-6179-CO-895

Advanced Hough-based method for on-device document localization

Authors: D. V. Tropin, A. M. Ershov, D. P. Nikolaev, V. V. Arlazarov

Abstract: The demand for on-device document recognition systems increases in conjunction with the emergence of stricter privacy and security requirements. In such systems, there is no data transfer from the end device to third-party information processing servers. The response time is vital to the user experience of on-device document recognition. Combined with the unavailability of discrete GPUs, powerful CPUs, or a large RAM capacity on consumer-grade end devices such as smartphones, the time limitations put significant constraints on the computational complexity of the applied algorithms for on-device execution. In this work, we consider document location in an image without prior knowledge of the document content or its internal structure. In accordance with the published works, at least 5 systems offer solutions for on-device document location. All these systems use a location method which can be considered Hough-based. The precision of such systems seems to be lower than that of the state-of-the-art solutions which were not designed to account for the limited computational resources. We propose an advanced Hough-based method. In contrast with other approaches, it accounts for the geometric invariants of the central projection model and combines both edge and color features for document boundary detection. The proposed method achieved the second-best result on the SmartDoc dataset in terms of precision, surpassed by a U-Net-like neural network. When evaluated on the more challenging MIDV-500 dataset, the proposed algorithm achieved the best precision compared to published methods. Our method retained the applicability to on-device computations.

Submitted 18 June, 2021; originally announced June 2021.

Comments: This is a preprint of the article submitted for publication in the journal "Computer Optics"

arXiv:2008.02615 [pdf, ps, other]

doi 10.1109/ICPR48806.2021.9413271

Approach for Document Detection by Contours and Contrasts

Authors: Daniil V. Tropin, Sergey A. Ilyuhin, Dmitry P. Nikolaev, Vladimir V. Arlazarov

Abstract: This paper considers arbitrary document detection performed on a mobile device. The classical contour-based approach often fails in cases featuring occlusion, complex background, or blur. The region-based approach, which relies on the contrast between object and background, does not have these application limitations; however, its known implementations are highly resource-consuming. We propose a modification of the contour-based method in which the competing contour location hypotheses are ranked according to the contrast between the areas inside and outside the border. In the experiments, this modification decreases alternative-ordering errors by 40% and overall detection errors by 10%. The proposed method provides unmatched state-of-the-art performance on the open MIDV-500 dataset, and it demonstrates results comparable with state-of-the-art performance on the SmartDoc dataset.

Submitted 19 October, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

Comments: This paper has been accepted to the ICPR 2020 conference in Milan which will be held on the 10-15 January 2021. Therefore this work has not yet been presented

Journal ref: 2020 25th International Conference on Pattern Recognition (ICPR), (2021) 9689-9695
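
The ranking idea in the abstract above (order competing border hypotheses by the contrast between the areas inside and outside the border) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned rectangles instead of general quadrilaterals, a grayscale image given as a list of rows, and a contrast measure defined as the absolute difference of mean intensities. The names `contrast_score`, `rank_hypotheses`, and the `margin` parameter are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def contrast_score(image, rect, margin=2):
    """Absolute difference between the mean intensity inside `rect` and the
    mean intensity in a band of `margin` pixels around it.

    `rect` is (top, left, bottom, right), half-open, in pixel coordinates.
    This contrast measure is an illustrative assumption.
    """
    top, left, bottom, right = rect
    height, width = len(image), len(image[0])
    inside, outside = [], []
    for y in range(max(0, top - margin), min(height, bottom + margin)):
        for x in range(max(0, left - margin), min(width, right + margin)):
            if top <= y < bottom and left <= x < right:
                inside.append(image[y][x])
            else:
                outside.append(image[y][x])
    if not inside or not outside:
        return 0.0  # degenerate hypothesis: no contrast can be measured
    return abs(_mean(inside) - _mean(outside))


def rank_hypotheses(image, rects, margin=2):
    """Return the hypotheses sorted from highest to lowest contrast."""
    return sorted(rects, key=lambda r: contrast_score(image, r, margin),
                  reverse=True)


# Toy example: a bright 4x4 "document" on a dark 8x8 background.
image = [[1.0 if 2 <= y < 6 and 2 <= x < 6 else 0.0 for x in range(8)]
         for y in range(8)]
candidates = [(3, 3, 7, 7), (2, 2, 6, 6)]  # shifted vs. exact border
ranked = rank_hypotheses(image, candidates)
```

In this toy case the exact border separates bright from dark pixels completely, so it ranks ahead of the shifted one. A contour detector would supply the candidate list in a real pipeline.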

arXiv:1812.07933 [pdf, other]

doi 10.1117/12.2522974

Dynamic Programming Approach to Template-based OCR

Authors: M. A. Povolotskiy, D. V. Tropin

Abstract: In this paper we propose a dynamic programming solution to the template-based recognition task in the OCR case. We formulate a problem of optimal position search for complex objects consisting of parts forming a sequence. We limit the distance between every two adjacent elements with predefined upper and lower thresholds. We choose the sum of penalties for each part in a given position as the function to be minimized. We show that such a choice of restrictions allows a faster algorithm to be used than the one for the general form of deformation penalties. We named this algorithm Dynamic Squeezeboxes Packing (DSP) and applied it to solve two OCR problems: text field extraction from an image of a document's Visual Inspection Zone (VIZ) and license plate segmentation. The quality and the performance of the resulting solutions were experimentally shown to meet the requirements of state-of-the-art industrial recognition systems.

Submitted 19 December, 2018; originally announced December 2018.

Comments: 8 pages, 5 figures, 1 table
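
The problem stated in the abstract above (place a sequence of parts so that the sum of per-part penalties is minimal while the distance between adjacent parts stays within lower and upper thresholds) can be sketched as a one-dimensional dynamic program. The sketch below is an illustration only, not the published DSP algorithm: it assumes integer positions, a precomputed table `penalty[i][x]`, and thresholds `0 <= lo <= hi`. It uses a sliding-window minimum to keep each step linear in the number of positions. The function name `best_placement` is hypothetical.

```python
from collections import deque


def best_placement(penalty, lo, hi):
    """Minimise sum(penalty[i][x_i]) subject to lo <= x_i - x_{i-1} <= hi.

    penalty[i][x] is the cost of putting part i at integer position x.
    Returns (total_cost, positions). Raises ValueError if no placement
    satisfies the distance constraints.
    """
    if not 0 <= lo <= hi:
        raise ValueError("expected 0 <= lo <= hi")
    width = len(penalty[0])
    inf = float("inf")
    cost = list(penalty[0])   # best cost of a prefix ending at each position
    back = []                 # back-pointers, one list per part after the first
    for part in penalty[1:]:
        new_cost = [inf] * width
        prev = [-1] * width
        window = deque()      # candidate previous positions, costs increasing
        for x in range(width):
            enter = x - lo    # newest previous position allowed for x
            if enter >= 0:
                while window and cost[window[-1]] >= cost[enter]:
                    window.pop()
                window.append(enter)
            while window and window[0] < x - hi:
                window.popleft()  # too far to the left of x
            if window and cost[window[0]] < inf:
                new_cost[x] = cost[window[0]] + part[x]
                prev[x] = window[0]
        back.append(prev)
        cost = new_cost
    end = min(range(width), key=cost.__getitem__)
    if cost[end] == inf:
        raise ValueError("no feasible placement")
    positions = [end]
    for prev in reversed(back):
        positions.append(prev[positions[-1]])
    positions.reverse()
    return cost[end], positions


# Three parts on six positions; adjacent parts must be 1 or 2 apart.
penalty = [
    [0, 5, 5, 5, 5, 5],
    [5, 5, 0, 5, 5, 5],
    [5, 5, 5, 5, 5, 0],
]
total, positions = best_placement(penalty, lo=1, hi=2)
```

With n parts and W positions this runs in O(nW) time, whereas a scan over every allowed previous position would cost O(nW(hi - lo + 1)).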

Showing 1–4 of 4 results for author: Tropin, D V