Showing 1–15 of 15 results for author: Hradiš, M

Searching in archive cs.
  1. arXiv:2405.00420

    cs.CV cs.AI cs.LG

    Self-supervised Pre-training of Text Recognizers

    Authors: Martin Kišš, Michal Hradiš

    Abstract: In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them. Therefore, methods utilizing unlabeled data are researched. We study self-supervised pre-training methods based on masked label prediction using three different a…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 18 pages, 6 figures, 4 tables, accepted to ICDAR24

  2. arXiv:2302.06318

    cs.CV

    Towards Writing Style Adaptation in Handwriting Recognition

    Authors: Jan Kohút, Michal Hradiš, Martin Kišš

    Abstract: One of the challenges of handwriting recognition is to transcribe a large number of vastly different writing styles. State-of-the-art approaches do not explicitly use information about the writer's style, which may be limiting overall accuracy due to various ambiguities. We explore models with writer-dependent parameters which take the writer's identity as an additional input. The proposed models…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: Submitted to ICDAR2023 conference

  3. arXiv:2302.06308

    cs.CV

    Finetuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition

    Authors: Jan Kohút, Michal Hradiš

    Abstract: In many machine learning tasks, a large general dataset and a small specialized dataset are available. In such situations, various domain adaptation methods can be used to adapt a general model to the target dataset. We show that in the case of neural networks trained for handwriting recognition using CTC, simple finetuning with data augmentation works surprisingly well in such scenarios and that…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: Submitted to ICDAR2023 conference

  4. arXiv:2212.02135

    cs.LG cs.CV

    SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels

    Authors: Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal, Michal Kula

    Abstract: This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function – SoftCTC – which is an extension of CTC allowing to consider multiple transcription variants at the same time. This allows to omit the confidence based filtering step which is otherwise a crucial co…

    Submitted 19 September, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: 21 pages, 8 figures, 6 tables, accepted to International Journal on Document Analysis and Recognition (IJDAR)

    MSC Class: 68T07; 68T10

  5. arXiv:2201.09575

    cs.CV

    Importance of Textlines in Historical Document Classification

    Authors: Martin Kišš, Jan Kohút, Karel Beneš, Michal Hradiš

    Abstract: This paper describes a system prepared at Brno University of Technology for ICDAR 2021 Competition on Historical Document Classification, experiments leading to its design, and the main findings. The solved tasks include script and font classification, document origin localization, and dating. We combined patch-level and line-level approaches, where the line-level system utilizes an existing, publ…

    Submitted 30 March, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: 13 pages, 7 figures, 5 tables

    MSC Class: 68T07; 68T10

  6. AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions

    Authors: Martin Kišš, Karel Beneš, Michal Hradiš

    Abstract: This paper addresses text recognition for domains with limited manual annotations by a simple self-training strategy. Our approach should reduce human annotation effort when target domain data is plentiful, such as when transcribing a collection of single person's correspondence or a large manuscript. We propose to train a seed system on large scale data from related domains mixed with available a…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: 15 pages, 6 figures, 5 tables

  7. TS-Net: OCR Trained to Switch Between Text Transcription Styles

    Authors: Jan Kohút, Michal Hradiš

    Abstract: Users of OCR systems, from different institutions and scientific disciplines, prefer and produce different transcription styles. This presents a problem for training of consistent text recognition neural networks on real-world data. We propose to extend existing text recognition networks with a Transcription Style Block (TSB) which can learn from data to switch between multiple transcription style…

    Submitted 13 February, 2023; v1 submitted 9 March, 2021; originally announced March 2021.

    Journal ref: ICDAR 2021: Proceedings, Part IV 16 (pp. 478-493)

  8. arXiv:2102.11838

    cs.CV

    Page Layout Analysis System for Unconstrained Historic Documents

    Authors: Oldřich Kodym, Michal Hradiš

    Abstract: Extraction of text regions and individual text lines from historic documents is necessary for automatic transcription. We propose extending a CNN-based text baseline detection system by adding line height and text block boundary predictions to the model output, allowing the system to extract more comprehensive layout information. We also show that pixel-wise text orientation prediction can be used…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Submitted to ICDAR2021 conference

  9. arXiv:1907.01307

    cs.CV

    Brno Mobile OCR Dataset

    Authors: Martin Kišš, Michal Hradiš, Oldřich Kodym

    Abstract: We introduce the Brno Mobile OCR Dataset (B-MOD) for document Optical Character Recognition from low-quality images captured by handheld mobile devices. While OCR of high-quality scanned documents is a mature field where many commercial tools are available, and large datasets of text in the wild exist, no existing datasets can be used to develop and test document OCR methods robust to non-uniform…

    Submitted 2 July, 2019; originally announced July 2019.

  10. arXiv:1712.06352

    cs.RO

    CNN for IMU Assisted Odometry Estimation using Velodyne LiDAR

    Authors: Martin Velas, Michal Spanel, Michal Hradis, Adam Herout

    Abstract: We introduce a novel method for odometry estimation using convolutional neural networks from 3D LiDAR scans. The original sparse data are encoded into 2D matrices for the training of proposed networks and for the prediction. Our networks show significantly better precision in the estimation of translational motion parameters comparing with state of the art method LOAM, while achieving real-time pe…

    Submitted 18 December, 2017; originally announced December 2017.

  11. arXiv:1709.02128

    cs.RO

    CNN for Very Fast Ground Segmentation in Velodyne LiDAR Data

    Authors: Martin Velas, Michal Spanel, Michal Hradis, Adam Herout

    Abstract: This paper presents a novel method for ground segmentation in Velodyne point clouds. We propose an encoding of sparse 3D data from the Velodyne sensor suitable for training a convolutional neural network (CNN). This general purpose approach is used for segmentation of the sparse point cloud into ground and non-ground points. The LiDAR data are represented as a multi-channel 2D signal where the hor…

    Submitted 7 September, 2017; originally announced September 2017.

    Comments: ICRA 2018 submission

  12. Camera Elevation Estimation from a Single Mountain Landscape Photograph

    Authors: Martin Cadik, Jan Vasicek, Michal Hradis, Filip Radenovic, Ondrej Chum

    Abstract: This work addresses the problem of camera elevation estimation from a single photograph in an outdoor environment. We introduce a new benchmark dataset of one-hundred thousand images with annotated camera elevation called Alps100K. We propose and experimentally evaluate two automatic data-driven approaches to camera elevation estimation: one based on convolutional neural networks, the other on loc…

    Submitted 12 July, 2016; originally announced July 2016.

    Journal ref: In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 30.1-30.12. BMVA Press, September 2015

  13. arXiv:1605.00366

    cs.CV

    Compression Artifacts Removal Using Convolutional Neural Networks

    Authors: Pavel Svoboda, Michal Hradis, David Barina, Pavel Zemcik

    Abstract: This paper shows that it is possible to train large and deep convolutional neural networks (CNN) for JPEG compression artifacts reduction, and that such networks can provide significantly better reconstruction quality compared to previously used smaller networks as well as to any other state-of-the-art methods. We were able to train networks with 8 layers in a single step and in relatively short t…

    Submitted 2 May, 2016; originally announced May 2016.

    Comments: To be published in WSCG 2016

  14. arXiv:1602.07873

    cs.CV

    CNN for License Plate Motion Deblurring

    Authors: Pavel Svoboda, Michal Hradis, Lukas Marsik, Pavel Zemcik

    Abstract: In this work we explore the previously proposed approach of direct blind deconvolution and denoising with convolutional neural networks in a situation where the blur kernels are partially constrained. We focus on blurred images from a real-life traffic surveillance system, on which we, for the first time, demonstrate that neural networks trained on artificial data provide superior reconstruction q…

    Submitted 25 February, 2016; originally announced February 2016.

  15. arXiv:1506.03995

    cs.CV

    Technical Report: Image Captioning with Semantically Similar Images

    Authors: Martin Kolář, Michal Hradiš, Pavel Zemčík

    Abstract: This report presents our submission to the MS COCO Captioning Challenge 2015. The method uses Convolutional Neural Network activations as an embedding to find semantically similar images. From these images, the most typical caption is selected based on unigram frequencies. Although the method received low scores with automated evaluation metrics and in human assessed average correctness, it is com…

    Submitted 12 June, 2015; originally announced June 2015.

    Comments: 3 pages