Search | arXiv e-print repository

On-device Real-time Hand Gesture Recognition

Authors: George Sung, Kanstantsin Sokal, Esha Uboweja, Valentin Bazarevsky, Jonathan Baccash, Eduard Gabriel Bazavan, Chuo-Ling Chang, Matthias Grundmann

Abstract: We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We cre… ▽ More We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We create two different gesture classifiers, one based on heuristics and the other using neural networks (NN). △ Less

Submitted 29 October, 2021; originally announced November 2021.

Comments: 5 pages, 6 figures; ICCV Workshop on Computer Vision for Augmented and Virtual Reality, Montreal, Canada, 2021

arXiv:1909.03205 [pdf, other]

Non-discriminative data or weak model? On the relative importance of data and model resolution

Authors: Mark Sandler, Jonathan Baccash, Andrey Zhmoginov, Andrew Howard

Abstract: We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution"). Adjusting these characteristics is frequently used as a hyperparameter providing a trade-off between model performance and accuracy. An intuitive interpretation is that the reduced information… ▽ More We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution"). Adjusting these characteristics is frequently used as a hyperparameter providing a trade-off between model performance and accuracy. An intuitive interpretation is that the reduced information content in the low-resolution input causes decay in the accuracy. In this paper, we show that up to a point, the input resolution alone plays little role in the network performance, and it is the internal resolution that is the critical driver of model quality. We then build on these insights to develop novel neural network architectures that we call \emph{Isometric Neural Networks}. These models maintain a fixed internal resolution throughout their entire depth. We demonstrate that they lead to high accuracy models with low activation footprint and parameter count. △ Less

Submitted 17 October, 2019; v1 submitted 7 September, 2019; originally announced September 2019.

Comments: ICCV 2019 Workshop on Real-World Recognition from Low-Quality Images and Videos

arXiv:1904.09150 [pdf, other]

A Scalable Handwritten Text Recognition System

Authors: R. Reeve Ingle, Yasuhisa Fujii, Thomas Deselaers, Jonathan Baccash, Ashok C. Popat

Abstract: Many studies on (Offline) Handwritten Text Recognition (HTR) systems have focused on building state-of-the-art models for line recognition on small corpora. However, adding HTR capability to a large scale multilingual OCR system poses new challenges. This paper addresses three problems in building such systems: data, efficiency, and integration. Firstly, one of the biggest challenges is obtaining… ▽ More Many studies on (Offline) Handwritten Text Recognition (HTR) systems have focused on building state-of-the-art models for line recognition on small corpora. However, adding HTR capability to a large scale multilingual OCR system poses new challenges. This paper addresses three problems in building such systems: data, efficiency, and integration. Firstly, one of the biggest challenges is obtaining sufficient amounts of high quality training data. We address the problem by using online handwriting data collected for a large scale production online handwriting recognition system. We describe our image data generation pipeline and study how online data can be used to build HTR models. We show that the data improve the models significantly under the condition where only a small number of real images is available, which is usually the case for HTR models. It enables us to support a new script at substantially lower cost. Secondly, we propose a line recognition model based on neural networks without recurrent connections. The model achieves a comparable accuracy with LSTM-based models while allowing for better parallelism in training and inference. Finally, we present a simple way to integrate HTR models into an OCR system. These constitute a solution to bring HTR capability into a large scale OCR system. △ Less

Submitted 14 June, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

Comments: ICDAR 2019

arXiv:1708.04671 [pdf, other]

Sequence-to-Label Script Identification for Multilingual OCR

Authors: Yasuhisa Fujii, Karel Driesen, Jonathan Baccash, Ash Hurst, Ashok C. Popat

Abstract: We describe a novel line-level script identification method. Previous work repurposed an OCR model generating per-character script codes, counted to obtain line-level script identification. This has two shortcomings. First, as a sequence-to-sequence model it is more complex than necessary for the sequence-to-label problem of line script identification. This makes it harder to train and inefficient… ▽ More We describe a novel line-level script identification method. Previous work repurposed an OCR model generating per-character script codes, counted to obtain line-level script identification. This has two shortcomings. First, as a sequence-to-sequence model it is more complex than necessary for the sequence-to-label problem of line script identification. This makes it harder to train and inefficient to run. Second, the counting heuristic may be suboptimal compared to a learned model. Therefore we reframe line script identification as a sequence-to-label problem and solve it using two components, trained end-toend: Encoder and Summarizer. The encoder converts a line image into a feature sequence. The summarizer aggregates the sequence to classify the line. We test various summarizers with identical inception-style convolutional networks as encoders. Experiments on scanned books and photos containing 232 languages in 30 scripts show 16% reduction of script identification error rate compared to the baseline. This improved script identification reduces the character error rate attributable to script misidentification by 33%. △ Less

Submitted 17 August, 2017; v1 submitted 15 August, 2017; originally announced August 2017.

Comments: ICDAR2017, The 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Japan

MSC Class: 68T45 ACM Class: I.7.5

Showing 1–4 of 4 results for author: Baccash, J