-
On-device Real-time Custom Hand Gesture Recognition
Authors: Esha Uboweja, David Tian, Qifei Wang, Yi-Chun Kuo, Joe Zou, Lu Wang, George Sung, Matthias Grundmann
Abstract:
Most existing hand gesture recognition (HGR) systems are limited to a predefined set of gestures. However, users and developers often want to recognize new, unseen gestures. This is challenging due to the vast diversity of all plausible hand shapes, e.g. it is impossible for developers to include all hand gestures in a predefined list. In this paper, we present a user-friendly framework that lets users easily customize and deploy their own gesture recognition pipeline. Our framework provides a pre-trained single-hand embedding model that can be fine-tuned for custom gesture recognition. Users can perform gestures in front of a webcam to collect a small amount of images per gesture. We also offer a low-code solution to train and deploy the custom gesture recognition model. This makes it easy for users with limited ML expertise to use our framework. We further provide a no-code web front-end for users without any ML expertise. This makes it even easier to build and test the end-to-end pipeline. The resulting custom HGR is then ready to be run on-device for real-time scenarios. This can be done by calling a simple function in our open-sourced model inference API, MediaPipe Tasks. This entire process only takes a few minutes.
Submitted 19 September, 2023; originally announced September 2023.
-
On-device Real-time Hand Gesture Recognition
Authors: George Sung, Kanstantsin Sokal, Esha Uboweja, Valentin Bazarevsky, Jonathan Baccash, Eduard Gabriel Bazavan, Chuo-Ling Chang, Matthias Grundmann
Abstract:
We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We create two different gesture classifiers, one based on heuristics and the other using neural networks (NN).
Submitted 29 October, 2021; originally announced November 2021.
-
MediaPipe: A Framework for Building Perception Pipelines
Authors: Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg, Matthias Grundmann
Abstract:
Building applications that perceive the world around them is challenging. A developer needs to (a) select and develop corresponding machine learning algorithms and models, (b) build a series of prototypes and demos, (c) balance resource consumption against the quality of the solutions, and finally (d) identify and mitigate problematic cases. The MediaPipe framework addresses all of these challenges. A developer can use MediaPipe to build prototypes by combining existing perception components, to advance them to polished cross-platform applications and measure system performance and resource consumption on target platforms. We show that these features enable a developer to focus on the algorithm or model development and use MediaPipe as an environment for iteratively improving their application with results reproducible across different devices and platforms. MediaPipe will be open-sourced at https://github.com/google/mediapipe.
Submitted 14 June, 2019; originally announced June 2019.