-
BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation
Abstract: We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference. BlazePose GHUM Holistic enables motion capture from a single RGB image including avatar control, fitness tracking and AR/VR effects. Our main contributions include i) a novel method for 3D ground truth data acquisition, i…
Submitted 23 June, 2022; originally announced June 2022.
Comments: 4 pages, 4 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, New Orleans, LA, 2022
-
On-device Real-time Hand Gesture Recognition
Abstract: We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We cre…
Submitted 29 October, 2021; originally announced November 2021.
Comments: 5 pages, 6 figures; ICCV Workshop on Computer Vision for Augmented and Virtual Reality, Montreal, Canada, 2021
-
MediaPipe Hands: On-device Real-time Hand Tracking
Abstract: We present a real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications. The pipeline consists of two models: 1) a palm detector and 2) a hand landmark model. It is implemented via MediaPipe, a framework for building cross-platform ML solutions. The proposed model and pipeline architecture demonstrate real-time inference speed on mobile GPUs a…
Submitted 17 June, 2020; originally announced June 2020.
Comments: 5 pages, 7 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020
-
BlazePose: On-device Real-time Body Pose tracking
Abstract: We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language reco…
Submitted 17 June, 2020; originally announced June 2020.
Comments: 4 pages, 6 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020
-
Real-time Hair Segmentation and Recoloring on Mobile GPUs
Abstract: We present a novel approach for neural network-based hair segmentation from a single camera input specifically designed for real-time, mobile application. Our relatively small neural network produces a high-quality hair segmentation mask that is well suited for AR effects, e.g. virtual hair recoloring. The proposed model achieves real-time inference speed on mobile GPUs (30-100+ FPS, depending on…
Submitted 15 July, 2019; originally announced July 2019.
Comments: 4 pages, 5 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, USA, 2019
-
BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs
Abstract: We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, fac…
Submitted 14 July, 2019; v1 submitted 11 July, 2019; originally announced July 2019.
Comments: 4 pages, 3 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, USA, 2019