
Showing 1–8 of 8 results for author: Raveendran, K

  1. arXiv:2309.05782

    cs.CV

    Blendshapes GHUM: Real-time Monocular Facial Blendshape Prediction

    Authors: Ivan Grishchenko, Geng Yan, Eduard Gabriel Bazavan, Andrei Zanfir, Nikolai Chinaev, Karthik Raveendran, Matthias Grundmann, Cristian Sminchisescu

    Abstract: We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones, from a single monocular RGB image and enables facial motion capture applications like virtual avatars. Our main contributions are: i) an annotation-free offline method for obtaining blendshape coefficients from real-world human scans, ii) a lightweight real-time…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages, 3 figures

  2. arXiv:2208.11666

    cs.CV cs.LG

    Efficient Heterogeneous Video Segmentation at the Edge

    Authors: Jamie Menjay Lin, Siargey Pisarchyk, Juhyun Lee, David Tian, Tingbo Hou, Karthik Raveendran, Raman Sarokin, George Sung, Trent Tolley, Matthias Grundmann

    Abstract: We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute. Specifically, we design network models by searching across multiple dimensions of specifications for the neural architectures and operations on top of already light-weight backbones, targeting commercially available edge inference engines. We further analyze and optimize the hete…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: Published as a workshop paper at CVPRW CV4ARVR 2022

  3. arXiv:2206.11678

    cs.CV

    BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation

    Authors: Ivan Grishchenko, Valentin Bazarevsky, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, Richard Yee, Karthik Raveendran, Matsvei Zhdanovich, Matthias Grundmann, Cristian Sminchisescu

    Abstract: We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference. BlazePose GHUM Holistic enables motion capture from a single RGB image including avatar control, fitness tracking and AR/VR effects. Our main contributions include i) a novel method for 3D ground truth data acquisition, i…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 4 pages, 4 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, New Orleans, LA, 2022

  4. arXiv:2006.11341

    cs.CV

    Real-time Pupil Tracking from Monocular Video for Digital Puppetry

    Authors: Artsiom Ablavatski, Andrey Vakunov, Ivan Grishchenko, Karthik Raveendran, Matsvei Zhdanovich

    Abstract: We present a simple, real-time approach for pupil tracking from live video on mobile devices. Our method extends a state-of-the-art face mesh detector with two new components: a tiny neural network that predicts positions of the pupils in 2D, and a displacement-based estimation of the pupil blend shape coefficients. Our technique can be used to accurately control the pupil movements of a virtual p…

    Submitted 19 June, 2020; originally announced June 2020.

  5. arXiv:2006.10962

    cs.CV

    Attention Mesh: High-fidelity Face Mesh Prediction in Real-time

    Authors: Ivan Grishchenko, Artsiom Ablavatski, Yury Kartynnik, Karthik Raveendran, Matthias Grundmann

    Abstract: We present Attention Mesh, a lightweight architecture for 3D face mesh prediction that uses attention to semantically meaningful regions. Our neural network is designed for real-time on-device inference and runs at over 50 FPS on a Pixel 2 phone. Our solution enables applications like AR makeup, eye tracking and AR puppeteering that rely on highly accurate landmarks for eye and lips regions. Our m…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 4 pages, 5 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020

  6. arXiv:2006.10204

    cs.CV

    BlazePose: On-device Real-time Body Pose tracking

    Authors: Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, Matthias Grundmann

    Abstract: We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language reco…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: 4 pages, 6 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020

  7. arXiv:1907.05047

    cs.CV

    BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

    Authors: Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, Matthias Grundmann

    Abstract: We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, fac…

    Submitted 14 July, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

    Comments: 4 pages, 3 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, USA, 2019

  8. arXiv:1906.06792

    cs.CV cs.LG

    Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

    Authors: Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

    Abstract: We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image. These insights are: (1) denoise the "ground truth" surface normals in the training set to ensure consistency with the semantic labels; (2) concurrently train on a mix of real and synthetic data, instead of pretraining on syntheti…

    Submitted 16 June, 2019; originally announced June 2019.