Search | arXiv e-print repository

HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior

Authors: David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue

Abstract: We present HAHA - a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high fidelity rendering. We demonstrate its efficiency to animate and render full-body human avatars controlled via the SMPL-X parametric model. Our model learns to app… ▽ More We present HAHA - a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high fidelity rendering. We demonstrate its efficiency to animate and render full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, like hair and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar, and reduced rendering artifacts. This allows us to handle the animation of small body parts such as fingers that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method demonstrates on par reconstruction quality to the state-of-the-art on SnapshotPeople, while using less than a third of Gaussians. HAHA outperforms previous state-of-the-art on novel poses from X-Humans both quantitatively and qualitatively. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2303.10275 [pdf, other]

MoRF: Mobile Realistic Fullbody Avatars from a Monocular Video

Authors: Renat Bashirov, Alexey Larionov, Evgeniya Ustinova, Mikhail Sidorenko, David Svitov, Ilya Zakharkin, Victor Lempitsky

Abstract: We present a system to create Mobile Realistic Fullbody (MoRF) avatars. MoRF avatars are rendered in real-time on mobile devices, learned from monocular videos, and have high realism. We use SMPL-X as a proxy geometry and render it with DNR (neural texture and image-2-image network). We improve on prior work, by overfitting per-frame war** fields in the neural texture space, allowing to better a… ▽ More We present a system to create Mobile Realistic Fullbody (MoRF) avatars. MoRF avatars are rendered in real-time on mobile devices, learned from monocular videos, and have high realism. We use SMPL-X as a proxy geometry and render it with DNR (neural texture and image-2-image network). We improve on prior work, by overfitting per-frame war** fields in the neural texture space, allowing to better align the training signal between different frames. We also refine SMPL-X mesh fitting procedure to improve the overall avatar quality. In the comparisons to other monocular video-based avatar systems, MoRF avatars achieve higher image sharpness and temporal consistency. Participants of our user study also preferred avatars generated by MoRF. △ Less

Submitted 11 December, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

arXiv:2303.09375 [pdf, other]

DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars

Authors: David Svitov, Dmitrii Gudkov, Renat Bashirov, Victor Lempitsky

Abstract: We present DINAR, an approach for creating realistic rigged fullbody avatars from single RGB images. Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic quality of avatars while kee** them easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such model can be trained in the… ▽ More We present DINAR, an approach for creating realistic rigged fullbody avatars from single RGB images. Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic quality of avatars while kee** them easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such model can be trained in the neural texture space. The use of the diffusion model allows us to realistically reconstruct large unseen regions such as the back of a person given the frontal view. The models in our pipeline are trained using 2D images and videos only. In the experiments, our approach achieves state-of-the-art rendering quality and good generalization to new poses and viewpoints. In particular, the approach improves state-of-the-art on the SnapshotPeople public benchmark. △ Less

Submitted 10 December, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2011.07513 [pdf, other]

AmphibianDetector: adaptive computation for moving objects detection

Authors: David Svitov, Sergey Alyamkin

Abstract: Convolutional neural networks (CNN) allow achieving the highest accuracy for the task of object detection in images. Major challenges in further development of object detectors are false-positive detections and high demand of processing power. In this paper, we propose an approach to object detection which makes it possible to reduce the number of false-positive detections by processing only movin… ▽ More Convolutional neural networks (CNN) allow achieving the highest accuracy for the task of object detection in images. Major challenges in further development of object detectors are false-positive detections and high demand of processing power. In this paper, we propose an approach to object detection which makes it possible to reduce the number of false-positive detections by processing only moving objects and reduce the required processing power for algorithm inference. The proposed approach is a modification of CNN already trained for object detection task. This method can be used to improve the accuracy of an existing system by applying minor changes to the algorithm. The efficiency of the proposed approach was demonstrated on the open dataset "CDNet2014 pedestrian". The implementation of the method proposed in the article is available on the GitHub: https://github.com/david-svitov/AmphibianDetector △ Less

Submitted 25 December, 2020; v1 submitted 15 November, 2020; originally announced November 2020.

Comments: 12 pages, 6 figures, 1 table

arXiv:2003.02586 [pdf, other]

MarginDistillation: distillation for margin-based softmax

Authors: David Svitov, Sergey Alyamkin

Abstract: The usage of convolutional neural networks (CNNs) in conjunction with a margin-based softmax approach demonstrates a state-of-the-art performance for the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper, we propose a novel distillation method for lightwei… ▽ More The usage of convolutional neural networks (CNNs) in conjunction with a margin-based softmax approach demonstrates a state-of-the-art performance for the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper, we propose a novel distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on LFW, AgeDB-30 and Megaface datasets. The idea of the proposed method is to use class centers from the teacher network for the student network. Then the student network is trained to get the same angles between the class centers and the face embeddings, predicted by the teacher network. △ Less

Submitted 5 March, 2020; originally announced March 2020.

arXiv:1904.07714 [pdf, other]

Low-Power Computer Vision: Status, Challenges, Opportunities

Authors: Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee , et al. (19 additional authors not shown)

Abstract: Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots). These systems rely on batte… ▽ More Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots). These systems rely on batteries and energy efficiency is critical. This article serves two main purposes: (1) Examine the state-of-the-art for low-power solutions to detect objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes 2018 winners' solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision. △ Less

Submitted 15 April, 2019; originally announced April 2019.

Comments: Preprint, Accepted by IEEE Journal on Emerging and Selected Topics in Circuits and Systems. arXiv admin note: substantial text overlap with arXiv:1810.01732

arXiv:1810.01732 [pdf]

2018 Low-Power Image Recognition Challenge

Authors: Sergei Alyamkin, Matthew Ardi, Achille Brighton, Alexander C. Berg, Yiran Chen, Hsin-Pai Cheng, Bo Chen, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Jongkook Go, Alexander Goncharenko, Xuyang Guo, Hong Hanh Nguyen, Andrew Howard, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Alexander Kondratyev, Seungjae Lee, Suwoong Lee, Junhyeok Lee, Zhiyu Liang, Xin Liu , et al. (16 additional authors not shown)

Abstract: The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times.… ▽ More The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times. As computer vision is widely used in many battery-powered systems (such as drones and mobile phones), the need for low-power computer vision will become increasingly important. This paper summarizes LPIRC 2018 by describing the three different tracks and the winners' solutions. △ Less

Submitted 3 October, 2018; originally announced October 2018.

Comments: 13 pages, workshop in 2018 CVPR, competition, low-power, image recognition

Showing 1–7 of 7 results for author: Svitov, D