-
DragPoser: Motion Reconstruction from Variable Sparse Tracking Signals via Latent Space Optimization
Authors:
Jose Luis Ponton,
Eduard Pujol,
Andreas Aristidou,
Carlos Andujar,
Nuria Pelechano
Abstract:
High-quality motion reconstruction that follows the user's movements can be achieved by high-end mocap systems with many sensors. However, obtaining such animation quality with fewer input devices is gaining popularity as it brings mocap closer to the general public. The main challenges include the loss of end-effector accuracy in learning-based approaches, or the lack of naturalness and smoothness in IK-based solutions. In addition, such systems are often finely tuned to a specific number of trackers and are highly sensitive to missing data, e.g., in scenarios where a sensor is occluded or malfunctions. In response to these challenges, we introduce DragPoser, a novel deep-learning-based motion reconstruction system that accurately represents hard and dynamic on-the-fly constraints, attaining high end-effector position accuracy in real time. This is achieved through a pose optimization process within a structured latent space. Our system requires only one-time training on a large human motion dataset, and then constraints can be dynamically defined as losses, while the pose is iteratively refined by computing the gradients of these losses within the latent space. To further enhance our approach, we incorporate a Temporal Predictor network, which employs a Transformer architecture to directly encode temporality within the latent space. This network ensures the pose optimization is confined to the manifold of valid poses and also leverages past pose data to predict temporally coherent poses. Results demonstrate that DragPoser surpasses both IK-based and the latest data-driven methods in achieving precise end-effector positioning, while it produces natural poses and temporally coherent motion. In addition, our system showcases robustness against on-the-fly constraint modifications, and exhibits exceptional adaptability to various input configurations and changes.
Submitted 29 April, 2024;
originally announced June 2024.
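The core idea in the DragPoser abstract (constraints defined as losses, pose refined by following loss gradients in a latent space) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the decoder is a fixed linear map rather than a trained network, the names `decode`, `constraint_loss` and `optimize_latent` are hypothetical, and the Temporal Predictor is omitted entirely.

```python
def decode(z, W, b):
    """Toy linear 'decoder' (assumption): maps a latent vector to a flat pose vector."""
    return [sum(W[i][j] * z[j] for j in range(len(z))) + b[i] for i in range(len(W))]


def constraint_loss(pose, targets):
    """Sum of squared errors over the constrained pose entries only."""
    return sum((pose[i] - t) ** 2 for i, t in targets.items())


def optimize_latent(z, W, b, targets, lr=0.05, steps=200):
    """Plain gradient descent on the latent vector; decoder weights stay fixed."""
    z = list(z)
    for _ in range(steps):
        pose = decode(z, W, b)
        # dL/dz_j = sum over constrained i of 2 * (pose_i - target_i) * W[i][j]
        grad = [
            sum(2.0 * (pose[i] - t) * W[i][j] for i, t in targets.items())
            for j in range(len(z))
        ]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


# Hypothetical 2-D latent space decoded into a 3-value pose.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0]
z0 = [0.0, 0.0]

# Constraints can be changed on the fly: here two pose entries are pinned.
targets = {0: 0.5, 1: -0.2}

z_opt = optimize_latent(z0, W, b, targets)
loss_before = constraint_loss(decode(z0, W, b), targets)
loss_after = constraint_loss(decode(z_opt, W, b), targets)
```

Because only the `targets` dictionary defines the loss, adding or dropping a constraint requires no retraining of the decoder in this sketch, which mirrors the adaptability to variable tracker configurations that the abstract describes.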
-
Revisiting Poisson-disk Subsampling for Massive Point Cloud Decimation
Authors:
Marc Comino-Trinidad,
Antonio Chica,
Carlos Andújar
Abstract:
Scanning devices often produce point clouds exhibiting highly uneven distributions of point samples across the surfaces being captured. Different point cloud subsampling techniques have been proposed to generate more evenly distributed samples. Poisson-disk sampling approaches assign each sample a cost value so that subsampling reduces to sorting the samples by cost and then removing the desired ratio of samples with the highest cost. Unfortunately, these approaches compute the sample cost using pairwise distances of the points within a constant search radius, which is very costly for massive point clouds with uneven densities. In this paper, we revisit Poisson-disk sampling for point clouds. Instead of optimizing for equal densities, we propose to maximize the distance to the closest point, which is equivalent to estimating the local point density as a value inversely proportional to this distance. This algorithm can be efficiently implemented using k-nearest-neighbor searches. Besides a kd-tree, our algorithm also uses a voxelization to speed up the searches required to compute per-sample costs. We propose a new strategy to minimize cost updates that is amenable to out-of-core operation. We demonstrate the benefits of our approach in terms of performance, scalability, and output quality. We also discuss extensions based on adding orientation-based and color-based terms to the cost function.
Submitted 29 November, 2023;
originally announced November 2023.
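The closest-point cost described in the abstract above can be sketched as a greedy decimation. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses a brute-force nearest-neighbour search and recomputes every cost after each removal, whereas the abstract mentions a kd-tree, a voxelization and a strategy to minimize cost updates. The function names are hypothetical.

```python
import math


def nearest_distance(i, points, alive):
    """Distance from point i to its closest surviving neighbour (brute force)."""
    return min(math.dist(points[i], points[j]) for j in alive if j != i)


def decimate(points, keep):
    """Greedily drop the sample whose closest neighbour is nearest, until `keep` remain.

    The cost is taken as inversely proportional to the closest-point distance,
    so the highest-cost sample is the one with the smallest such distance.
    """
    alive = set(range(len(points)))
    while len(alive) > max(keep, 1):
        worst = min(sorted(alive), key=lambda i: nearest_distance(i, points, alive))
        alive.remove(worst)
    return [points[i] for i in sorted(alive)]


# Three tightly packed samples followed by two isolated ones.
cloud = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
thinned = decimate(cloud, keep=3)
```

In this example the removals all come from the dense cluster, leaving the isolated samples untouched. The brute-force search makes each removal quadratic in the number of surviving points, which is exactly the kind of cost that spatial acceleration structures are meant to avoid on massive clouds.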
-
SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data
Authors:
Jose Luis Ponton,
Haoran Yun,
Andreas Aristidou,
Carlos Andujar,
Nuria Pelechano
Abstract:
Accurate and reliable human motion reconstruction is crucial for creating natural interactions of full-body avatars in Virtual Reality (VR) and entertainment applications. As the Metaverse and social applications gain popularity, users are seeking cost-effective solutions to create full-body animations that are comparable in quality to those produced by commercial motion capture systems. In order to provide affordable solutions, though, it is important to minimize the number of sensors attached to the subject's body. Unfortunately, reconstructing the full-body pose from sparse data is a heavily under-determined problem. Some studies that use IMU sensors face challenges in reconstructing the pose due to positional drift and ambiguity of the poses. In recent years, some mainstream VR systems have released 6-degree-of-freedom (6-DoF) tracking devices providing positional and rotational information. Nevertheless, most solutions for reconstructing full-body poses rely on traditional inverse kinematics (IK) solutions, which often produce non-continuous and unnatural poses. In this article, we introduce SparsePoser, a novel deep learning-based solution for reconstructing a full-body pose from a reduced set of six tracking devices. Our system incorporates a convolutional-based autoencoder that synthesizes high-quality continuous human poses by learning the human motion manifold from motion capture data. Then, we employ a learned IK component, made of multiple lightweight feed-forward neural networks, to adjust the hands and feet toward the corresponding trackers. We extensively evaluate our method on publicly available motion capture datasets and with real-time live demos. We show that our method outperforms state-of-the-art techniques using IMU sensors or 6-DoF tracking devices, and can be used for users with different body dimensions and proportions.
Submitted 3 November, 2023;
originally announced November 2023.
-
Animation Fidelity in Self-Avatars: Impact on User Performance and Sense of Agency
Authors:
Haoran Yun,
Jose Luis Ponton,
Carlos Andujar,
Nuria Pelechano
Abstract:
The use of self-avatars is gaining popularity thanks to affordable VR headsets. Unfortunately, mainstream VR devices often use a small number of trackers and provide low-accuracy animations. Previous studies have shown that the Sense of Embodiment, and in particular the Sense of Agency, depends on the extent to which the avatar's movements mimic the user's movements. However, few works study this effect for tasks requiring a precise interaction with the environment, i.e., tasks that require accurate manipulation, precise foot stepping, or correct body poses. In these cases, users are likely to notice inconsistencies between their self-avatars and their actual pose. In this paper, we study the impact of the animation fidelity of the user avatar on a variety of tasks that focus on arm movement, leg movement and body posture. We compare three different animation techniques: two of them using Inverse Kinematics to reconstruct the pose from sparse input (6 trackers), and a third one using a professional motion capture system with 17 inertial sensors. We evaluate these animation techniques both quantitatively (completion time, unintentional collisions, pose accuracy) and qualitatively (Sense of Embodiment). Our results show that the animation quality affects the Sense of Embodiment. Inertial-based MoCap performs significantly better in mimicking body poses. Surprisingly, IK-based solutions using fewer sensors outperformed MoCap in tasks requiring accurate positioning, which we attribute to the higher latency and the positional drift that causes errors at the end-effectors, which are more noticeable in contact areas such as the feet.
Submitted 11 April, 2023;
originally announced April 2023.
-
Combining Motion Matching and Orientation Prediction to Animate Avatars for Consumer-Grade VR Devices
Authors:
Jose Luis Ponton,
Haoran Yun,
Carlos Andujar,
Nuria Pelechano
Abstract:
The animation of user avatars plays a crucial role in conveying their pose, gestures, and relative distances to virtual objects or other users. Self-avatar animation in immersive VR helps improve the user experience and provides a Sense of Embodiment. However, consumer-grade VR devices typically include at most three trackers, one at the Head Mounted Display (HMD), and two at the handheld VR controllers. Since the problem of reconstructing the user pose from such sparse data is ill-defined, especially for the lower body, the approach adopted by most VR games consists of assuming the body orientation matches that of the HMD, and applying animation blending and time-warping from a reduced set of animations. Unfortunately, this approach produces noticeable mismatches between user and avatar movements. In this work we present a new approach to animate user avatars that is suitable for current mainstream VR devices. First, we use a neural network to estimate the user's body orientation based on the tracking information from the HMD and the hand controllers. Then we use this orientation together with the velocity and rotation of the HMD to build a feature vector that feeds a Motion Matching algorithm. We built a MoCap database with animations of VR users wearing an HMD and used it to test our approach on both self-avatars and other users' avatars. Our results show that our system can provide a large variety of lower body animations while correctly matching the user orientation, which in turn allows us to represent not only forward movements but also stepping in any direction.
Submitted 23 September, 2022;
originally announced September 2022.
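The Motion Matching step mentioned in the abstract above boils down to a nearest-neighbour lookup of a feature vector in an animation database. The sketch below is only an illustration: the three-component feature layout (body orientation, HMD speed, HMD yaw rate), the clip names and the function `match` are all hypothetical, and the orientation value that the paper predicts with a neural network is simply given as a number here.

```python
def match(query, database):
    """Return the index of the database entry whose feature vector is closest to `query`."""

    def sq_dist(a, b):
        # Squared Euclidean distance between two equally sized feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(database)), key=lambda i: sq_dist(query, database[i]["features"]))


# Hypothetical database: [body orientation (rad), HMD speed (m/s), HMD yaw rate (rad/s)].
database = [
    {"clip": "idle", "features": [0.0, 0.0, 0.0]},
    {"clip": "walk_forward", "features": [0.0, 1.2, 0.0]},
    {"clip": "step_left", "features": [1.5, 0.6, 0.1]},
]

# Feature vector built from the current tracking data (values made up for the example).
query = [0.1, 1.0, 0.0]
best = database[match(query, database)]["clip"]
```

In a real system the features would normally be normalized so that no single component dominates the distance; this sketch skips that step for brevity.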
-
Procedural Generation of STEM Quizzes
Authors:
Carlos Andujar
Abstract:
Electronic quizzes are used extensively for summative and formative assessment. Current Learning Management Systems (LMS) allow instructors to create quizzes through a Graphical User Interface. Despite having a smooth learning curve, the question generation/editing process with such interfaces is often slow and the creation of question variants is mostly limited to random parameters. In this paper we argue that procedural question generation greatly facilitates the task of creating varied, formative, up-to-date, adaptive question banks for STEM quizzes. We present and evaluate a proof-of-concept Python API for script-based question generation, and propose different question design patterns that greatly facilitate question authoring. The API supports questions including mathematical formulas, dynamically generated images and videos, as well as interactive content such as 3D model viewers. Output questions can be imported into major LMSs. For basic usage, the required programming skills are minimal. More advanced uses do require some programming knowledge, but at a level that is common among STEM instructors. A side advantage of our system is that the question bank is actually embedded in Python code, making collaboration, version control, and maintenance tasks very easy. We demonstrate the benefits of script-based generation over traditional GUI-based approaches, in terms of question richness, authoring speed and content re-usability.
Submitted 8 September, 2020;
originally announced September 2020.
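To give a flavour of what script-based question generation might look like, here is a small sketch in plain Python. It does not use or reproduce the API presented in the paper: the function `make_question`, the dictionary layout and the way distractors are derived are all assumptions chosen for illustration.

```python
import random


def make_question(seed):
    """Generate one multiple-choice variant; the same seed always yields the same variant."""
    rng = random.Random(seed)
    a, b = rng.randint(2, 9), rng.randint(2, 9)
    correct = a * b

    # Distractors mimic plausible slips: adding instead of multiplying,
    # or being off by one of the operands.
    distractors = {a + b, correct + a, correct - b}
    distractors.discard(correct)  # a + b equals a * b when a == b == 2

    return {
        "text": f"What is {a} x {b}?",
        "options": sorted(distractors | {correct}),
        "answer": correct,
    }


# A small bank of variants, reproducible because each one is tied to a seed.
bank = [make_question(seed) for seed in range(5)]
```

Since the bank is ordinary code, variants can be regenerated, diffed and kept under version control, which is the maintenance advantage the abstract points out.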
-
First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis
Authors:
Julio C. S. Jacques Junior,
Yağmur Güçlütürk,
Marc Pérez,
Umut Güçlü,
Carlos Andujar,
Xavier Baró,
Hugo Jair Escalante,
Isabelle Guyon,
Marcel A. J. van Gerven,
Rob van Lier,
Sergio Escalera
Abstract:
Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. Over the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of method could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research in the field, are reviewed.
Submitted 17 July, 2019; v1 submitted 21 April, 2018;
originally announced April 2018.