Search | arXiv e-print repository

Geometric Generative Models based on Morphological Equivariant PDEs and GANs

Authors: El Hadji S. Diop, Thierno Fall, Alioune Mbengue, Mohamed Daoudi

Abstract: Content and image generation consist in creating or generating data from noisy information by extracting specific features such as texture, edges, and other thin image structures. We are interested here in generative models, and two main problems are addressed. Firstly, the improvements of specific feature extraction while accounting at multiscale levels intrinsic geometric features; and secondly,… ▽ More Content and image generation consist in creating or generating data from noisy information by extracting specific features such as texture, edges, and other thin image structures. We are interested here in generative models, and two main problems are addressed. Firstly, the improvements of specific feature extraction while accounting at multiscale levels intrinsic geometric features; and secondly, the equivariance of the network to reduce its complexity and provide a geometric interpretability. To proceed, we propose a geometric generative model based on an equivariant partial differential equation (PDE) for group convolution neural networks (G-CNNs), so called PDE-G-CNNs, built on morphology operators and generative adversarial networks (GANs). Equivariant morphological PDE layers are composed of multiscale dilations and erosions formulated in Riemannian manifolds, while group symmetries are defined on a Lie group. We take advantage of the Lie group structure to properly integrate the equivariance in layers, and are able to use the Riemannian metric to solve the multiscale morphological operations. Each point of the Lie group is associated with a unique point in the manifold, which helps us derive a metric on the Riemannian manifold from a tensor field invariant under the Lie group so that the induced metric has the same symmetries. The proposed geometric morphological GAN (GM-GAN) is obtained by using the proposed morphological equivariant convolutions in PDE-G-CNNs to bring nonlinearity in classical CNNs. GM-GAN is evaluated on MNIST data and compared with GANs. Preliminary results show that GM-GAN model outperforms classical GAN. △ Less

Submitted 25 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.10942 [pdf, other]

ScanTalk: 3D Talking Heads from Unregistered Scans

Authors: Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Stefano Berretti, Mohamed Daoudi

Abstract: Speech-driven 3D talking heads generation has emerged as a significant area of interest among researchers, presenting numerous challenges. Existing methods are constrained by animating faces with fixed topologies, wherein point-wise correspondence is established, and the number and order of points remains consistent across all identities the model can animate. In this work, we present ScanTalk, a… ▽ More Speech-driven 3D talking heads generation has emerged as a significant area of interest among researchers, presenting numerous challenges. Existing methods are constrained by animating faces with fixed topologies, wherein point-wise correspondence is established, and the number and order of points remains consistent across all identities the model can animate. In this work, we present ScanTalk, a novel framework capable of animating 3D faces in arbitrary topologies including scanned data. Our approach relies on the DiffusionNet architecture to overcome the fixed topology constraint, offering promising avenues for more flexible and realistic 3D animations. By leveraging the power of DiffusionNet, ScanTalk not only adapts to diverse facial structures but also maintains fidelity when dealing with scanned data, thereby enhancing the authenticity and versatility of generated 3D talking heads. Through comprehensive comparisons with state-of-the-art methods, we validate the efficacy of our approach, demonstrating its capacity to generate realistic talking heads comparable to existing techniques. While our primary objective is to develop a generic method free from topological constraints, all state-of-the-art methodologies are bound by such limitations. Code for reproducing our results, and the pre-trained model will be made available. △ Less

Submitted 19 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2311.04382 [pdf, other]

Basis restricted elastic shape analysis on the space of unregistered surfaces

Authors: Emmanuel Hartman, Emery Pierson, Martin Bauer, Mohamed Daoudi, Nicolas Charon

Abstract: This paper introduces a new mathematical and numerical framework for surface analysis derived from the general setting of elastic Riemannian metrics on shape spaces. Traditionally, those metrics are defined over the infinite dimensional manifold of immersed surfaces and satisfy specific invariance properties enabling the comparison of surfaces modulo shape preserving transformations such as repara… ▽ More This paper introduces a new mathematical and numerical framework for surface analysis derived from the general setting of elastic Riemannian metrics on shape spaces. Traditionally, those metrics are defined over the infinite dimensional manifold of immersed surfaces and satisfy specific invariance properties enabling the comparison of surfaces modulo shape preserving transformations such as reparametrizations. The specificity of the approach we develop is to restrict the space of allowable transformations to predefined finite dimensional bases of deformation fields. These are estimated in a data-driven way so as to emulate specific types of surface transformations observed in a training set. The use of such bases allows to simplify the representation of the corresponding shape space to a finite dimensional latent space. However, in sharp contrast with methods involving e.g. mesh autoencoders, the latent space is here equipped with a non-Euclidean Riemannian metric precisely inherited from the family of aforementioned elastic metrics. We demonstrate how this basis restricted model can be then effectively implemented to perform a variety of tasks on surface meshes which, importantly, does not assume these to be pre-registered (i.e. with given point correspondences) or to even have a consistent mesh structure. We specifically validate our approach on human body shape and pose data as well as human face scans, and show how it generally outperforms state-of-the-art methods on problems such as shape registration, interpolation, motion transfer or random pose generation. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: 18 pages, 10 figures, 8 tables

MSC Class: I.4.0; I.5.1; I.4.9

arXiv:2306.15762 [pdf, other]

Toward Mesh-Invariant 3D Generative Deep Learning with Geometric Measures

Authors: Thomas Besnier, Sylvain Arguillère, Emery Pierson, Mohamed Daoudi

Abstract: 3D generative modeling is accelerating as the technology allowing the capture of geometric data is develo**. However, the acquired data is often inconsistent, resulting in unregistered meshes or point clouds. Many generative learning algorithms require correspondence between each point when comparing the predicted shape and the target shape. We propose an architecture able to cope with different… ▽ More 3D generative modeling is accelerating as the technology allowing the capture of geometric data is develo**. However, the acquired data is often inconsistent, resulting in unregistered meshes or point clouds. Many generative learning algorithms require correspondence between each point when comparing the predicted shape and the target shape. We propose an architecture able to cope with different parameterizations, even during the training phase. In particular, our loss function is built upon a kernel-based metric over a representation of meshes using geometric measures such as currents and varifolds. The latter allows to implement an efficient dissimilarity measure with many desirable properties such as robustness to resampling of the mesh or point cloud. We demonstrate the efficiency and resilience of our model with a generative learning task of human faces. △ Less

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2303.17611 [pdf, other]

Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition

Authors: Yu** Wu, Mohamed Daoudi, Ali Amad

Abstract: Recently, wearable emotion recognition based on peripheral physiological signals has drawn massive attention due to its less invasive nature and its applicability in real-life scenarios. However, how to effectively fuse multimodal data remains a challenging problem. Moreover, traditional fully-supervised based approaches suffer from overfitting given limited labeled data. To address the above issu… ▽ More Recently, wearable emotion recognition based on peripheral physiological signals has drawn massive attention due to its less invasive nature and its applicability in real-life scenarios. However, how to effectively fuse multimodal data remains a challenging problem. Moreover, traditional fully-supervised based approaches suffer from overfitting given limited labeled data. To address the above issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, where efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. Extensive unlabeled data is automatically assigned labels by five signal transforms, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Ultimately, our SSL-based method achieved state-of-the-art results in various emotion classification tasks. Meanwhile, the proposed model proved to be more accurate and robust compared to fully-supervised methods on low data regimes. △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: Accepted IEEE Transactions On Affective Computing

arXiv:2301.10134 [pdf, other]

Bipartite Graph Diffusion Model for Human Interaction Generation

Authors: Baptiste Chopin, Hao Tang, Mohamed Daoudi

Abstract: The generation of natural human motion interactions is a hot topic in computer vision and computer animation. It is a challenging task due to the diversity of possible human motion interactions. Diffusion models, which have already shown remarkable generative capabilities in other domains, are a good candidate for this task. In this paper, we introduce a novel bipartite graph diffusion method (BiG… ▽ More The generation of natural human motion interactions is a hot topic in computer vision and computer animation. It is a challenging task due to the diversity of possible human motion interactions. Diffusion models, which have already shown remarkable generative capabilities in other domains, are a good candidate for this task. In this paper, we introduce a novel bipartite graph diffusion method (BiGraphDiff) to generate human motion interactions between two persons. Specifically, bipartite node sets are constructed to model the inherent geometric constraints between skeleton nodes during interactions. The interaction graph diffusion model is transformer-based, combining some state-of-the-art motion methods. We show that the proposed achieves new state-of-the-art results on leading benchmarks for the human interaction generation task. △ Less

Submitted 3 November, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: accepted in WACV 2024

arXiv:2211.13185 [pdf, other]

BaRe-ESA: A Riemannian Framework for Unregistered Human Body Shapes

Authors: Emmanuel Hartman, Emery Pierson, Martin Bauer, Nicolas Charon, Mohamed Daoudi

Abstract: We present Basis Restricted Elastic Shape Analysis (BaRe-ESA), a novel Riemannian framework for human body scan representation, interpolation and extrapolation. BaRe-ESA operates directly on unregistered meshes, i.e., without the need to establish prior point to point correspondences or to assume a consistent mesh structure. Our method relies on a latent space representation, which is equipped wit… ▽ More We present Basis Restricted Elastic Shape Analysis (BaRe-ESA), a novel Riemannian framework for human body scan representation, interpolation and extrapolation. BaRe-ESA operates directly on unregistered meshes, i.e., without the need to establish prior point to point correspondences or to assume a consistent mesh structure. Our method relies on a latent space representation, which is equipped with a Riemannian (non-Euclidean) metric associated to an invariant higher-order metric on the space of surfaces. Experimental results on the FAUST and DFAUST datasets show that BaRe-ESA brings significant improvements with respect to previous solutions in terms of shape registration, interpolation and extrapolation. The efficiency and strength of our model is further demonstrated in applications such as motion transfer and random generation of body shape and pose. △ Less

Submitted 21 August, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: 13 pages, 7 figures, 3 tables

MSC Class: I.4.0; I.5.1; I.4.9

arXiv:2210.16807 [pdf, other]

The Florence 4D Facial Expression Dataset

Authors: F. Principi, S. Berretti, C. Ferrari, N. Otberdout, M. Daoudi, A. Del Bimbo

Abstract: Human facial expressions change dynamically, so their recognition / analysis should be conducted by accounting for the temporal evolution of face deformations either in 2D or 3D. While abundant 2D video data do exist, this is not the case in 3D, where few 3D dynamic (4D) datasets were released for public use. The negative consequence of this scarcity of data is amplified by current deep learning b… ▽ More Human facial expressions change dynamically, so their recognition / analysis should be conducted by accounting for the temporal evolution of face deformations either in 2D or 3D. While abundant 2D video data do exist, this is not the case in 3D, where few 3D dynamic (4D) datasets were released for public use. The negative consequence of this scarcity of data is amplified by current deep learning based-methods for facial expression analysis that require large quantities of variegate samples to be effectively trained. With the aim of smoothing such limitations, in this paper we propose a large dataset, named Florence 4D, composed of dynamic sequences of 3D face models, where a combination of synthetic and real identities exhibit an unprecedented variety of 4D facial expressions, with variations that include the classical neutral-apex transition, but generalize to expression-to-expression. All these characteristics are not exposed by any of the existing 4D datasets and they cannot even be obtained by combining more than one dataset. We strongly believe that making such a data corpora publicly available to the community will allow designing and experimenting new applications that were not possible to investigate till now. To show at some extent the difficulty of our data in terms of different identities and varying expressions, we also report a baseline experimentation on the proposed dataset that can be used as baseline. △ Less

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2209.01813 [pdf, other]

Automatic Estimation of Self-Reported Pain by Trajectory Analysis in the Manifold of Fixed Rank Positive Semi-Definite Matrices

Authors: Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Pietro Pala, Alberto Del Bimbo, Zakia Hammal

Abstract: We propose an automatic method to estimate self-reported pain based on facial landmarks extracted from videos. For each video sequence, we decompose the face into four different regions and the pain intensity is measured by modeling the dynamics of facial movement using the landmarks of these regions. A formulation based on Gram matrices is used for representing the trajectory of landmarks on the… ▽ More We propose an automatic method to estimate self-reported pain based on facial landmarks extracted from videos. For each video sequence, we decompose the face into four different regions and the pain intensity is measured by modeling the dynamics of facial movement using the landmarks of these regions. A formulation based on Gram matrices is used for representing the trajectory of landmarks on the Riemannian manifold of symmetric positive semi-definite matrices of fixed rank. A curve fitting algorithm is used to smooth the trajectories and temporal alignment is performed to compute the similarity between the trajectories on the manifold. A Support Vector Regression classifier is then trained to encode extracted trajectories into pain intensity levels consistent with self-reported pain intensity measurement. Finally, a late fusion of the estimation for each region is performed to obtain the final predicted pain level. The proposed approach is evaluated on two publicly available datasets, the UNBCMcMaster Shoulder Pain Archive and the Biovid Heat Pain dataset. We compared our method to the state-of-the-art on both datasets using different testing protocols, showing the competitiveness of the proposed approach. △ Less

Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: To appear in IEEE Transactions On Affective Computing, it is an extension of our paper arXiv:2006.13882

arXiv:2208.00050 [pdf, other]

Generating Multiple 4D Expression Transitions by Learning Face Landmark Trajectories

Authors: Naima Otberdout, Claudio Ferrari, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo

Abstract: In this work, we address the problem of 4D facial expressions generation. This is usually addressed by animating a neutral 3D face to reach an expression peak, and then get back to the neutral state. In the real world though, people show more complex expressions, and switch from one expression to another. We thus propose a new model that generates transitions between different expressions, and syn… ▽ More In this work, we address the problem of 4D facial expressions generation. This is usually addressed by animating a neutral 3D face to reach an expression peak, and then get back to the neutral state. In the real world though, people show more complex expressions, and switch from one expression to another. We thus propose a new model that generates transitions between different expressions, and synthesizes long and composed 4D expressions. This involves three sub-problems: (i) modeling the temporal dynamics of expressions, (ii) learning transitions between them, and (iii) deforming a generic mesh. We propose to encode the temporal evolution of expressions using the motion of a set of 3D landmarks, that we learn to generate by training a manifold-valued GAN (Motion3DGAN). To allow the generation of composed expressions, this model accepts two labels encoding the starting and the ending expressions. The final sequence of meshes is generated by a Sparse2Dense mesh Decoder (S2D-Dec) that maps the landmark displacements to a dense, per-vertex displacement of a known mesh topology. By explicitly working with motion trajectories, the model is totally independent from the identity. Extensive experiments on five public datasets show that our proposed approach brings significant improvements with respect to previous solutions, while retaining good generalization to unseen data. △ Less

Submitted 18 May, 2023; v1 submitted 29 July, 2022; originally announced August 2022.

Comments: This preprint is an extension of CVPR 2022 paper arXiv:2105.07463

arXiv:2207.12485 [pdf, other]

3D Shape Sequence of Human Comparison and Classification using Current and Varifolds

Authors: Emery Pierson, Mohamed Daoudi, Sylvain Arguillere

Abstract: In this paper we address the task of the comparison and the classification of 3D shape sequences of human. The non-linear dynamics of the human motion and the changing of the surface parametrization over the time make this task very challenging. To tackle this issue, we propose to embed the 3D shape sequences in an infinite dimensional space, the space of varifolds, endowed with an inner product t… ▽ More In this paper we address the task of the comparison and the classification of 3D shape sequences of human. The non-linear dynamics of the human motion and the changing of the surface parametrization over the time make this task very challenging. To tackle this issue, we propose to embed the 3D shape sequences in an infinite dimensional space, the space of varifolds, endowed with an inner product that comes from a given positive definite kernel. More specifically, our approach involves two steps: 1) the surfaces are represented as varifolds, this representation induces metrics equivariant to rigid motions and invariant to parametrization; 2) the sequences of 3D shapes are represented by Gram matrices derived from their infinite dimensional Hankel matrices. The problem of comparison of two 3D sequences of human is formulated as a comparison of two Gram-Hankel matrices. Extensive experiments on CVSSP3D and Dyna datasets show that our method is competitive with state-of-the-art in 3D human sequence motion retrieval. Code for the experiments is available at https://github.com/CRISTAL-3DSAM/HumanComparisonVarifolds. △ Less

Submitted 25 July, 2022; originally announced July 2022.

Comments: European Conference on Computer Vision (ECCV), Tel Aviv October 23-27 2022

arXiv:2207.08811 [pdf, other]

Fusion of Physiological and Behavioural Signals on SPD Manifolds with Application to Stress and Pain Detection

Authors: Yu** WU, Mohamed Daoudi, Ali Amad, Laurent Sparrow, Fabien D'Hondt

Abstract: Existing multimodal stress/pain recognition approaches generally extract features from different modalities independently and thus ignore cross-modality correlations. This paper proposes a novel geometric framework for multimodal stress/pain detection utilizing Symmetric Positive Definite (SPD) matrices as a representation that incorporates the correlation relationship of physiological and behavio… ▽ More Existing multimodal stress/pain recognition approaches generally extract features from different modalities independently and thus ignore cross-modality correlations. This paper proposes a novel geometric framework for multimodal stress/pain detection utilizing Symmetric Positive Definite (SPD) matrices as a representation that incorporates the correlation relationship of physiological and behavioural signals from covariance and cross-covariance. Considering the non-linearity of the Riemannian manifold of SPD matrices, well-known machine learning techniques are not suited to classify these matrices. Therefore, a tangent space map** method is adopted to map the derived SPD matrix sequences to the vector sequences in the tangent space where the LSTM-based network can be applied for classification. The proposed framework has been evaluated on two public multimodal datasets, achieving both the state-of-the-art results for stress and pain detection tasks. △ Less

Submitted 17 July, 2022; originally announced July 2022.

Comments: International Conference on Systems, Man, and Cybernetics, IEEE SMC 2022, October 9-12, 2022

arXiv:2207.01685 [pdf, other]

Interaction Transformer for Human Reaction Generation

Authors: Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, Nicu Sebe

Abstract: We address the challenging task of human reaction generation, which aims to generate a corresponding reaction based on an input action. Most of the existing works do not focus on generating and predicting the reaction and cannot generate the motion when only the action is given as input. To address this limitation, we propose a novel interaction Transformer (InterFormer) consisting of a Transforme… ▽ More We address the challenging task of human reaction generation, which aims to generate a corresponding reaction based on an input action. Most of the existing works do not focus on generating and predicting the reaction and cannot generate the motion when only the action is given as input. To address this limitation, we propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attention. Specifically, temporal attention captures the temporal dependencies of the motion of both characters and of their interaction, while spatial attention learns the dependencies between the different body parts of each character and those which are part of the interaction. Moreover, we propose using graphs to increase the performance of spatial attention via an interaction distance module that helps focus on nearby joints from both characters. Extensive experiments on the SBU interaction, K3HI, and DuetDance datasets demonstrate the effectiveness of InterFormer. Our method is general and can be used to generate more complex and long-term interactions. We also provide videos of generated reactions and the code with pre-trained models at https://github.com/CRISTAL-3DSAM/InterFormer △ Less

Submitted 1 February, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Journal ref: IEEE Transactions On Multimedia 2023

arXiv:2203.00736 [pdf, other]

3D Skeleton-based Human Motion Prediction with Manifold-Aware GAN

Authors: Baptiste Chopin, Naima Otberdout, Mohamed Daoudi, Angela Bartolo

Abstract: In this work we propose a novel solution for 3D skeleton-based human motion prediction. The objective of this task consists in forecasting future human poses based on a prior skeleton pose sequence. This involves solving two main challenges still present in recent literature; (1) discontinuity of the predicted motion which results in unrealistic motions and (2) performance deterioration in long-te… ▽ More In this work we propose a novel solution for 3D skeleton-based human motion prediction. The objective of this task consists in forecasting future human poses based on a prior skeleton pose sequence. This involves solving two main challenges still present in recent literature; (1) discontinuity of the predicted motion which results in unrealistic motions and (2) performance deterioration in long-term horizons resulting from error accumulation across time. We tackle these issues by using a compact manifold-valued representation of 3D human skeleton motion. Specifically, we model the temporal evolution of the 3D poses as trajectory, what allows us to map human motions to single points on a sphere manifold. Using such a compact representation avoids error accumulation and provides robust representation for long-term prediction while ensuring the smoothness and the coherence of the whole motion. To learn these non-Euclidean representations, we build a manifold-aware Wasserstein generative adversarial model that captures the temporal and spatial dependencies of human motion through different losses. Experiments have been conducted on CMU MoCap and Human 3.6M datasets and demonstrate the superiority of our approach over the state-of-the-art both in short and long term horizons. The smoothness of the generated motion is highlighted in the qualitative results. △ Less

Submitted 1 March, 2022; originally announced March 2022.

Comments: paper submitted to IEEE transactions on biometrics, behavior, and identity science, an extension of IEEE FG paper arXiv:2105.08715

arXiv:2111.13985 [pdf, other]

doi 10.1016/j.cag.2021.10.012

Projection-based Classification of Surfaces for 3D Human Mesh Sequence Retrieval

Authors: Emery Pierson, Juan-Carlos Alvarez Paiva, Mohamed Daoudi

Abstract: We analyze human poses and motion by introducing three sequences of easily calculated surface descriptors that are invariant under reparametrizations and Euclidean transformations. These descriptors are obtained by associating to each finitely-triangulated surface two functions on the unit sphere: for each unit vector u we compute the weighted area of the projection of the surface onto the plane o… ▽ More We analyze human poses and motion by introducing three sequences of easily calculated surface descriptors that are invariant under reparametrizations and Euclidean transformations. These descriptors are obtained by associating to each finitely-triangulated surface two functions on the unit sphere: for each unit vector u we compute the weighted area of the projection of the surface onto the plane orthogonal to u and the length of its projection onto the line spanned by u. The L2 norms and inner products of the projections of these functions onto the space of spherical harmonics of order k provide us with three sequences of Euclidean and reparametrization invariants of the surface. The use of these invariants reduces the comparison of 3D+time surface representations to the comparison of polygonal curves in R^n. The experimental results on the FAUST and CVSSP3D artificial datasets are promising. Moreover, a slight modification of our method yields good results on the noisy CVSSP3D real dataset. △ Less

Submitted 27 November, 2021; originally announced November 2021.

Journal ref: Computers & Graphics, 2 November 2021

arXiv:2108.11449 [pdf, other]

A Riemannian Framework for Analysis of Human Body Surface

Authors: Emery Pierson, Mohamed Daoudi, Alice-Barbara Tumpach

Abstract: We propose a novel framework for comparing 3D human shapes under the change of shape and pose. This problem is challenging since 3D human shapes vary significantly across subjects and body postures. We solve this problem by using a Riemannian approach. Our core contribution is the map** of the human body surface to the space of metrics and normals. We equip this space with a family of Riemannian… ▽ More We propose a novel framework for comparing 3D human shapes under the change of shape and pose. This problem is challenging since 3D human shapes vary significantly across subjects and body postures. We solve this problem by using a Riemannian approach. Our core contribution is the map** of the human body surface to the space of metrics and normals. We equip this space with a family of Riemannian metrics, called Ebin (or DeWitt) metrics. We treat a human body surface as a point in a "shape space" equipped with a family of Riemannian metrics. The family of metrics is invariant under rigid motions and reparametrizations; hence it induces a metric on the "shape space" of surfaces. Using the alignment of human bodies with a given template, we show that this family of metrics allows us to distinguish the changes in shape and pose. The proposed framework has several advantages. First, we define a family of metrics with desired invariance properties for the comparison of human shape. Second, we present an efficient framework to compute geodesic paths between human shape given the chosen metric. Third, this framework provides some basic tools for statistical shape analysis of human body surfaces. Finally, we demonstrate the utility of the proposed framework in pose and shape retrieval of human body. △ Less

Submitted 23 October, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: IEEE Workshop on Applications of Computer Vision (WACV) 2022, accepted

arXiv:2105.08715 [pdf, other]

Human Motion Prediction Using Manifold-Aware Wasserstein GAN

Authors: Baptiste Chopin, Naima Otberdout, Mohamed Daoudi, Angela Bartolo

Abstract: Human motion prediction aims to forecast future human poses given a prior pose sequence. The discontinuity of the predicted motion and the performance deterioration in long-term horizons are still the main challenges encountered in current literature. In this work, we tackle these issues by using a compact manifold-valued representation of human motion. Specifically, we model the temporal evolutio… ▽ More Human motion prediction aims to forecast future human poses given a prior pose sequence. The discontinuity of the predicted motion and the performance deterioration in long-term horizons are still the main challenges encountered in current literature. In this work, we tackle these issues by using a compact manifold-valued representation of human motion. Specifically, we model the temporal evolution of the 3D human poses as trajectory, what allows us to map human motions to single points on a sphere manifold. To learn these non-Euclidean representations, we build a manifold-aware Wasserstein generative adversarial model that captures the temporal and spatial dependencies of human motion through different losses. Extensive experiments show that our approach outperforms the state-of-the-art on CMU MoCap and Human 3.6M datasets. Our qualitative results show the smoothness of the predicted motions. △ Less

Submitted 16 July, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: IEEE International Conference on Automatic Face and Gesture Recognition 2021 Jodhpur, India December 15 - 18, 2021

arXiv:2105.07463 [pdf, other]

Sparse to Dense Dynamic 3D Facial Expression Generation

Authors: Naima Otberdout, Claudio Ferrari, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo

Abstract: In this paper, we propose a solution to the task of generating dynamic 3D facial expressions from a neutral 3D face and an expression label. This involves solving two sub-problems: (i)modeling the temporal dynamics of expressions, and (ii) deforming the neutral mesh to obtain the expressive counterpart. We represent the temporal evolution of expressions using the motion of a sparse set of 3D landm… ▽ More In this paper, we propose a solution to the task of generating dynamic 3D facial expressions from a neutral 3D face and an expression label. This involves solving two sub-problems: (i)modeling the temporal dynamics of expressions, and (ii) deforming the neutral mesh to obtain the expressive counterpart. We represent the temporal evolution of expressions using the motion of a sparse set of 3D landmarks that we learn to generate by training a manifold-valued GAN (Motion3DGAN). To better encode the expression-induced deformation and disentangle it from the identity information, the generated motion is represented as per-frame displacement from a neutral configuration. To generate the expressive meshes, we train a Sparse2Dense mesh Decoder (S2D-Dec) that maps the landmark displacements to a dense, per-vertex displacement. This allows us to learn how the motion of a sparse set of landmarks influences the deformation of the overall face surface, independently from the identity. Experimental results on the CoMA and D3DFACS datasets show that our solution brings significant improvements with respect to previous solutions in terms of both dynamic expression generation and mesh reconstruction, while retaining good generalization to unseen data. The code and the pretrained model will be made publicly available. △ Less

Submitted 3 March, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

Comments: paper accepted at CVPR 2022

arXiv:2105.02319 [pdf, other]

Magnifying Subtle Facial Motions for Effective 4D Expression Recognition

Authors: Qingkai Zhen, Di Huang, Yunhong Wang, Hassen Drira, Boulbaba Ben Amor, Mohamed Daoudi

Abstract: In this paper, an effective pipeline to automatic 4D Facial Expression Recognition (4D FER) is proposed. It combines two growing but disparate ideas in Computer Vision -- computing the spatial facial deformations using tools from Riemannian geometry and magnifying them using temporal filtering. The flow of 3D faces is first analyzed to capture the spatial deformations based on the recently-develop… ▽ More In this paper, an effective pipeline to automatic 4D Facial Expression Recognition (4D FER) is proposed. It combines two growing but disparate ideas in Computer Vision -- computing the spatial facial deformations using tools from Riemannian geometry and magnifying them using temporal filtering. The flow of 3D faces is first analyzed to capture the spatial deformations based on the recently-developed Riemannian approach, where registration and comparison of neighboring 3D faces are led jointly. Then, the obtained temporal evolution of these deformations are fed into a magnification method in order to amplify the facial activities over the time. The latter, main contribution of this paper, allows revealing subtle (hidden) deformations which enhance the emotion classification performance. We evaluated our approach on BU-4DFE dataset, the state-of-art 94.18% average performance and an improvement that exceeds 10% in classification accuracy, after magnifying extracted geometric features (deformations), are achieved. △ Less

Submitted 5 May, 2021; originally announced May 2021.

Comments: International Conference On Pattern Recognition 2016

arXiv:2104.13710 [pdf, other]

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

Authors: Oussema Bouafif, Bogdan Khomutenko, Mohamed Daoudi

Abstract: Recovering the 3D geometric structure of a face from a single input image is a challenging active research area in computer vision. In this paper, we present a novel method for reconstructing 3D heads from a single or multiple image(s) using a hybrid approach based on deep learning and geometric techniques. We propose an encoder-decoder network based on the U-net architecture and trained on synthe… ▽ More Recovering the 3D geometric structure of a face from a single input image is a challenging active research area in computer vision. In this paper, we present a novel method for reconstructing 3D heads from a single or multiple image(s) using a hybrid approach based on deep learning and geometric techniques. We propose an encoder-decoder network based on the U-net architecture and trained on synthetic data only. It predicts both pixel-wise normal vectors and landmarks maps from a single input photo. Landmarks are used for the pose computation and the initialization of the optimization problem, which, in turn, reconstructs the 3D head geometry by using a parametric morphable model and normal vector fields. State-of-the-art results are achieved through qualitative and quantitative evaluation tests on both single and multi-view settings. Despite the fact that the model was trained only on synthetic data, it successfully recovers 3D geometry and precise poses for real-world images. △ Less

Submitted 28 April, 2021; originally announced April 2021.

Comments: 25th International Conference On Pattern Recognition 2020

arXiv:2006.13895 [pdf, other]

Modelling the Statistics of Cyclic Activities by Trajectory Analysis on the Manifold of Positive-Semi-Definite Matrices

Authors: Ettore Maria Celozzi, Luca Ciabini, Luca Cultrera, Pietro Pala, Stefano Berretti, Mohamed Daoudi, Alberto Del Bimbo

Abstract: In this paper, a model is presented to extract statistical summaries to characterize the repetition of a cyclic body action, for instance a gym exercise, for the purpose of checking the compliance of the observed action to a template one and highlighting the parts of the action that are not correctly executed (if any). The proposed system relies on a Riemannian metric to compute the distance betwe… ▽ More In this paper, a model is presented to extract statistical summaries to characterize the repetition of a cyclic body action, for instance a gym exercise, for the purpose of checking the compliance of the observed action to a template one and highlighting the parts of the action that are not correctly executed (if any). The proposed system relies on a Riemannian metric to compute the distance between two poses in such a way that the geometry of the manifold where the pose descriptors lie is preserved; a model to detect the begin and end of each cycle; a model to temporally align the poses of different cycles so as to accurately estimate the \emph{cross-sectional} mean and variance of poses across different cycles. The proposed model is demonstrated using gym videos taken from the Internet. △ Less

Submitted 24 June, 2020; originally announced June 2020.

Comments: accepted at 15th IEEE International Conference on Automatic Face and Gesture Recognition 2020

arXiv:2006.13882 [pdf, other]

Automatic Estimation of Self-Reported Pain by Interpretable Representations of Motion Dynamics

Authors: Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Pietro Pala, Alberto Del Bimbo, Zakia Hammal

Abstract: We propose an automatic method for pain intensity measurement from video. For each video, pain intensity was measured using the dynamics of facial movement using 66 facial points. Gram matrices formulation was used for facial points trajectory representations on the Riemannian manifold of symmetric positive semi-definite matrices of fixed rank. Curve fitting and temporal alignment were then used t… ▽ More We propose an automatic method for pain intensity measurement from video. For each video, pain intensity was measured using the dynamics of facial movement using 66 facial points. Gram matrices formulation was used for facial points trajectory representations on the Riemannian manifold of symmetric positive semi-definite matrices of fixed rank. Curve fitting and temporal alignment were then used to smooth the extracted trajectories. A Support Vector Regression model was then trained to encode the extracted trajectories into ten pain intensity levels consistent with the Visual Analogue Scale for pain intensity measurement. The proposed approach was evaluated using the UNBC McMaster Shoulder Pain Archive and was compared to the state-of-the-art on the same data. Using both 5-fold cross-validation and leave-one-subject-out cross-validation, our results are competitive with respect to state-of-the-art methods. △ Less

Submitted 24 June, 2020; originally announced June 2020.

Comments: accepted at ICPR 2020 Conference

arXiv:1908.00646 [pdf, other]

Fitting, Comparison, and Alignment of Trajectories on Positive Semi-Definite Matrices with Application to Action Recognition

Authors: Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo, Pietro Pala, Estelle Massart

Abstract: In this paper, we tackle the problem of action recognition using body skeletons extracted from video sequences. Our approach lies in the continuity of recent works representing video frames by Gramian matrices that describe a trajectory on the Riemannian manifold of positive-semidefinite matrices of fixed rank. In comparison with previous works, the manifold of fixed-rank positive-semidefinite mat… ▽ More In this paper, we tackle the problem of action recognition using body skeletons extracted from video sequences. Our approach lies in the continuity of recent works representing video frames by Gramian matrices that describe a trajectory on the Riemannian manifold of positive-semidefinite matrices of fixed rank. In comparison with previous works, the manifold of fixed-rank positive-semidefinite matrices is here endowed with a different metric, and we resort to different algorithms for the curve fitting and temporal alignment steps. We evaluated our approach on three publicly available datasets (UTKinect-Action3D, KTH-Action and UAV-Gesture). The results of the proposed approach are competitive with respect to state-of-the-art methods, while only involving body skeletons. △ Less

Submitted 9 September, 2019; v1 submitted 1 August, 2019; originally announced August 2019.

Comments: Updated version of the paper published in the workshop HBU2019. The differences with the published version are a few small corrections, mainly misleading notations for the distance function on p. 4, and missing square root in the expression for "d", in the Thm. on p. 4. Noticeable changes w. r. t. v1 and v2 on arxiv, please use this version instead

arXiv:1907.10087 [pdf, other]

doi 10.1109/TPAMI.2020.3002500

Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets

Authors: Naima Otberdout, Mohamed Daoudi, Anis Kacem, Lahoucine Ballihi, Stefano Berretti

Abstract: In this work, we propose a novel approach for generating videos of the six basic facial expressions given a neutral face image. We propose to exploit the face geometry by modeling the facial landmarks motion as curves encoded as points on a hypersphere. By proposing a conditional version of manifold-valued Wasserstein generative adversarial network (GAN) for motion generation on the hypersphere, w… ▽ More In this work, we propose a novel approach for generating videos of the six basic facial expressions given a neutral face image. We propose to exploit the face geometry by modeling the facial landmarks motion as curves encoded as points on a hypersphere. By proposing a conditional version of manifold-valued Wasserstein generative adversarial network (GAN) for motion generation on the hypersphere, we learn the distribution of facial expression dynamics of different classes, from which we synthesize new facial expression motions. The resulting motions can be transformed to sequences of landmarks and then to images sequences by editing the texture information using another conditional Generative Adversarial Network. To the best of our knowledge, this is the first work that explores manifold-valued representations with GAN to address the problem of dynamic facial expression generation. We evaluate our proposed approach both quantitatively and qualitatively on two public datasets; Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the effectiveness of our approach in generating realistic videos with continuous motion, realistic appearance and identity preservation. We also show the efficiency of our framework for dynamic facial expressions generation, dynamic facial expression transfer and data augmentation for training improved emotion recognition models. △ Less

Submitted 28 May, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

arXiv:1810.11392 [pdf, other]

Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

Authors: Naima Otberdout, Anis Kacem, Mohamed Daoudi, Lahoucine Ballihi, Stefano Berretti

Abstract: In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matri… ▽ More In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By conducting the classification of static facial expressions using Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamic of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition outperforming many recent approaches. △ Less

Submitted 4 December, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: A preliminary version of this work appeared in "Otberdout N, Kacem A, Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018, Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159." arXiv admin note: substantial text overlap with arXiv:1805.03869

arXiv:1810.09008 [pdf]

3D shape retrieval basing on representatives of classes

Authors: M. Benjelloun, E. W. Dadi, E. M. Daoudi

Abstract: In this paper, we present an improvement of our proposed technique for 3D shape retrieval in classified databases [2] which is based on representatives of classes. Instead of systematically matching the object-query with all 3D models of the database, our idea presented in [2] consist, for a classified database, to represent each class by one representative that is used to orient the retrieval pro… ▽ More In this paper, we present an improvement of our proposed technique for 3D shape retrieval in classified databases [2] which is based on representatives of classes. Instead of systematically matching the object-query with all 3D models of the database, our idea presented in [2] consist, for a classified database, to represent each class by one representative that is used to orient the retrieval process to the right class (the class excepted to contain 3D models similar to the query). In order to increase the chance to fall in the right class, our idea in this work is to represent each class by more than one representative. In this case, instead of using only one representative to decide which is the right class we use a set of representatives this will contribute certainly to improving the relevance of retrieval results. The obtained experimental results show that the relevance is significantly improved. △ Less

Submitted 27 December, 2018; v1 submitted 21 October, 2018; originally announced October 2018.

arXiv:1807.00676 [pdf, other]

A Novel Geometric Framework on Gram Matrix Trajectories for Human Behavior Understanding

Authors: Anis Kacem, Mohamed Daoudi, Boulbaba Ben Amor, Stefano Berretti, Juan Carlos Alvarez-Paiva

Abstract: In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the benefit to bring naturally a second desirable quantity when comparin… ▽ More In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the benefit to bring naturally a second desirable quantity when comparing shapes, the spatial covariance, in addition to the conventional affine-shape representation. We derived then geometric and computational tools for rate-invariant analysis and adaptive re-sampling of trajectories, grounding on the Riemannian geometry of the underlying manifold. Specifically, our approach involves three steps: (1) landmarks are first mapped into the Riemannian manifold of positive semidefinite matrices of fixed-rank to build time-parameterized trajectories; (2) a temporal war** is performed on the trajectories, providing a geometry-aware (dis-)similarity measure between them; (3) finally, a pairwise proximity function SVM is used to classify them, incorporating the (dis-)similarity measure into the kernel function. We show that such representation and metric achieve competitive results in applications as action recognition and emotion recognition from 3D skeletal data, and facial expression recognition from videos. Experiments have been conducted on several publicly available up-to-date benchmarks. △ Less

Submitted 29 June, 2018; originally announced July 2018.

Comments: Under minor revisions in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI). A preliminary version of this work appeared in ICCV 17 (A Kacem, M Daoudi, BB Amor, JC Alvarez-Paiva, A Novel Space-Time Representation on the Positive Semidefinite Cone for Facial Expression Recognition, ICCV 17). arXiv admin note: substantial text overlap with arXiv:1707.06440

arXiv:1805.03869 [pdf, other]

Deep Covariance Descriptors for Facial Expression Recognition

Authors: Naima Otberdout, Anis Kacem, Mohamed Daoudi, Lahoucine Ballihi, Stefano Berretti

Abstract: In this paper, covariance matrices are exploited to encode the deep convolutional neural networks (DCNN) features for facial expression recognition. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By performing the classification of the facial expressions using Gaussian kernel on SPD manifold, we show that the covariance descriptors computed on… ▽ More In this paper, covariance matrices are exploited to encode the deep convolutional neural networks (DCNN) features for facial expression recognition. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By performing the classification of the facial expressions using Gaussian kernel on SPD manifold, we show that the covariance descriptors computed on DCNN features are more efficient than the standard classification with fully connected layers and softmax. By implementing our approach using the VGG-face and ExpNet architectures with extensive experiments on the Oulu-CASIA and SFEW datasets, we show that the proposed approach achieves performance at the state of the art for facial expression recognition. △ Less

Submitted 10 May, 2018; originally announced May 2018.

arXiv:1707.07180 [pdf, other]

Emotion Recognition by Body Movement Representation on the Manifold of Symmetric Positive Definite Matrices

Authors: Mohamed Daoudi, Stefano Berretti, Pietro Pala, Yvonne Delevoye, Alberto Del Bimbo

Abstract: Emotion recognition is attracting great interest for its potential application in a multitude of real-life situations. Much of the Computer Vision research in this field has focused on relating emotions to facial expressions, with investigations rarely including more than upper body. In this work, we propose a new scenario, for which emotional states are related to 3D dynamics of the whole body mo… ▽ More Emotion recognition is attracting great interest for its potential application in a multitude of real-life situations. Much of the Computer Vision research in this field has focused on relating emotions to facial expressions, with investigations rarely including more than upper body. In this work, we propose a new scenario, for which emotional states are related to 3D dynamics of the whole body motion. To address the complexity of human body movement, we used covariance descriptors of the sequence of the 3D skeleton joints, and represented them in the non-linear Riemannian manifold of Symmetric Positive Definite matrices. In doing so, we exploited geodesic distances and geometric means on the manifold to perform emotion classification. Using sequences of spontaneous walking under the five primary emotional states, we report a method that succeeded in classifying the different emotions, with comparable performance to those observed in a human-based force-choice classification task. △ Less

Submitted 22 July, 2017; originally announced July 2017.

Comments: accepted in I19th International Conference on Image Analysis and processing (ICIAP), 11-15 september Catania, Italy, 2017

arXiv:1707.06440 [pdf, other]

A Novel Space-Time Representation on the Positive Semidefinite Con for Facial Expression Recognition

Authors: Anis Kacem, Mohamed Daoudi, Boulbaba Ben Amor, Juan Carlos Alvarez-Paiva

Abstract: In this paper, we study the problem of facial expression recognition using a novel space-time geometric representation. We describe the temporal evolution of facial landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the advantage to bring naturally a second desirable quantity when comparing shapes -- the spatial… ▽ More In this paper, we study the problem of facial expression recognition using a novel space-time geometric representation. We describe the temporal evolution of facial landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the advantage to bring naturally a second desirable quantity when comparing shapes -- the spatial covariance -- in addition to the conventional affine-shape representation. We derive then geometric and computational tools for rate-invariant analysis and adaptive re-sampling of trajectories, grounding on the Riemannian geometry of the manifold. Specifically, our approach involves three steps: 1) facial landmarks are first mapped into the Riemannian manifold of positive semidefinite matrices of rank 2, to build time-parameterized trajectories; 2) a temporal alignment is performed on the trajectories, providing a geometry-aware (dis-)similarity measure between them; 3) finally, pairwise proximity function SVM (ppfSVM) is used to classify them, incorporating the latter (dis-)similarity measure into the kernel function. We show the effectiveness of the proposed approach on four publicly available benchmarks (CK+, MMI, Oulu-CASIA, and AFEW). The results of the proposed approach are comparable to or better than the state-of-the-art methods when involving only facial landmarks. △ Less

Submitted 20 July, 2017; originally announced July 2017.

Comments: To be appeared at ICCV 2017

Showing 1–30 of 30 results for author: Daoudi, M