-
Multi-Objective Hardware Aware Neural Architecture Search using Hardware Cost Diversity
Authors:
Nilotpal Sinha,
Peyman Rostami,
Abd El Rahman Shabayek,
Anis Kacem,
Djamila Aouada
Abstract:
Hardware-aware Neural Architecture Search approaches (HW-NAS) automate the design of deep learning architectures, tailored specifically to a given target hardware platform. Yet, these techniques demand substantial computational resources, primarily due to the expensive process of assessing the performance of identified architectures. To alleviate this problem, a recent direction in the literature has employed a representation similarity metric for efficiently evaluating architecture performance. Nonetheless, since it is inherently a single-objective method, it requires multiple runs to identify the optimal architecture set satisfying the diverse hardware cost constraints, thereby increasing the search cost. Furthermore, simply converting the single-objective approach into a multi-objective one results in an under-explored architectural search space. In this study, we propose a Multi-Objective method to address the HW-NAS problem, called MO-HDNAS, to identify the trade-off set of architectures in a single run with low computational cost. This is achieved by optimizing three objectives: maximizing the representation similarity metric, minimizing hardware cost, and maximizing the hardware cost diversity. The third objective, i.e., hardware cost diversity, is used to facilitate a better exploration of the architecture search space. Experimental results demonstrate the effectiveness of our proposed method in efficiently addressing the HW-NAS problem across six edge devices for the image classification task.
Submitted 15 April, 2024;
originally announced April 2024.
-
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Authors:
Mohammad Sadil Khan,
Elona Dupont,
Sk Aziz Ali,
Kseniya Cherenkova,
Anis Kacem,
Djamila Aouada
Abstract:
Reverse engineering in the realm of Computer-Aided Design (CAD) has been a longstanding aspiration, though not yet entirely realized. Its primary aim is to uncover the CAD process behind a physical object given its 3D scan. We propose CAD-SIGNet, an end-to-end trainable and auto-regressive architecture to recover the design history of a CAD model represented as a sequence of sketch-and-extrusion from an input point cloud. Our model learns visual-language representations by layer-wise cross-attention between point cloud and CAD language embedding. In particular, a new Sketch instance Guided Attention (SGA) module is proposed in order to reconstruct the fine-grained details of the sketches. Thanks to its auto-regressive nature, CAD-SIGNet not only reconstructs a unique full design history of the corresponding CAD model given an input point cloud but also provides multiple plausible design choices. This allows for an interactive reverse engineering scenario by providing designers with multiple next-step choices along with the design process. Extensive experiments on publicly available CAD datasets showcase the effectiveness of our approach against existing baseline models in two settings, namely, full design history recovery and conditional auto-completion from point clouds.
Submitted 27 February, 2024;
originally announced February 2024.
-
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Authors:
Dat Nguyen,
Nesryne Mejri,
Inder Pal Singh,
Polina Kuleshova,
Marcella Astrid,
Anis Kacem,
Enjie Ghorbel,
Djamila Aouada
Abstract:
This paper introduces a novel approach for high-quality deepfake detection called Localized Artifact Attention Network (LAA-Net). Existing methods for high-quality deepfake detection are mainly based on a supervised binary classifier coupled with an implicit attention mechanism. As a result, they do not generalize well to unseen manipulations. To handle this issue, two main contributions are made. First, an explicit attention mechanism within a multi-task learning framework is proposed. By combining heatmap-based and self-consistency attention strategies, LAA-Net is forced to focus on a few small artifact-prone vulnerable regions. Second, an Enhanced Feature Pyramid Network (E-FPN) is proposed as a simple and effective mechanism for spreading discriminative low-level features into the final feature output, with the advantage of limiting redundancy. Experiments performed on several benchmarks show the superiority of our approach in terms of Area Under the Curve (AUC) and Average Precision (AP). The code is available at https://github.com/10Ring/LAA-Net.
Submitted 24 May, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Hardware Aware Evolutionary Neural Architecture Search using Representation Similarity Metric
Authors:
Nilotpal Sinha,
Abd El Rahman Shabayek,
Anis Kacem,
Peyman Rostami,
Carl Shneider,
Djamila Aouada
Abstract:
Hardware-aware Neural Architecture Search (HW-NAS) is a technique used to automatically design the architecture of a neural network for a specific task and target hardware. However, evaluating the performance of candidate architectures is a key challenge in HW-NAS, as it requires significant computational resources. To address this challenge, we propose an efficient hardware-aware evolution-based NAS approach called HW-EvRSNAS. Our approach re-frames the neural architecture search problem as finding an architecture with performance similar to that of a reference model for a target hardware, while adhering to a cost constraint for that hardware. This is achieved through a representation similarity metric known as Representation Mutual Information (RMI) employed as a proxy performance evaluator. It measures the mutual information between the hidden layer representations of a reference model and those of sampled architectures using a single training batch. We also use a penalty term that penalizes the search process in proportion to how far an architecture's hardware cost is from the desired hardware cost threshold. This results in a significantly reduced search time compared to the literature, with speedups of up to 8000x and lower CO2 emissions. The proposed approach is evaluated on two different search spaces while using lower computational resources. Furthermore, our approach is thoroughly examined on six different edge devices under various hardware cost constraints.
Submitted 7 November, 2023;
originally announced November 2023.
-
SHARP Challenge 2023: Solving CAD History and pArameters Recovery from Point clouds and 3D scans. Overview, Datasets, Metrics, and Baselines
Authors:
Dimitrios Mallis,
Sk Aziz Ali,
Elona Dupont,
Kseniya Cherenkova,
Ahmet Serdar Karadeniz,
Mohammad Sadil Khan,
Anis Kacem,
Gleb Gusev,
Djamila Aouada
Abstract:
Recent breakthroughs in geometric Deep Learning (DL) and the availability of large Computer-Aided Design (CAD) datasets have advanced the research on learning CAD modeling processes and relating them to real objects. In this context, 3D reverse engineering of CAD models from 3D scans is considered to be one of the most sought-after goals for the CAD industry. However, recent efforts assume multiple simplifications limiting the applications in real-world settings. The SHARP Challenge 2023 aims at pushing the research a step closer to the real-world scenario of CAD reverse engineering through dedicated datasets and tracks. In this paper, we define the proposed SHARP 2023 tracks, describe the provided datasets, and propose a set of baseline methods along with suitable evaluation metrics to assess the performance of the track solutions. All proposed datasets along with useful routines and the evaluation metrics are publicly available.
Submitted 30 August, 2023;
originally announced August 2023.
-
Impact of Disentanglement on Pruning Neural Networks
Authors:
Carl Shneider,
Peyman Rostami,
Anis Kacem,
Nilotpal Sinha,
Abd El Rahman Shabayek,
Djamila Aouada
Abstract:
Deploying deep learning neural networks on edge devices, to accomplish task specific objectives in the real-world, requires a reduction in their memory footprint, power consumption, and latency. This can be realized via efficient model compression. Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression because they mainly retain task-specific information, discarding useless information for the task at hand. We make use of the Beta-VAE framework combined with a standard criterion for pruning to investigate the impact of forcing the network to learn disentangled representations on the pruning process for the task of classification. In particular, we perform experiments on MNIST and CIFAR10 datasets, examine disentanglement challenges, and propose a path forward for future works.
Submitted 19 July, 2023;
originally announced July 2023.
-
DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images
Authors:
Ashish Sinha,
Jeremy Kawahara,
Arezou Pakzad,
Kumar Abhishek,
Matthieu Ruthven,
Enjie Ghorbel,
Anis Kacem,
Djamila Aouada,
Ghassan Hamarneh
Abstract:
In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis. However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions. To address these shortcomings, we propose a novel framework called DermSynth3D. DermSynth3D blends skin disease patterns onto 3D textured meshes of human subjects using a differentiable renderer and generates 2D images from various camera viewpoints under chosen lighting conditions in diverse background scenes. Our method adheres to top-down rules that constrain the blending and rendering process to create 2D images with skin conditions that mimic in-the-wild acquisitions, ensuring more meaningful results. The framework generates photo-realistic 2D dermoscopy images and the corresponding dense annotations for semantic segmentation of the skin, skin conditions, body parts, bounding boxes around lesions, depth maps, and other 3D scene parameters, such as camera position and lighting conditions. DermSynth3D allows for the creation of custom datasets for various dermatology tasks. We demonstrate the effectiveness of data generated using DermSynth3D by training DL models on synthetic data and evaluating them on various dermatology tasks using real 2D dermatological images. We make our code publicly available at https://github.com/sfu-mial/DermSynth3D.
Submitted 21 April, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
SepicNet: Sharp Edges Recovery by Parametric Inference of Curves in 3D Shapes
Authors:
Kseniya Cherenkova,
Elona Dupont,
Anis Kacem,
Ilya Arzhannikov,
Gleb Gusev,
Djamila Aouada
Abstract:
3D scanning, as a technique to digitize real objects and create their 3D models, is used in many fields. Though the quality of 3D scans depends on the technical characteristics of the 3D scanner, the common drawback is the smoothing of fine details, or the edges of an object. We introduce SepicNet, a novel deep network for the detection and parametrization of sharp edges in 3D shapes as primitive curves. To make the network end-to-end trainable, we formulate the curve fitting in a differentiable manner. We develop an adaptive point cloud sampling technique that captures the sharp features better than uniform sampling. The experiments were conducted on a newly introduced large-scale dataset of 50k 3D scans, where the sharp edge annotations were extracted from their parametric CAD models, and demonstrate significant improvement over state-of-the-art methods.
Submitted 13 April, 2023;
originally announced April 2023.
-
Discriminator-free Unsupervised Domain Adaptation for Multi-label Image Classification
Authors:
Indel Pal Singh,
Enjie Ghorbel,
Anis Kacem,
Arunkumar Rathinam,
Djamila Aouada
Abstract:
In this paper, a discriminator-free adversarial-based Unsupervised Domain Adaptation (UDA) for Multi-Label Image Classification (MLIC) referred to as DDA-MLIC is proposed. Recently, some attempts have been made for introducing adversarial-based UDA methods in the context of MLIC. However, these methods which rely on an additional discriminator subnet present one major shortcoming. The learning of domain-invariant features may harm their task-specific discriminative power, since the classification and discrimination tasks are decoupled. Herein, we propose to overcome this issue by introducing a novel adversarial critic that is directly deduced from the task-specific classifier. Specifically, a two-component Gaussian Mixture Model (GMM) is fitted on the source and target predictions in order to distinguish between two clusters. This allows extracting a Gaussian distribution for each component. The resulting Gaussian distributions are then used for formulating an adversarial loss based on a Frechet distance. The proposed method is evaluated on several multi-label image datasets covering three different types of domain shift. The obtained results demonstrate that DDA-MLIC outperforms existing state-of-the-art methods in terms of precision while requiring a lower number of parameters. The code is publicly available at github.com/cvi2snt/DDA-MLIC.
Submitted 8 November, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
-
CADOps-Net: Jointly Learning CAD Operation Types and Steps from Boundary-Representations
Authors:
Elona Dupont,
Kseniya Cherenkova,
Anis Kacem,
Sk Aziz Ali,
Ilya Arzhannikov,
Gleb Gusev,
Djamila Aouada
Abstract:
3D reverse engineering is a long sought-after, yet not completely achieved goal in the Computer-Aided Design (CAD) industry. The objective is to recover the construction history of a CAD model. Starting from a Boundary Representation (B-Rep) of a CAD model, this paper proposes a new deep neural network, CADOps-Net, that jointly learns the CAD operation types and the decomposition into different CAD operation steps. This joint learning makes it possible to divide a B-Rep into parts that were created by various types of CAD operations at the same construction step, thereby providing relevant information for further recovery of the design history. Furthermore, we propose the novel CC3D-Ops dataset that includes over $37k$ CAD models annotated with CAD operation type labels and step labels. Compared to existing datasets, the complexity and variety of CC3D-Ops models are closer to those used for industrial purposes. Our experiments, conducted on the proposed CC3D-Ops and the publicly available Fusion360 datasets, demonstrate the competitive performance of CADOps-Net with respect to the state of the art, and confirm the importance of the joint learning of CAD operation types and steps.
Submitted 22 August, 2022;
originally announced August 2022.
-
TSCom-Net: Coarse-to-Fine 3D Textured Shape Completion Network
Authors:
Ahmet Serdar Karadeniz,
Sk Aziz Ali,
Anis Kacem,
Elona Dupont,
Djamila Aouada
Abstract:
Reconstructing 3D human body shapes from 3D partial textured scans remains a fundamental task for many computer vision and graphics applications -- e.g., body animation, and virtual dressing. We propose a new neural network architecture for 3D body shape and high-resolution texture completion -- BCom-Net -- that can reconstruct the full geometry from mid-level to high-level partial input scans. We decompose the overall reconstruction task into two stages. First, a joint implicit learning network (SCom-Net and TCom-Net) takes a voxelized scan and its occupancy grid as input to reconstruct the full body shape and predict vertex textures. Second, a high-resolution texture completion network utilizes the predicted coarse vertex textures to inpaint the missing parts of the partial 'texture atlas'. A thorough experimental evaluation on the 3DBodyTex.V2 dataset shows that our method achieves competitive results with respect to the state-of-the-art while generalizing to different types and levels of partial shapes. The proposed method also ranked second in Track 1 of the SHApe Recovery from Partial textured 3D scans (SHARP [38,1]) 2022 challenge.
Submitted 22 August, 2022; v1 submitted 18 August, 2022;
originally announced August 2022.
-
Disentangled Face Identity Representations for joint 3D Face Recognition and Expression Neutralisation
Authors:
Anis Kacem,
Kseniya Cherenkova,
Djamila Aouada
Abstract:
In this paper, we propose a new deep learning-based approach for disentangling face identity representations from expressive 3D faces. Given a 3D face, our approach not only extracts a disentangled identity representation but also generates a realistic 3D face with a neutral expression while predicting its identity. The proposed network consists of three components: (1) a Graph Convolutional Autoencoder (GCA) to encode the 3D faces into latent representations, (2) a Generative Adversarial Network (GAN) that translates the latent representations of expressive faces into those of neutral faces, and (3) an identity recognition sub-network taking advantage of the neutralized latent representations for 3D face recognition. The whole network is trained in an end-to-end manner. Experiments are conducted on three publicly available datasets showing the effectiveness of the proposed approach.
Submitted 20 April, 2021;
originally announced April 2021.
-
Face-GCN: A Graph Convolutional Network for 3D Dynamic Face Identification/Recognition
Authors:
Konstantinos Papadopoulos,
Anis Kacem,
Abdelrahman Shabayek,
Djamila Aouada
Abstract:
Face identification/recognition has significantly advanced over the past years. However, most of the proposed approaches rely on static RGB frames and on neutral facial expressions. This has two disadvantages. First, important facial shape cues are ignored. Second, facial deformations due to expressions can have an impact on the performance of such a method. In this paper, we propose a novel framework for dynamic 3D face identification/recognition based on facial keypoints. Each dynamic sequence of facial expressions is represented as a spatio-temporal graph, which is constructed using 3D facial landmarks. Each graph node contains local shape and texture features that are extracted from its neighborhood. For the classification/identification of faces, a Spatio-temporal Graph Convolutional Network (ST-GCN) is used. Finally, we evaluate our approach on a challenging dynamic 3D facial expression dataset.
Submitted 20 April, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
SHARP 2020: The 1st Shape Recovery from Partial Textured 3D Scans Challenge Results
Authors:
Alexandre Saint,
Anis Kacem,
Kseniya Cherenkova,
Konstantinos Papadopoulos,
Julian Chibane,
Gerard Pons-Moll,
Gleb Gusev,
David Fofi,
Djamila Aouada,
Bjorn Ottersten
Abstract:
The SHApe Recovery from Partial textured 3D scans challenge, SHARP 2020, is the first edition of a challenge fostering and benchmarking methods for recovering complete textured 3D scans from raw incomplete data. SHARP 2020 is organised as a workshop in conjunction with ECCV 2020. There are two complementary challenges, the first one on 3D human scans, and the second one on generic objects. Challenge 1 is further split into two tracks, focusing, first, on large body and clothing regions, and, second, on fine body details. A novel evaluation metric is proposed to quantify jointly the shape reconstruction, the texture reconstruction and the amount of completed data. Additionally, two unique datasets of 3D scans are proposed, to provide raw ground-truth data for the benchmarks. The datasets are released to the scientific community. Moreover, an accompanying custom library of software routines is also released to the scientific community. It allows for processing 3D scans, generating partial data and performing the evaluation. Results of the competition, analysed in comparison to baselines, show the validity of the proposed evaluation metrics, and highlight the challenging aspects of the task and of the datasets. Details on the SHARP 2020 challenge can be found at https://cvi2.uni.lu/sharp2020/.
Submitted 26 October, 2020;
originally announced October 2020.
-
3DBooSTeR: 3D Body Shape and Texture Recovery
Authors:
Alexandre Saint,
Anis Kacem,
Kseniya Cherenkova,
Djamila Aouada
Abstract:
We propose 3DBooSTeR, a novel method to recover a textured 3D body mesh from a textured partial 3D scan. With the advent of virtual and augmented reality, there is a demand for creating realistic and high-fidelity digital 3D human representations. However, 3D scanning systems can only capture the 3D human body shape up to some level of defects due to its complexity, including occlusion between body parts, varying levels of details, shape deformations and the articulated skeleton. Textured 3D mesh completion is thus important to enhance 3D acquisitions. The proposed approach decouples the shape and texture completion into two sequential tasks. The shape is recovered by an encoder-decoder network deforming a template body mesh. The texture is subsequently obtained by projecting the partial texture onto the template mesh before inpainting the corresponding texture map with a novel approach. The approach is validated on the 3DBodyTex.v2 dataset.
Submitted 23 October, 2020;
originally announced October 2020.
-
Multi-objective scheduling on two dedicated processors
Authors:
Adel Kacem,
Abdelaziz Dammak
Abstract:
We study a multi-objective scheduling problem on two dedicated processors. The aim is to minimize simultaneously the makespan, the total tardiness and the total completion time. This NP-hard problem requires the use of well-adapted methods. For this, we adapted genetic algorithms to the multi-objective case. Three methods are presented to solve this problem. The first is aggregative, the second is Pareto-based, and the third is the non-dominated sorting genetic algorithm II (NSGA-II). We proposed some adapted lower bounds for each criterion to evaluate the quality of the found results on a large set of instances. Indeed, these bounds also make it possible to determine the dominance of one algorithm over another based on the different results found by each of them. We used two metrics to measure the quality of the Pareto front: the hypervolume indicator (HV) and the number of solutions in the optimal front (ND). The obtained results show the effectiveness of the proposed algorithms.
Submitted 15 April, 2020; v1 submitted 12 August, 2019;
originally announced August 2019.
-
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
Authors:
Naima Otberdout,
Mohamed Daoudi,
Anis Kacem,
Lahoucine Ballihi,
Stefano Berretti
Abstract:
In this work, we propose a novel approach for generating videos of the six basic facial expressions given a neutral face image. We propose to exploit the face geometry by modeling the facial landmarks motion as curves encoded as points on a hypersphere. By proposing a conditional version of manifold-valued Wasserstein generative adversarial network (GAN) for motion generation on the hypersphere, we learn the distribution of facial expression dynamics of different classes, from which we synthesize new facial expression motions. The resulting motions can be transformed to sequences of landmarks and then to image sequences by editing the texture information using another conditional Generative Adversarial Network. To the best of our knowledge, this is the first work that explores manifold-valued representations with GAN to address the problem of dynamic facial expression generation. We evaluate our proposed approach both quantitatively and qualitatively on two public datasets: Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the effectiveness of our approach in generating realistic videos with continuous motion, realistic appearance and identity preservation. We also show the efficiency of our framework for dynamic facial expression generation, dynamic facial expression transfer and data augmentation for training improved emotion recognition models.
Submitted 28 May, 2020; v1 submitted 23 July, 2019;
originally announced July 2019.
-
Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories
Authors:
Naima Otberdout,
Anis Kacem,
Mohamed Daoudi,
Lahoucine Ballihi,
Stefano Berretti
Abstract:
In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By conducting the classification of static facial expressions using a Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamics of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition, outperforming many recent approaches.
Submitted 4 December, 2019; v1 submitted 25 October, 2018;
originally announced October 2018.
-
A Novel Geometric Framework on Gram Matrix Trajectories for Human Behavior Understanding
Authors:
Anis Kacem,
Mohamed Daoudi,
Boulbaba Ben Amor,
Stefano Berretti,
Juan Carlos Alvarez-Paiva
Abstract:
In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the benefit of naturally bringing a second desirable quantity when comparing shapes, the spatial covariance, in addition to the conventional affine-shape representation. We then derive geometric and computational tools for rate-invariant analysis and adaptive re-sampling of trajectories, grounded in the Riemannian geometry of the underlying manifold. Specifically, our approach involves three steps: (1) landmarks are first mapped into the Riemannian manifold of positive semidefinite matrices of fixed-rank to build time-parameterized trajectories; (2) a temporal warping is performed on the trajectories, providing a geometry-aware (dis-)similarity measure between them; (3) finally, a pairwise proximity function SVM is used to classify them, incorporating the (dis-)similarity measure into the kernel function. We show that such representation and metric achieve competitive results in applications such as action recognition and emotion recognition from 3D skeletal data, and facial expression recognition from videos. Experiments have been conducted on several publicly available up-to-date benchmarks.
Submitted 29 June, 2018;
originally announced July 2018.
-
Analysis of Search Stratagem Utilisation
Authors:
Ameni Kacem,
Philipp Mayr
Abstract:
In Interactive IR, researchers consider the user behaviour towards systems and search tasks in order to adapt search results and to improve the search experience of users. Analysing the users' past interactions with the system is one typical approach. In this paper, we analyse the user behaviour in retrieval sessions towards Marcia Bates' search stratagems such as Footnote Chasing, Citation Searching, Keyword Searching, Author Searching and Journal Run in a real-life academic search engine. In fact, search stratagems represent high-level search behaviour as the users go beyond simple execution of queries and investigate more of the system functionalities. We performed analyses of these five search stratagems using two datasets extracted from the social sciences search engine sowiport. A specific focus was the detection of the search phase and frequency of the usage of these stratagems. In addition, we explored the impact of these stratagems on the whole search process performance. We mainly addressed the observation of the usage patterns of the stratagems and their impact on the conduct of retrieval sessions, and explored whether they are used similarly in both datasets. From the observation and metrics proposed, we can conclude that the utilisation of search stratagems in real retrieval sessions leads to an improvement of the precision in terms of positive interactions. However, the difference is that Footnote Chasing, Citation Searching and Journal Run appear mostly at the end of a session while Keyword and Author Searching appear typically at the beginning. Thus, we can conclude from the log analysis that the improvement of search functionalities including personalisation and/or recommendation could be achieved by considering references, citations, and journals in the ranking process.
Submitted 13 June, 2018;
originally announced June 2018.
-
Deep Covariance Descriptors for Facial Expression Recognition
Authors:
Naima Otberdout,
Anis Kacem,
Mohamed Daoudi,
Lahoucine Ballihi,
Stefano Berretti
Abstract:
In this paper, covariance matrices are exploited to encode the deep convolutional neural network (DCNN) features for facial expression recognition. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By performing the classification of the facial expressions using a Gaussian kernel on the SPD manifold, we show that the covariance descriptors computed on DCNN features are more efficient than the standard classification with fully connected layers and softmax. By implementing our approach using the VGG-face and ExpNet architectures with extensive experiments on the Oulu-CASIA and SFEW datasets, we show that the proposed approach achieves state-of-the-art performance for facial expression recognition.
Submitted 10 May, 2018;
originally announced May 2018.
-
A Novel Space-Time Representation on the Positive Semidefinite Cone for Facial Expression Recognition
Authors:
Anis Kacem,
Mohamed Daoudi,
Boulbaba Ben Amor,
Juan Carlos Alvarez-Paiva
Abstract:
In this paper, we study the problem of facial expression recognition using a novel space-time geometric representation. We describe the temporal evolution of facial landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the advantage of naturally bringing a second desirable quantity when comparing shapes -- the spatial covariance -- in addition to the conventional affine-shape representation. We then derive geometric and computational tools for rate-invariant analysis and adaptive re-sampling of trajectories, grounded in the Riemannian geometry of the manifold. Specifically, our approach involves three steps: 1) facial landmarks are first mapped into the Riemannian manifold of positive semidefinite matrices of rank 2, to build time-parameterized trajectories; 2) a temporal alignment is performed on the trajectories, providing a geometry-aware (dis-)similarity measure between them; 3) finally, a pairwise proximity function SVM (ppfSVM) is used to classify them, incorporating the latter (dis-)similarity measure into the kernel function. We show the effectiveness of the proposed approach on four publicly available benchmarks (CK+, MMI, Oulu-CASIA, and AFEW). The results of the proposed approach are comparable to or better than the state-of-the-art methods when involving only facial landmarks.
Submitted 20 July, 2017;
originally announced July 2017.
-
Analysis of Footnote Chasing and Citation Searching in an Academic Search Engine
Authors:
Ameni Kacem,
Philipp Mayr
Abstract:
In interactive information retrieval, researchers consider the user behavior towards systems and search tasks in order to adapt search results by analyzing their past interactions. In this paper, we analyze the user behavior towards Marcia Bates' search stratagems such as 'footnote chasing' and 'citation search' in an academic search engine. We performed a preliminary analysis of their frequency and stage of use in the social sciences search engine sowiport. In addition, we explored the impact of these stratagems on the whole search process performance. We can conclude that the appearance of these two search features in real retrieval sessions leads to an improvement of the precision in terms of positive interactions, by 16% when using footnote chasing and 17% for the citation search stratagem.
Submitted 23 September, 2017; v1 submitted 8 July, 2017;
originally announced July 2017.
-
A Complete Year of User Retrieval Sessions in a Social Sciences Academic Search Engine
Authors:
Philipp Mayr,
Ameni Kacem
Abstract:
In this paper, we present an open data set extracted from the transaction log of the social sciences academic search engine sowiport. The data set includes a filtered set of 484,449 retrieval sessions which were carried out by sowiport users in the period from April 2014 to April 2015. We propose a description of interactions performed by the academic search engine users that can be used in different applications such as result ranking improvement, user modeling, query reformulation analysis, and search pattern recognition.
Submitted 23 September, 2017; v1 submitted 2 June, 2017;
originally announced June 2017.