-
Model for Peanuts: Hijacking ML Models without Training Access is Possible
Authors:
Mahmoud Ghorbel,
Halima Bouzidi,
Ioan Marius Bilasco,
Ihsen Alouani
Abstract:
The massive deployment of Machine Learning (ML) models has been accompanied by the emergence of several attacks that threaten their trustworthiness and raise ethical and societal concerns such as invasion of privacy, discrimination risks, and lack of accountability. Model hijacking is one of these attacks, where the adversary aims to hijack a victim model to execute a different task than its origi…
▽ More
The massive deployment of Machine Learning (ML) models has been accompanied by the emergence of several attacks that threaten their trustworthiness and raise ethical and societal concerns such as invasion of privacy, discrimination risks, and lack of accountability. Model hijacking is one of these attacks, where the adversary aims to hijack a victim model to execute a different task than its original one. Model hijacking can cause accountability and security risks since a hijacked model owner can be framed for having their model offering illegal or unethical services. Prior state-of-the-art works consider model hijacking as a training time attack, whereby an adversary requires access to the ML model training to execute their attack. In this paper, we consider a stronger threat model where the attacker has no access to the training phase of the victim model. Our intuition is that ML models, typically over-parameterized, might (unintentionally) learn more than the intended task for they are trained. We propose a simple approach for model hijacking at inference time named SnatchML to classify unknown input samples using distance measures in the latent space of the victim model to previously known samples associated with the hijacking task classes. SnatchML empirically shows that benign pre-trained models can execute tasks that are semantically related to the initial task. Surprisingly, this can be true even for hijacking tasks unrelated to the original task. We also explore different methods to mitigate this risk. We first propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original task dataset. We also provide insights on over-parameterization as one possible inherent factor that makes model hijacking easier, and we accordingly propose a compression-based countermeasure against this attack.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
eMotion-GAN: A Motion-based GAN for Photorealistic and Facial Expression Preserving Frontal View Synthesis
Authors:
Omar Ikne,
Benjamin Allaert,
Ioan Marius Bilasco,
Hazem Wannous
Abstract:
Many existing facial expression recognition (FER) systems encounter substantial performance degradation when faced with variations in head pose. Numerous frontalization methods have been proposed to enhance these systems' performance under such conditions. However, they often introduce undesirable deformations, rendering them less suitable for precise facial expression analysis. In this paper, we…
▽ More
Many existing facial expression recognition (FER) systems encounter substantial performance degradation when faced with variations in head pose. Numerous frontalization methods have been proposed to enhance these systems' performance under such conditions. However, they often introduce undesirable deformations, rendering them less suitable for precise facial expression analysis. In this paper, we present eMotion-GAN, a novel deep learning approach designed for frontal view synthesis while preserving facial expressions within the motion domain. Considering the motion induced by head variation as noise and the motion induced by facial expression as the relevant information, our model is trained to filter out the noisy motion in order to retain only the motion related to facial expression. The filtered motion is then mapped onto a neutral frontal face to generate the corresponding expressive frontal face. We conducted extensive evaluations using several widely recognized dynamic FER datasets, which encompass sequences exhibiting various degrees of head pose variations in both intensity and orientation. Our results demonstrate the effectiveness of our approach in significantly reducing the FER performance gap between frontal and non-frontal faces. Specifically, we achieved a FER improvement of up to +5\% for small pose variations and up to +20\% improvement for larger pose variations. Code available at \url{https://github.com/o-ikne/eMotion-GAN.git}.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
S3TC: Spiking Separated Spatial and Temporal Convolutions with Unsupervised STDP-based Learning for Action Recognition
Authors:
Mireille El-Assal,
Pierre Tirilly,
Ioan Marius Bilasco
Abstract:
Video analysis is a major computer vision task that has received a lot of attention in recent years. The current state-of-the-art performance for video analysis is achieved with Deep Neural Networks (DNNs) that have high computational costs and need large amounts of labeled data for training. Spiking Neural Networks (SNNs) have significantly lower computational costs (thousands of times) than regu…
▽ More
Video analysis is a major computer vision task that has received a lot of attention in recent years. The current state-of-the-art performance for video analysis is achieved with Deep Neural Networks (DNNs) that have high computational costs and need large amounts of labeled data for training. Spiking Neural Networks (SNNs) have significantly lower computational costs (thousands of times) than regular non-spiking networks when implemented on neuromorphic hardware. They have been used for video analysis with methods like 3D Convolutional Spiking Neural Networks (3D CSNNs). However, these networks have a significantly larger number of parameters compared with spiking 2D CSNN. This, not only increases the computational costs, but also makes these networks more difficult to implement with neuromorphic hardware. In this work, we use CSNNs trained in an unsupervised manner with the Spike Timing-Dependent Plasticity (STDP) rule, and we introduce, for the first time, Spiking Separated Spatial and Temporal Convolutions (S3TCs) for the sake of reducing the number of parameters required for video analysis. This unsupervised learning has the advantage of not needing large amounts of labeled data for training. Factorizing a single spatio-temporal spiking convolution into a spatial and a temporal spiking convolution decreases the number of parameters of the network. We test our network with the KTH, Weizmann, and IXMAS datasets, and we show that S3TCs successfully extract spatio-temporal information from videos, while increasing the output spiking activity, and outperforming spiking 3D convolutions.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
Paired Competing Neurons Improving STDP Supervised Local Learning In Spiking Neural Networks
Authors:
Gaspard Goupy,
Pierre Tirilly,
Ioan Marius Bilasco
Abstract:
Direct training of Spiking Neural Networks (SNNs) on neuromorphic hardware has the potential to significantly reduce the energy consumption of artificial neural network training. SNNs trained with Spike Timing-Dependent Plasticity (STDP) benefit from gradient-free and unsupervised local learning, which can be easily implemented on ultra-low-power neuromorphic hardware. However, classification task…
▽ More
Direct training of Spiking Neural Networks (SNNs) on neuromorphic hardware has the potential to significantly reduce the energy consumption of artificial neural network training. SNNs trained with Spike Timing-Dependent Plasticity (STDP) benefit from gradient-free and unsupervised local learning, which can be easily implemented on ultra-low-power neuromorphic hardware. However, classification tasks cannot be performed solely with unsupervised STDP. In this paper, we propose Stabilized Supervised STDP (S2-STDP), a supervised STDP learning rule to train the classification layer of an SNN equipped with unsupervised STDP for feature extraction. S2-STDP integrates error-modulated weight updates that align neuron spikes with desired timestamps derived from the average firing time within the layer. Then, we introduce a training architecture called Paired Competing Neurons (PCN) to further enhance the learning capabilities of our classification layer trained with S2-STDP. PCN associates each class with paired neurons and encourages neuron specialization toward target or non-target samples through intra-class competition. We evaluate our methods on image recognition datasets, including MNIST, Fashion-MNIST, and CIFAR-10. Results show that our methods outperform state-of-the-art supervised STDP learning rules, for comparable architectures and numbers of neurons. Further analysis demonstrates that the use of PCN enhances the performance of S2-STDP, regardless of the hyperparameter set and without introducing any additional hyperparameters.
△ Less
Submitted 27 April, 2024; v1 submitted 4 August, 2023;
originally announced August 2023.
-
Spiking Two-Stream Methods with Unsupervised STDP-based Learning for Action Recognition
Authors:
Mireille El-Assal,
Pierre Tirilly,
Ioan Marius Bilasco
Abstract:
Video analysis is a computer vision task that is useful for many applications like surveillance, human-machine interaction, and autonomous vehicles. Deep Convolutional Neural Networks (CNNs) are currently the state-of-the-art methods for video analysis. However they have high computational costs, and need a large amount of labeled data for training. In this paper, we use Convolutional Spiking Neur…
▽ More
Video analysis is a computer vision task that is useful for many applications like surveillance, human-machine interaction, and autonomous vehicles. Deep Convolutional Neural Networks (CNNs) are currently the state-of-the-art methods for video analysis. However they have high computational costs, and need a large amount of labeled data for training. In this paper, we use Convolutional Spiking Neural Networks (CSNNs) trained with the unsupervised Spike Timing-Dependent Plasticity (STDP) learning rule for action classification. These networks represent the information using asynchronous low-energy spikes. This allows the network to be more energy efficient and neuromorphic hardware-friendly. However, the behaviour of CSNNs is not studied enough with spatio-temporal computer vision models. Therefore, we explore transposing two-stream neural networks into the spiking domain. Implementing this model with unsupervised STDP-based CSNNs allows us to further study the performance of these networks with video analysis. In this work, we show that two-stream CSNNs can successfully extract spatio-temporal information from videos despite using limited training data, and that the spiking spatial and temporal streams are complementary. We also show that using a spatio-temporal stream within a spiking STDP-based two-stream architecture leads to information redundancy and does not improve the performance.
△ Less
Submitted 23 June, 2023;
originally announced June 2023.
-
AdvART: Adversarial Art for Camouflaged Object Detection Attacks
Authors:
Amira Guesmi,
Ioan Marius Bilasco,
Muhammad Shafique,
Ihsen Alouani
Abstract:
Physical adversarial attacks pose a significant practical threat as it deceives deep learning systems operating in the real world by producing prominent and maliciously designed physical perturbations. Emphasizing the evaluation of naturalness is crucial in such attacks, as humans can readily detect and eliminate unnatural manipulations. To overcome this limitation, recent work has proposed levera…
▽ More
Physical adversarial attacks pose a significant practical threat as it deceives deep learning systems operating in the real world by producing prominent and maliciously designed physical perturbations. Emphasizing the evaluation of naturalness is crucial in such attacks, as humans can readily detect and eliminate unnatural manipulations. To overcome this limitation, recent work has proposed leveraging generative adversarial networks (GANs) to generate naturalistic patches, which may not catch human's attention. However, these approaches suffer from a limited latent space which leads to an inevitable trade-off between naturalness and attack efficiency. In this paper, we propose a novel approach to generate naturalistic and inconspicuous adversarial patches. Specifically, we redefine the optimization problem by introducing an additional loss term to the cost function. This term works as a semantic constraint to ensure that the generated camouflage pattern holds semantic meaning rather than arbitrary patterns. The additional term leverages similarity metrics to construct a similarity loss that we optimize within the global objective function. Our technique is based on directly manipulating the pixel values in the patch, which gives higher flexibility and larger space compared to the GAN-based techniques that are based on indirectly optimizing the patch by modifying the latent vector. Our attack achieves superior success rate of up to 91.19\% and 72\%, respectively, in the digital world and when deployed in smart cameras at the edge compared to the GAN-based technique.
△ Less
Submitted 9 February, 2024; v1 submitted 3 March, 2023;
originally announced March 2023.
-
2D versus 3D Convolutional Spiking Neural Networks Trained with Unsupervised STDP for Human Action Recognition
Authors:
Mireille El-Assal,
Pierre Tirilly,
Ioan Marius Bilasco
Abstract:
Current advances in technology have highlighted the importance of video analysis in the domain of computer vision. However, video analysis has considerably high computational costs with traditional artificial neural networks (ANNs). Spiking neural networks (SNNs) are third generation biologically plausible models that process the information in the form of spikes. Unsupervised learning with SNNs u…
▽ More
Current advances in technology have highlighted the importance of video analysis in the domain of computer vision. However, video analysis has considerably high computational costs with traditional artificial neural networks (ANNs). Spiking neural networks (SNNs) are third generation biologically plausible models that process the information in the form of spikes. Unsupervised learning with SNNs using the spike timing dependent plasticity (STDP) rule has the potential to overcome some bottlenecks of regular artificial neural networks, but STDP-based SNNs are still immature and their performance is far behind that of ANNs. In this work, we study the performance of SNNs when challenged with the task of human action recognition, because this task has many real-time applications in computer vision, such as video surveillance. In this paper we introduce a multi-layered 3D convolutional SNN model trained with unsupervised STDP. We compare the performance of this model to those of a 2D STDP-based SNN when challenged with the KTH and Weizmann datasets. We also compare single-layer and multi-layer versions of these models in order to get an accurate assessment of their performance. We show that STDP-based convolutional SNNs can learn motion patterns using 3D kernels, thus enabling motion-based recognition from videos. Finally, we give evidence that 3D convolution is superior to 2D convolution with STDP-based SNNs, especially when dealing with long video sequences.
△ Less
Submitted 26 May, 2022;
originally announced May 2022.
-
Dynamic Facial Expression Recognition under Partial Occlusion with Optical Flow Reconstruction
Authors:
Delphine Poux,
Benjamin Allaert,
Nacim Ihaddadene,
Ioan Marius Bilasco,
Chaabane Djeraba,
Mohammed Bennamoun
Abstract:
Video facial expression recognition is useful for many applications and received much interest lately. Although some solutions give really good results in a controlled environment (no occlusion), recognition in the presence of partial facial occlusion remains a challenging task. To handle occlusions, solutions based on the reconstruction of the occluded part of the face have been proposed. These s…
▽ More
Video facial expression recognition is useful for many applications and received much interest lately. Although some solutions give really good results in a controlled environment (no occlusion), recognition in the presence of partial facial occlusion remains a challenging task. To handle occlusions, solutions based on the reconstruction of the occluded part of the face have been proposed. These solutions are mainly based on the texture or the geometry of the face. However, the similarity of the face movement between different persons doing the same expression seems to be a real asset for the reconstruction. In this paper we exploit this asset and propose a new solution based on an auto-encoder with skip connections to reconstruct the occluded part of the face in the optical flow domain. To the best of our knowledge, this is the first proposition to directly reconstruct the movement for facial expression recognition. We validated our approach in the controlled dataset CK+ on which different occlusions were generated. Our experiments show that the proposed method reduce significantly the gap, in terms of recognition accuracy, between occluded and non-occluded situations. We also compare our approach with existing state-of-the-art solutions. In order to lay the basis of a reproducible and fair comparison in the future, we also propose a new experimental protocol that includes occlusion generation and reconstruction evaluation.
△ Less
Submitted 24 December, 2020;
originally announced December 2020.
-
Improving STDP-based Visual Feature Learning with Whitening
Authors:
Pierre Falez,
Pierre Tirilly,
Ioan Marius Bilasco
Abstract:
In recent years, spiking neural networks (SNNs) emerge as an alternative to deep neural networks (DNNs). SNNs present a higher computational efficiency using low-power neuromorphic hardware and require less labeled data for training using local and unsupervised learning rules such as spike timing-dependent plasticity (STDP). SNN have proven their effectiveness in image classification on simple dat…
▽ More
In recent years, spiking neural networks (SNNs) emerge as an alternative to deep neural networks (DNNs). SNNs present a higher computational efficiency using low-power neuromorphic hardware and require less labeled data for training using local and unsupervised learning rules such as spike timing-dependent plasticity (STDP). SNN have proven their effectiveness in image classification on simple datasets such as MNIST. However, to process natural images, a pre-processing step is required. Difference-of-Gaussians (DoG) filtering is typically used together with on-center/off-center coding, but it results in a loss of information that is detrimental to the classification performance. In this paper, we propose to use whitening as a pre-processing step before learning features with STDP. Experiments on CIFAR-10 show that whitening allows STDP to learn visual features that are closer to the ones learned with standard neural networks, with a significantly increased classification performance as compared to DoG filtering. We also propose an approximation of whitening as convolution kernels that is computationally cheaper to learn and more suited to be implemented on neuromorphic hardware. Experiments on CIFAR-10 show that it performs similarly to regular whitening. Cross-dataset experiments on CIFAR-10 and STL-10 also show that it is fairly stable across datasets, making it possible to learn a single whitening transformation to process different datasets.
△ Less
Submitted 24 February, 2020;
originally announced February 2020.
-
Impact of facial landmark localization on facial expression recognition
Authors:
Romain Belmonte,
Benjamin Allaert,
Pierre Tirilly,
Ioan Marius Bilasco,
Chaabane Djeraba,
Nicu Sebe
Abstract:
Although facial landmark localization (FLL) approaches are becoming increasingly accurate for characterizing facial regions, one question remains unanswered: what is the impact of these approaches on subsequent related tasks? In this paper, the focus is put on facial expression recognition (FER), where facial landmarks are used for face registration, which is a common usage. Since the most used da…
▽ More
Although facial landmark localization (FLL) approaches are becoming increasingly accurate for characterizing facial regions, one question remains unanswered: what is the impact of these approaches on subsequent related tasks? In this paper, the focus is put on facial expression recognition (FER), where facial landmarks are used for face registration, which is a common usage. Since the most used datasets for facial landmark localization do not allow for a proper measurement of performance according to the different difficulties (e.g., pose, expression, illumination, occlusion, motion blur), we also quantify the performance of recent approaches in the presence of head pose variations and facial expressions. Finally, a study of the impact of these approaches on FER is conducted. We show that the landmark accuracy achieved so far optimizing the conventional Euclidean distance does not necessarily guarantee a gain in performance for FER. To deal with this issue, we propose a new evaluation metric for FLL adapted to FER.
△ Less
Submitted 19 July, 2021; v1 submitted 26 May, 2019;
originally announced May 2019.
-
Facial Expressions Analysis Under Occlusions Based on Specificities of Facial Motion Propagation
Authors:
Delphine Poux,
Benjamin Allaert,
Jose Mennesson,
Nacim Ihaddadene,
Ioan Marius Bilasco,
Chaabane Djeraba
Abstract:
Although much progress has been made in the facial expression analysis field, facial occlusions are still challenging. The main innovation brought by this contribution consists in exploiting the specificities of facial movement propagation for recognizing expressions in presence of important occlusions. The movement induced by an expression extends beyond the movement epicenter. Thus, the movement…
▽ More
Although much progress has been made in the facial expression analysis field, facial occlusions are still challenging. The main innovation brought by this contribution consists in exploiting the specificities of facial movement propagation for recognizing expressions in presence of important occlusions. The movement induced by an expression extends beyond the movement epicenter. Thus, the movement occurring in an occluded region propagates towards neighboring visible regions. In presence of occlusions, per expression, we compute the importance of each unoccluded facial region and we construct adapted facial frameworks that boost the performance of per expression binary classifier. The output of each expression-dependant binary classifier is then aggregated and fed into a fusion process that aims constructing, per occlusion, a unique model that recognizes all the facial expressions considered. The evaluations highlight the robustness of this approach in presence of significant facial occlusions.
△ Less
Submitted 30 April, 2019;
originally announced April 2019.
-
Optical Flow Techniques for Facial Expression Analysis -- a Practical Evaluation Study
Authors:
Benjamin Allaert,
Isaac Ronald Ward,
Ioan Marius Bilasco,
Chaabane Djeraba,
Mohammed Bennamoun
Abstract:
Optical flow techniques are becoming increasingly performant and robust when estimating motion in a scene, but their performance has yet to be proven in the area of facial expression recognition. In this work, a variety of optical flow approaches are evaluated across multiple facial expression datasets, so as to provide a consistent performance evaluation. The aim of this work is not to propose a…
▽ More
Optical flow techniques are becoming increasingly performant and robust when estimating motion in a scene, but their performance has yet to be proven in the area of facial expression recognition. In this work, a variety of optical flow approaches are evaluated across multiple facial expression datasets, so as to provide a consistent performance evaluation. The aim of this work is not to propose a new expression recognition technique, but to understand better the adequacy of existing state-of-the art optical flow for encoding facial motion in the context of facial expression recognition. Our evaluations highlight the fact that motion approximation methods used to overcome motion discontinuities have a significant impact when optical flows are used to characterize facial expressions.
△ Less
Submitted 31 January, 2022; v1 submitted 25 April, 2019;
originally announced April 2019.
-
Multi-layered Spiking Neural Network with Target Timestamp Threshold Adaptation and STDP
Authors:
Pierre Falez,
Pierre Tirilly,
Ioan Marius Bilasco,
Philippe Devienne,
Pierre Boulet
Abstract:
Spiking neural networks (SNNs) are good candidates to produce ultra-energy-efficient hardware. However, the performance of these models is currently behind traditional methods. Introducing multi-layered SNNs is a promising way to reduce this gap. We propose in this paper a new threshold adaptation system which uses a timestamp objective at which neurons should fire. We show that our method leads t…
▽ More
Spiking neural networks (SNNs) are good candidates to produce ultra-energy-efficient hardware. However, the performance of these models is currently behind traditional methods. Introducing multi-layered SNNs is a promising way to reduce this gap. We propose in this paper a new threshold adaptation system which uses a timestamp objective at which neurons should fire. We show that our method leads to state-of-the-art classification rates on the MNIST dataset (98.60%) and the Faces/Motorbikes dataset (99.46%) with an unsupervised SNN followed by a linear SVM. We also investigate the sparsity level of the network by testing different inhibition policies and STDP rules.
△ Less
Submitted 3 April, 2019;
originally announced April 2019.
-
Unsupervised Visual Feature Learning with Spike-timing-dependent Plasticity: How Far are we from Traditional Feature Learning Approaches?
Authors:
Pierre Falez,
Pierre Tirilly,
Ioan Marius Bilasco,
Philippe Devienne,
Pierre Boulet
Abstract:
Spiking neural networks (SNNs) equipped with latency coding and spike-timing dependent plasticity rules offer an alternative to solve the data and energy bottlenecks of standard computer vision approaches: they can learn visual features without supervision and can be implemented by ultra-low power hardware architectures. However, their performance in image classification has never been evaluated o…
▽ More
Spiking neural networks (SNNs) equipped with latency coding and spike-timing dependent plasticity rules offer an alternative to solve the data and energy bottlenecks of standard computer vision approaches: they can learn visual features without supervision and can be implemented by ultra-low power hardware architectures. However, their performance in image classification has never been evaluated on recent image datasets. In this paper, we compare SNNs to auto-encoders on three visual recognition datasets, and extend the use of SNNs to color images. The analysis of the results helps us identify some bottlenecks of SNNs: the limits of on-center/off-center coding, especially for color images, and the ineffectiveness of current inhibition mechanisms. These issues should be addressed to build effective SNNs for image recognition.
△ Less
Submitted 5 April, 2019; v1 submitted 14 January, 2019;
originally announced January 2019.