-
S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction
Authors:
Mohammad Adiban,
Kalin Stefanov,
Sabato Marco Siniscalchi,
Giampiero Salvi
Abstract:
We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capability of HR-VQVAE to model still images with a parsimonious representation, combined with the ST-PixelCNN's ability to handle spatiotemporal information, S-HR-VQVAE can better deal with the chief challenges in video prediction. These include learning spatiotemporal information, handling high-dimensional data, combating blurry predictions, and implicitly modeling physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques in both quantitative and qualitative evaluations, despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.
Submitted 11 June, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation
Authors:
Mohammad Adiban,
Kalin Stefanov,
Sabato Marco Siniscalchi,
Giampiero Salvi
Abstract:
We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data. By utilizing a novel objective function, each layer in HR-VQVAE learns a discrete representation of the residual from previous layers through a vector quantized encoder. Furthermore, the representations at each layer are hierarchically linked to those at previous layers. We evaluate our method on the tasks of image reconstruction and generation. Experimental results demonstrate that the discrete representations learned by HR-VQVAE enable the decoder to reconstruct high-quality images with less distortion than the baseline methods, namely VQVAE and VQVAE-2. HR-VQVAE can also generate high-quality and diverse images that outperform state-of-the-art generative models, providing further verification of the efficiency of the learned representations. The hierarchical nature of HR-VQVAE (i) reduces the decoding search time, making the method particularly suitable for high-load tasks, and (ii) allows the codebook size to be increased without incurring the codebook collapse problem.
Submitted 9 August, 2022;
originally announced August 2022.
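The idea in the abstract above, that each layer encodes the residual left by the previous layers, can be illustrated with plain residual vector quantization. The following is a minimal sketch only, not the HR-VQVAE method: the codebooks are random and fixed rather than learned, the hierarchical linking between layers and the paper's objective function are not modeled, and the names `nearest_code`, `residual_quantize`, and `make_codebooks` are hypothetical helpers introduced for illustration.

```python
import random


def nearest_code(vector, codebook):
    """Return (index, codeword) of the codeword closest to `vector` (squared L2)."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )
    return best_index, codebook[best_index]


def residual_quantize(vector, codebooks):
    """Quantize `vector` layer by layer.

    Each layer picks the codeword nearest to the residual that the
    earlier layers left behind, then passes the new residual on.
    """
    residual = list(vector)
    reconstruction = [0.0] * len(vector)
    indices = []
    for codebook in codebooks:
        index, code = nearest_code(residual, codebook)
        indices.append(index)
        reconstruction = [r + c for r, c in zip(reconstruction, code)]
        residual = [x - c for x, c in zip(residual, code)]
    return indices, reconstruction, residual


def make_codebooks(num_layers, codebook_size, dim, seed=0):
    """Build random, untrained codebooks (an assumption made for illustration).

    Deeper layers use a smaller scale, and every codebook contains the zero
    vector so that adding a layer can never increase the residual norm.
    """
    rng = random.Random(seed)
    codebooks = []
    for layer in range(num_layers):
        scale = 0.5 ** layer
        book = [[0.0] * dim]
        book += [
            [rng.uniform(-scale, scale) for _ in range(dim)]
            for _ in range(codebook_size - 1)
        ]
        codebooks.append(book)
    return codebooks


if __name__ == "__main__":
    books = make_codebooks(num_layers=3, codebook_size=16, dim=4)
    x = [0.3, -0.7, 0.1, 0.5]
    idx, recon, res = residual_quantize(x, books)
    print("indices per layer:", idx)
    print("squared error:", sum(r * r for r in res))
```

The input is represented by one discrete index per layer, and the sum of the selected codewords approximates it. In a trained model the codebooks would be learned jointly with the encoder and decoder rather than sampled at random.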
-
STEP-GAN: A Step-by-Step Training for Multi Generator GANs with application to Cyber Security in Power Systems
Authors:
Mohammad Adiban,
Arash Safari,
Giampiero Salvi
Abstract:
In this study, we introduce a novel unsupervised countermeasure for smart grid power systems, based on generative adversarial networks (GANs). Given the pivotal role of smart grid systems (SGSs) in urban life, their security is of particular importance. In recent years, however, advances in the field of machine learning have raised concerns about cyber attacks on these systems. Power systems, among the most important components of urban infrastructure, have, for example, been widely attacked by adversaries. Attackers disrupt power systems using false data injection attacks (FDIA), resulting in a breach of the availability, integrity, or confidentiality principles of the system. Our model simulates possible attacks on power systems using multiple generators in a step-by-step interaction with a discriminator in the training phase. As a consequence, our system is robust to unseen attacks. Moreover, the proposed model considerably reduces the well-known mode collapse problem of GAN-based models. Our method is general and can potentially be employed in a wide range of one-class classification tasks. The proposed model has low computational complexity and outperforms the baseline systems by about 14% and 41% in terms of accuracy on the highly imbalanced, publicly available industrial control system (ICS) cyber attack power system dataset.
Submitted 10 September, 2020;
originally announced September 2020.
-
Replay Spoofing Countermeasure Using Autoencoder and Siamese Network on ASVspoof 2019 Challenge
Authors:
Mohammad Adiban,
Hossein Sameti,
Saeedreza Shehnepoor
Abstract:
Automatic Speaker Verification (ASV) is the process of identifying a person based on the voice presented to a system. Different synthetic approaches allow spoofing to deceive ASV systems (ASVs), whether using techniques to imitate a voice or reconstruct the features. Attackers try to defeat the ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. The last technique is considered a common, high-potential tool for spoofing purposes, since replay attacks are more accessible and require no technical knowledge from adversaries. In this study, we introduce a novel replay spoofing countermeasure for ASVs. Accordingly, we used the Constant Q Cepstral Coefficient (CQCC) features fed into an autoencoder to attain more informative features and to consider the noise information of spoofed utterances for discrimination purposes. Finally, different configurations of the Siamese network were used for the first time in this context for classification. The experiments performed on the ASVspoof 2019 challenge dataset using Equal Error Rate (EER) and Tandem Detection Cost Function (t-DCF) as evaluation metrics show that the proposed system improved the results over the baseline by 10.73% and 0.2344 in terms of EER and t-DCF, respectively.
Submitted 29 October, 2019;
originally announced October 2019.
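The Equal Error Rate mentioned in the abstract above is the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of that definition, assuming higher scores mean "more likely genuine". It is not the official ASVspoof scoring tool, it does not cover t-DCF, and the function name `equal_error_rate` and the example scores are hypothetical.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. With finitely
    many scores the two error rates may never be exactly equal, so the
    threshold with the smallest gap is used and the two rates are averaged.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, best_eer = None, None
    for threshold in thresholds:
        # False acceptance: spoofed trials scoring at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.3, 0.7]   # made-up scores for genuine trials
    spoof = [0.1, 0.2, 0.75, 0.4]    # made-up scores for replayed trials
    print(equal_error_rate(genuine, spoof))  # 0.25
```

In this toy example one genuine trial scores low and one spoofed trial scores high, so at the threshold 0.7 both error rates are 1 in 4. A lower EER indicates a better countermeasure.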
-
Statistical feature embedding for heart sound classification
Authors:
Mohammad Adiban,
Bagher BabaAli,
Saeedreza Shehnepoor
Abstract:
Cardiovascular Disease (CVD) is considered one of the principal causes of death in the world. Over recent years, this field of study has attracted researchers' attention to investigate heart sounds' patterns for disease diagnostics. In this study, an approach is proposed for normal/abnormal heart sound classification on the PhysioNet challenge 2016 dataset. For the first time, a fixed-length feature vector, called an i-vector, is extracted from each heart sound using Mel Frequency Cepstral Coefficient (MFCC) features. Afterwards, a Principal Component Analysis (PCA) transform and a Variational Autoencoder (VAE) are applied to the i-vector to achieve dimension reduction. Eventually, the reduced-size vector is fed to Gaussian Mixture Models (GMMs) and a Support Vector Machine (SVM) for classification. Experimental results demonstrate that the proposed method could achieve a performance improvement of 16% based on Modified Accuracy (MAcc) compared with the baseline system on the PhysioNet dataset.
Submitted 9 November, 2020; v1 submitted 26 April, 2019;
originally announced April 2019.