Search | arXiv e-print repository

Enhancing Anomaly Detection Generalization through Knowledge Exposure: The Dual Effects of Augmentation

Authors: Mohammad Akhavan Anvari, Ro**a Kashefi, Vahid Reza Khazaie, Mohammad Khalooei, Mohammad Sabokrou

Abstract: Anomaly detection involves identifying instances within a dataset that deviate from the norm and occur infrequently. Current benchmarks tend to favor methods biased towards low diversity in normal data, which does not align with real-world scenarios. Despite advancements in these benchmarks, contemporary anomaly detection methods often struggle with out-of-distribution generalization, particularly… ▽ More Anomaly detection involves identifying instances within a dataset that deviate from the norm and occur infrequently. Current benchmarks tend to favor methods biased towards low diversity in normal data, which does not align with real-world scenarios. Despite advancements in these benchmarks, contemporary anomaly detection methods often struggle with out-of-distribution generalization, particularly in classifying samples with subtle transformations during testing. These methods typically assume that normal samples during test time have distributions very similar to those in the training set, while anomalies are distributed much further away. However, real-world test samples often exhibit various levels of distribution shift while maintaining semantic consistency. Therefore, effectively generalizing to samples that have undergone semantic-preserving transformations, while accurately detecting normal samples whose semantic meaning has changed after transformation as anomalies, is crucial for the trustworthiness and reliability of a model. For example, although it is clear that rotation shifts the meaning for a car in the context of anomaly detection but preserves the meaning for a bird, current methods are likely to detect both as abnormal. This complexity underscores the necessity for dynamic learning procedures rooted in the intrinsic concept of outliers. To address this issue, we propose new testing protocols and a novel method called Knowledge Exposure (KE), which integrates external knowledge to comprehend concept dynamics and differentiate transformations that induce semantic shifts. This approach enhances generalization by utilizing insights from a pre-trained CLIP model to evaluate the significance of anomalies for each concept. Evaluation on CIFAR-10, CIFAR-100, and SVHN with the new protocols demonstrates superior performance compared to previous methods. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2311.16485 [pdf, other]

Class-Adaptive Sampling Policy for Efficient Continual Learning

Authors: Hossein Rezaei, Mohammad Sabokrou

Abstract: Continual learning (CL) aims to acquire new knowledge while preserving information from previous experiences without forgetting. Though buffer-based methods (i.e., retaining samples from previous tasks) have achieved acceptable performance, determining how to allocate the buffer remains a critical challenge. Most recent research focuses on refining these methods but often fails to sufficiently con… ▽ More Continual learning (CL) aims to acquire new knowledge while preserving information from previous experiences without forgetting. Though buffer-based methods (i.e., retaining samples from previous tasks) have achieved acceptable performance, determining how to allocate the buffer remains a critical challenge. Most recent research focuses on refining these methods but often fails to sufficiently consider the varying influence of samples on the learning process, and frequently overlooks the complexity of the classes/concepts being learned. Generally, these methods do not directly take into account the contribution of individual classes. However, our investigation indicates that more challenging classes necessitate preserving a larger number of samples compared to less challenging ones. To address this issue, we propose a novel method and policy named 'Class-Adaptive Sampling Policy' (CASP), which dynamically allocates storage space within the buffer. By utilizing concepts of class contribution and difficulty, CASP adaptively manages buffer space, allowing certain classes to occupy a larger portion of the buffer while reducing storage for others. This approach significantly improves the efficiency of knowledge retention and utilization. CASP provides a versatile solution to boost the performance and efficiency of CL. It meets the demand for dynamic buffer allocation, accommodating the varying contributions of different classes and their learning complexities over time. △ Less

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.06786 [pdf, other]

Explainability of Vision Transformers: A Comprehensive Review and New Perspectives

Authors: Ro**a Kashefi, Leili Barekatain, Mohammad Sabokrou, Fatemeh Aghaeipoor

Abstract: Transformers have had a significant impact on natural language processing and have recently demonstrated their potential in computer vision. They have shown promising results over convolution neural networks in fundamental computer vision tasks. However, the scientific community has not fully grasped the inner workings of vision transformers, nor the basis for their decision-making, which undersco… ▽ More Transformers have had a significant impact on natural language processing and have recently demonstrated their potential in computer vision. They have shown promising results over convolution neural networks in fundamental computer vision tasks. However, the scientific community has not fully grasped the inner workings of vision transformers, nor the basis for their decision-making, which underscores the importance of explainability methods. Understanding how these models arrive at their decisions not only improves their performance but also builds trust in AI systems. This study explores different explainability methods proposed for visual transformers and presents a taxonomy for organizing them according to their motivations, structures, and application scenarios. In addition, it provides a comprehensive review of evaluation criteria that can be used for comparing explanation results, as well as explainability tools and frameworks. Finally, the paper highlights essential but unexplored aspects that can enhance the explainability of visual transformers, and promising research directions are suggested for future investment. △ Less

Submitted 12 November, 2023; originally announced November 2023.

Comments: 20 pages,5 figures

arXiv:2310.05263 [pdf, other]

Confidence-driven Sampling for Backdoor Attacks

Authors: Pengfei He, Han Xu, Yue Xing, Jie Ren, Yingqian Cui, Shenglai Zeng, Jiliang Tang, Makoto Yamada, Mohammad Sabokrou

Abstract: Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples. Our research highlights the overlooked drawbacks of random sampling, which make that attack detectab… ▽ More Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples. Our research highlights the overlooked drawbacks of random sampling, which make that attack detectable and defensible. The core idea of this paper is to strategically poison samples near the model's decision boundary and increase defense difficulty. We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks. Importantly, our method operates independently of existing trigger designs, providing versatility and compatibility with various backdoor attack techniques. We substantiate the effectiveness of our approach through a comprehensive set of empirical experiments, demonstrating its potential to significantly enhance resilience against backdoor attacks in DNNs. △ Less

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2307.01473 [pdf, other]

Mitigating Bias: Enhancing Image Classification by Improving Model Explanations

Authors: Raha Ahmadi, Mohammad Javad Rajabi, Mohammad Khalooie, Mohammad Sabokrou

Abstract: Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge to image classifie… ▽ More Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge to image classifiers as the crucial elements of interest in images may be overshadowed. In this paper, we propose a novel approach to address this issue and improve the learning of main concepts by image classifiers. Our central idea revolves around concurrently guiding the model's attention toward the foreground during the classification task. By emphasizing the foreground, which encapsulates the primary objects of interest, we aim to shift the focus of the model away from the dominant influence of the background. To accomplish this, we introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. We investigate various strategies, including modifying the loss function or incorporating additional architectural components, to enable the classifier to effectively capture the primary concept within an image. Additionally, we explore the impact of different foreground attention mechanisms on model performance and provide insights into their effectiveness. Through extensive experimentation on benchmark datasets, we demonstrate the efficacy of our proposed approach in improving the classification accuracy of image classifiers. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images. The results of this study contribute to advancing the field of image classification and provide valuable insights for develo** more robust and accurate deep-learning models. △ Less

Submitted 22 September, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: Accepted to ACML2023

arXiv:2306.15755 [pdf, other]

Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving

Authors: Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli

Abstract: In autonomous driving, behavior prediction is fundamental for safe motion planning, hence the security and robustness of prediction models against adversarial attacks are of paramount importance. We propose a novel adversarial backdoor attack against trajectory prediction models as a means of studying their potential vulnerabilities. Our attack affects the victim at training time via naturalistic,… ▽ More In autonomous driving, behavior prediction is fundamental for safe motion planning, hence the security and robustness of prediction models against adversarial attacks are of paramount importance. We propose a novel adversarial backdoor attack against trajectory prediction models as a means of studying their potential vulnerabilities. Our attack affects the victim at training time via naturalistic, hence stealthy, poisoned samples crafted using a novel two-step approach. First, the triggers are crafted by perturbing the trajectory of attacking vehicle and then disguised by transforming the scene using a bi-level optimization technique. The proposed attack does not depend on a particular model architecture and operates in a black-box manner, thus can be effective without any knowledge of the victim model. We conduct extensive empirical studies using state-of-the-art prediction models on two benchmark datasets using metrics customized for trajectory prediction. We show that the proposed attack is highly effective, as it can significantly hinder the performance of prediction models, unnoticeable by the victims, and efficient as it forces the victim to generate malicious behavior even under constrained conditions. Via ablative studies, we analyze the impact of different attack design choices followed by an evaluation of existing defence mechanisms against the proposed attack. △ Less

Submitted 22 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.08336 [pdf, other]

Global-Local Processing in Convolutional Neural Networks

Authors: Zahra Rezvani, Soroor Shekarizeh, Mohammad Sabokrou

Abstract: Convolutional Neural Networks (CNNs) have achieved outstanding performance on image processing challenges. Actually, CNNs imitate the typically developed human brain structures at the micro-level (Artificial neurons). At the same time, they distance themselves from imitating natural visual perception in humans at the macro architectures (high-level cognition). Recently it has been investigated tha… ▽ More Convolutional Neural Networks (CNNs) have achieved outstanding performance on image processing challenges. Actually, CNNs imitate the typically developed human brain structures at the micro-level (Artificial neurons). At the same time, they distance themselves from imitating natural visual perception in humans at the macro architectures (high-level cognition). Recently it has been investigated that CNNs are highly biased toward local features and fail to detect the global aspects of their input. Nevertheless, the literature offers limited clues on this problem. To this end, we propose a simple yet effective solution inspired by the unconscious behavior of the human pupil. We devise a simple module called Global Advantage Stream (GAS) to learn and capture the holistic features of input samples (i.e., the global features). Then, the GAS features were combined with a CNN network as a plug-and-play component called the Global/Local Processing (GLP) model. The experimental results confirm that this stream improves the accuracy with an insignificant additional computational/temporal load and makes the network more robust to adversarial attacks. Furthermore, investigating the interpretation of the model shows that it learns a more holistic representation similar to the perceptual system of healthy humans △ Less

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2306.06414 [pdf, other]

Revealing Model Biases: Assessing Deep Neural Networks via Recovered Sample Analysis

Authors: Mohammad Mahdi Mehmanchi, Mahbod Nouri, Mohammad Sabokrou

Abstract: This paper proposes a straightforward and cost-effective approach to assess whether a deep neural network (DNN) relies on the primary concepts of training samples or simply learns discriminative, yet simple and irrelevant features that can differentiate between classes. The paper highlights that DNNs, as discriminative classifiers, often find the simplest features to discriminate between classes,… ▽ More This paper proposes a straightforward and cost-effective approach to assess whether a deep neural network (DNN) relies on the primary concepts of training samples or simply learns discriminative, yet simple and irrelevant features that can differentiate between classes. The paper highlights that DNNs, as discriminative classifiers, often find the simplest features to discriminate between classes, leading to a potential bias towards irrelevant features and sometimes missing generalization. While a generalization test is one way to evaluate a trained model's performance, it can be costly and may not cover all scenarios to ensure that the model has learned the primary concepts. Furthermore, even after conducting a generalization test, identifying bias in the model may not be possible. Here, the paper proposes a method that involves recovering samples from the parameters of the trained model and analyzing the reconstruction quality. We believe that if the model's weights are optimized to discriminate based on some features, these features will be reflected in the reconstructed samples. If the recovered samples contain the primary concepts of the training data, it can be concluded that the model has learned the essential and determining features. On the other hand, if the recovered samples contain irrelevant features, it can be concluded that the model is biased towards these features. The proposed method does not require any test or generalization samples, only the parameters of the trained model and the training data that lie on the margin. Our experiments demonstrate that the proposed method can determine whether the model has learned the desired features of the training data. The paper highlights that our understanding of how these models work is limited, and the proposed approach addresses this issue. △ Less

Submitted 10 June, 2023; originally announced June 2023.

arXiv:2306.03531 [pdf, other]

A Unified Concept-Based System for Local, Global, and Misclassification Explanations

Authors: Fatemeh Aghaeipoor, Dorsa Asgarian, Mohammad Sabokrou

Abstract: Explainability of Deep Neural Networks (DNNs) has been garnering increasing attention in recent years. Of the various explainability approaches, concept-based techniques stand out for their ability to utilize human-meaningful concepts instead of focusing solely on individual pixels. However, there is a scarcity of methods that consistently provide both local and global explanations. Moreover, most… ▽ More Explainability of Deep Neural Networks (DNNs) has been garnering increasing attention in recent years. Of the various explainability approaches, concept-based techniques stand out for their ability to utilize human-meaningful concepts instead of focusing solely on individual pixels. However, there is a scarcity of methods that consistently provide both local and global explanations. Moreover, most of the methods have no offer to explain misclassification cases. Considering these challenges, we present a unified concept-based system for unsupervised learning of both local and global concepts. Our primary objective is to uncover the intrinsic concepts underlying each data category by training surrogate explainer networks to estimate the importance of the concepts. Our experimental results substantiated the efficacy of the discovered concepts through diverse quantitative and qualitative assessments, encompassing faithfulness, completeness, and generality. Furthermore, our approach facilitates the explanation of both accurate and erroneous predictions, rendering it a valuable tool for comprehending the characteristics of the target objects and classes. △ Less

Submitted 4 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

ACM Class: I.2

arXiv:2305.19424 [pdf, other]

Quantifying Overfitting: Evaluating Neural Network Performance through Analysis of Null Space

Authors: Hossein Rezaei, Mohammad Sabokrou

Abstract: Machine learning models that are overfitted/overtrained are more vulnerable to knowledge leakage, which poses a risk to privacy. Suppose we download or receive a model from a third-party collaborator without knowing its training accuracy. How can we determine if it has been overfitted or overtrained on its training data? It's possible that the model was intentionally over-trained to make it vulner… ▽ More Machine learning models that are overfitted/overtrained are more vulnerable to knowledge leakage, which poses a risk to privacy. Suppose we download or receive a model from a third-party collaborator without knowing its training accuracy. How can we determine if it has been overfitted or overtrained on its training data? It's possible that the model was intentionally over-trained to make it vulnerable during testing. While an overfitted or overtrained model may perform well on testing data and even some generalization tests, we can't be sure it's not over-fitted. Conducting a comprehensive generalization test is also expensive. The goal of this paper is to address these issues and ensure the privacy and generalization of our method using only testing data. To achieve this, we analyze the null space in the last layer of neural networks, which enables us to quantify overfitting without access to training data or knowledge of the accuracy of those data. We evaluated our approach on various architectures and datasets and observed a distinct pattern in the angle of null space when models are overfitted. Furthermore, we show that models with poor generalization exhibit specific characteristics in this space. Our work represents the first attempt to quantify overfitting without access to training data or knowing any knowledge about the training samples. △ Less

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2211.10892 [pdf, other]

Towards Realistic Out-of-Distribution Detection: A Novel Evaluation Framework for Improving Generalization in OOD Detection

Authors: Vahid Reza Khazaie, Anthony Wong, Mohammad Sabokrou

Abstract: This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection that aims to assess the performance of machine learning models in more realistic settings. We observed that the real-world requirements for testing OOD detection methods are not satisfied by the current testing protocols. They usually encourage methods to have a strong bias towards a low level of diversity in… ▽ More This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection that aims to assess the performance of machine learning models in more realistic settings. We observed that the real-world requirements for testing OOD detection methods are not satisfied by the current testing protocols. They usually encourage methods to have a strong bias towards a low level of diversity in normal data. To address this limitation, we propose new OOD test datasets (CIFAR-10-R, CIFAR-100-R, and ImageNet-30-R) that can allow researchers to benchmark OOD detection performance under realistic distribution shifts. Additionally, we introduce a Generalizability Score (GS) to measure the generalization ability of a model during OOD detection. Our experiments demonstrate that improving the performance on existing benchmark datasets does not necessarily improve the usability of OOD detection models in real-world scenarios. While leveraging deep pre-trained features has been identified as a promising avenue for OOD detection research, our experiments show that state-of-the-art pre-trained models tested on our proposed datasets suffer a significant drop in performance. To address this issue, we propose a post-processing stage for adapting pre-trained features under these distribution shifts before calculating the OOD scores, which significantly enhances the performance of state-of-the-art pre-trained models on our benchmarks. △ Less

Submitted 31 August, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

arXiv:2207.11031 [pdf]

MobileDenseNet: A new approach to object detection on mobile devices

Authors: Mohammad Hajizadeh, Mohammad Sabokrou, Adel Rahmani

Abstract: Object detection problem solving has developed greatly within the past few years. There is a need for lighter models in instances where hardware limitations exist, as well as a demand for models to be tailored to mobile devices. In this article, we will assess the methods used when creating algorithms that address these issues. The main goal of this article is to increase accuracy in state-of-the-… ▽ More Object detection problem solving has developed greatly within the past few years. There is a need for lighter models in instances where hardware limitations exist, as well as a demand for models to be tailored to mobile devices. In this article, we will assess the methods used when creating algorithms that address these issues. The main goal of this article is to increase accuracy in state-of-the-art algorithms while maintaining speed and real-time efficiency. The most significant issues in one-stage object detection pertains to small objects and inaccurate localization. As a solution, we created a new network by the name of MobileDenseNet suitable for embedded systems. We also developed a light neck FCPNLite for mobile devices that will aid with the detection of small objects. Our research revealed that very few papers cited necks in embedded systems. What differentiates our network from others is our use of concatenation features. A small yet significant change to the head of the network amplified accuracy without increasing speed or limiting parameters. In short, our focus on the challenging CoCo and Pascal VOC datasets were 24.8 and 76.8 in percentage terms respectively - a rate higher than that recorded by other state-of-the-art systems thus far. Our network is able to increase accuracy while maintaining real-time efficiency on mobile devices. We calculated operational speed on Pixel 3 (Snapdragon 845) to 22.8 fps. The source code of this research is available on https://github.com/hajizadeh/MobileDenseNet. △ Less

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2205.14297 [pdf, other]

Fake It Till You Make It: Towards Accurate Near-Distribution Novelty Detection

Authors: Hossein Mirzaei, Mohammadreza Salehi, Sajjad Shahabi, Efstratios Gavves, Cees G. M. Snoek, Mohammad Sabokrou, Mohammad Hossein Rohban

Abstract: We aim for image-based novelty detection. Despite considerable progress, existing models either fail or face a dramatic drop under the so-called "near-distribution" setting, where the differences between normal and anomalous samples are subtle. We first demonstrate existing methods experience up to 20% decrease in performance in the near-distribution setting. Next, we propose to exploit a score-ba… ▽ More We aim for image-based novelty detection. Despite considerable progress, existing models either fail or face a dramatic drop under the so-called "near-distribution" setting, where the differences between normal and anomalous samples are subtle. We first demonstrate existing methods experience up to 20% decrease in performance in the near-distribution setting. Next, we propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data. Our model is then fine-tuned to distinguish such data from the normal samples. We provide a quantitative as well as qualitative evaluation of this strategy, and compare the results with a variety of GAN-based models. Effectiveness of our method for both the near-distribution and standard novelty detection is assessed through extensive experiments on datasets in diverse applications such as medical images, object classification, and quality control. This reveals that our method considerably improves over existing models, and consistently decreases the gap between the near-distribution and standard novelty detection performance. The code repository is available at https://github.com/rohban-lab/FITYMI. △ Less

Submitted 28 November, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

arXiv:2202.00050 [pdf, other]

Deep-Disaster: Unsupervised Disaster Detection and Localization Using Visual Data

Authors: Soroor Shekarizadeh, Razieh Rastgoo, Saif Al-Kuwari, Mohammad Sabokrou

Abstract: Social media plays a significant role in sharing essential information, which helps humanitarian organizations in rescue operations during and after disaster incidents. However, develo** an efficient method that can provide rapid analysis of social media images in the early hours of disasters is still largely an open problem, mainly due to the lack of suitable datasets and the sheer complexity o… ▽ More Social media plays a significant role in sharing essential information, which helps humanitarian organizations in rescue operations during and after disaster incidents. However, develo** an efficient method that can provide rapid analysis of social media images in the early hours of disasters is still largely an open problem, mainly due to the lack of suitable datasets and the sheer complexity of this task. In addition, supervised methods can not generalize well to novel disaster incidents. In this paper, inspired by the success of Knowledge Distillation (KD) methods, we propose an unsupervised deep neural network to detect and localize damages in social media images. Our proposed KD architecture is a feature-based distillation approach that comprises a pre-trained teacher and a smaller student network, with both networks having similar GAN architecture containing a generator and a discriminator. The student network is trained to emulate the behavior of the teacher on training input samples, which, in turn, contain images that do not include any damaged regions. Therefore, the student network only learns the distribution of no damage data and would have different behavior from the teacher network-facing damages. To detect damage, we utilize the difference between features generated by two networks using a defined score function that demonstrates the probability of damages occurring. Our experimental results on the benchmark dataset confirm that our approach outperforms state-of-the-art methods in detecting and localizing the damaged areas, especially for novel disaster types. △ Less

Submitted 31 January, 2022; originally announced February 2022.

arXiv:2201.01609 [pdf, other]

All You Need In Sign Language Production

Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, Vassilis Athitsos, Mohammad Sabokrou

Abstract: Sign Language is the dominant form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are two necessary p… ▽ More Sign Language is the dominant form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are two necessary parts for making such a two-way system. Sign language recognition and production need to cope with some critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. To have more realistic perspectives to sign language, we present an introduction to the Deaf culture, Deaf centers, psychological perspective of sign language, the main differences between spoken language and sign language. Furthermore, we present the fundamental components of a bi-directional sign language translation system, discussing the main challenges in this area. Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented. Finally, a general framework for SLP and performance evaluation, and also a discussion on the recent developments, advantages, and limitations in SLP, commenting on possible lines for future research are presented. △ Less

Submitted 6 January, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.15910

arXiv:2110.14051 [pdf, other]

A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges

Authors: Mohammadreza Salehi, Hossein Mirzaei, Dan Hendrycks, Yixuan Li, Mohammad Hossein Rohban, Mohammad Sabokrou

Abstract: Machine learning models often encounter samples that are diverged from the training distribution. Failure to recognize an out-of-distribution (OOD) sample, and consequently assign that sample to an in-class label significantly compromises the reliability of a model. The problem has gained significant attention due to its importance for safety deploying models in open-world settings. Detecting OOD… ▽ More Machine learning models often encounter samples that are diverged from the training distribution. Failure to recognize an out-of-distribution (OOD) sample, and consequently assign that sample to an in-class label significantly compromises the reliability of a model. The problem has gained significant attention due to its importance for safety deploying models in open-world settings. Detecting OOD samples is challenging due to the intractability of modeling all possible unknown distributions. To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to only focus on a specific domain without examining the relationship between different domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys in anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, having a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together. △ Less

Submitted 3 December, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Published in Transaction on Machine Learning (TMLR)

arXiv:2109.00796 [pdf, other]

Multi-Modal Zero-Shot Sign Language Recognition

Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, Mohammad Sabokrou

Abstract: Zero-Shot Learning (ZSL) has rapidly advanced in recent years. Towards overcoming the annotation bottleneck in the Sign Language Recognition (SLR), we explore the idea of Zero-Shot Sign Language Recognition (ZS-SLR) with no annotated visual examples, by leveraging their textual descriptions. In this way, we propose a multi-modal Zero-Shot Sign Language Recognition (ZS-SLR) model harnessing from th… ▽ More Zero-Shot Learning (ZSL) has rapidly advanced in recent years. Towards overcoming the annotation bottleneck in the Sign Language Recognition (SLR), we explore the idea of Zero-Shot Sign Language Recognition (ZS-SLR) with no annotated visual examples, by leveraging their textual descriptions. In this way, we propose a multi-modal Zero-Shot Sign Language Recognition (ZS-SLR) model harnessing from the complementary capabilities of deep features fused with the skeleton-based ones. A Transformer-based model along with a C3D model is used for hand detection and deep features extraction, respectively. To make a trade-off between the dimensionality of the skeletonbased and deep features, we use an Auto-Encoder (AE) on top of the Long Short Term Memory (LSTM) network. Finally, a semantic space is used to map the visual features to the lingual embedding of the class labels, achieved via the Bidirectional Encoder Representations from Transformers (BERT) model. Results on four large-scale datasets, RKS-PERSIANSIGN, First-Person, ASLVID, and isoGD, show the superiority of the proposed model compared to state-of-the-art alternatives in ZS-SLR. △ Less

Submitted 2 September, 2021; originally announced September 2021.

Comments: arXiv admin note: text overlap with arXiv:2108.10059

arXiv:2103.15910 [pdf, other]

Sign Language Production: A Review

Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, Mohammad Sabokrou

Abstract: Sign Language is the dominant yet non-primary form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are… ▽ More Sign Language is the dominant yet non-primary form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are two necessary parts for making such a two-way system. Sign language recognition and production need to cope with some critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. This survey aims to briefly summarize recent achievements in SLP, discussing their advantages, limitations, and future directions of research. △ Less

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2103.15486 [pdf, other]

ClaRe: Practical Class Incremental Learning By Remembering Previous Class Representations

Authors: Bahram Mohammadi, Mohammad Sabokrou

Abstract: This paper presents a practical and simple yet efficient method to effectively deal with the catastrophic forgetting for Class Incremental Learning (CIL) tasks. CIL tends to learn new concepts perfectly, but not at the expense of performance and accuracy for old data. Learning new knowledge in the absence of data instances from previous classes or even imbalance samples of both old and new classes… ▽ More This paper presents a practical and simple yet efficient method to effectively deal with the catastrophic forgetting for Class Incremental Learning (CIL) tasks. CIL tends to learn new concepts perfectly, but not at the expense of performance and accuracy for old data. Learning new knowledge in the absence of data instances from previous classes or even imbalance samples of both old and new classes makes CIL an ongoing challenging problem. These issues can be tackled by storing exemplars belonging to the previous tasks or by utilizing the rehearsal strategy. Inspired by the rehearsal strategy with the approach of using generative models, we propose ClaRe, an efficient solution for CIL by remembering the representations of learned classes in each increment. Taking this approach leads to generating instances with the same distribution of the learned classes. Hence, our model is somehow retrained from the scratch using a new training set including both new and the generated samples. Subsequently, the imbalance data problem is also solved. ClaRe has a better generalization than prior methods thanks to producing diverse instances from the distribution of previously learned classes. We comprehensively evaluate ClaRe on the MNIST benchmark. Results show a very low degradation on accuracy against facing new knowledge over time. Furthermore, contrary to the most proposed solutions, the memory limitation is not problematic any longer which is considered as a consequential issue in this research area. △ Less

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2103.12216 [pdf, other]

ZS-IL: Looking Back on Learned Experiences For Zero-Shot Incremental Learning

Authors: Mozhgan PourKeshavarz, Mohammad Sabokrou

Abstract: Classical deep neural networks are limited in their ability to learn from emerging streams of training data. When trained sequentially on new or evolving tasks, their performance degrades sharply, making them inappropriate in real-world use cases. Existing methods tackle it by either storing old data samples or only updating a parameter set of DNNs, which, however, demands a large memory budget or… ▽ More Classical deep neural networks are limited in their ability to learn from emerging streams of training data. When trained sequentially on new or evolving tasks, their performance degrades sharply, making them inappropriate in real-world use cases. Existing methods tackle it by either storing old data samples or only updating a parameter set of DNNs, which, however, demands a large memory budget or spoils the flexibility of models to learn the incremented class distribution. In this paper, we shed light on an on-call transfer set to provide past experiences whenever a new class arises in the data stream. In particular, we propose a Zero-Shot Incremental Learning not only to replay past experiences the model has learned but also to perform this in a zero-shot manner. Towards this end, we introduced a memory recovery paradigm in which we query the network to synthesize past exemplars whenever a new task (class) emerges. Thus, our method needs no fixed-sized memory, besides calls the proposed memory recovery paradigm to provide past exemplars, named a transfer set in order to mitigate catastrophically forgetting the former classes. Moreover, in contrast with recently proposed methods, the suggested paradigm does not desire a parallel architecture since it only relies on the learner network. Compared to the state-of-the-art data techniques without buffering past data samples, ZS-IL demonstrates significantly better performance on the well-known datasets (CIFAR-10, Tiny-ImageNet) in both Task-IL and Class-IL settings. △ Less

Submitted 22 March, 2021; originally announced March 2021.

arXiv:2103.10502

Ano-Graph: Learning Normal Scene Contextual Graphs to Detect Video Anomalies

Authors: Masoud Pourreza, Mohammadreza Salehi, Mohammad Sabokrou

Abstract: Video anomaly detection has proved to be a challenging task owing to its unsupervised training procedure and high spatio-temporal complexity existing in real-world scenarios. In the absence of anomalous training samples, state-of-the-art methods try to extract features that fully grasp normal behaviors in both space and time domains using different approaches such as autoencoders, or generative ad… ▽ More Video anomaly detection has proved to be a challenging task owing to its unsupervised training procedure and high spatio-temporal complexity existing in real-world scenarios. In the absence of anomalous training samples, state-of-the-art methods try to extract features that fully grasp normal behaviors in both space and time domains using different approaches such as autoencoders, or generative adversarial networks. However, these approaches completely ignore or, by using the ability of deep networks in the hierarchical modeling, poorly model the spatio-temporal interactions that exist between objects. To address this issue, we propose a novel yet efficient method named Ano-Graph for learning and modeling the interaction of normal objects. Towards this end, a Spatio-Temporal Graph (STG) is made by considering each node as an object's feature extracted from a real-time off-the-shelf object detector, and edges are made based on their interactions. After that, a self-supervised learning method is employed on the STG in such a way that encapsulates interactions in a semantic space. Our method is data-efficient, significantly more robust against common real-world variations such as illumination, and passes SOTA by a large margin on the challenging datasets ADOC and Street Scene while stays competitive on Avenue, ShanghaiTech, and UCSD. △ Less

Submitted 16 November, 2022; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: Inconsistencies in the results

arXiv:2103.01739 [pdf, other]

Image/Video Deep Anomaly Detection: A Survey

Authors: Bahram Mohammadi, Mahmood Fathy, Mohammad Sabokrou

Abstract: The considerable significance of Anomaly Detection (AD) problem has recently drawn the attention of many researchers. Consequently, the number of proposed methods in this research field has been increased steadily. AD strongly correlates with the important computer vision and image processing tasks such as image/video anomaly, irregularity and sudden event detection. More recently, Deep Neural Net… ▽ More The considerable significance of Anomaly Detection (AD) problem has recently drawn the attention of many researchers. Consequently, the number of proposed methods in this research field has been increased steadily. AD strongly correlates with the important computer vision and image processing tasks such as image/video anomaly, irregularity and sudden event detection. More recently, Deep Neural Networks (DNNs) offer a high performance set of solutions, but at the expense of a heavy computational cost. However, there is a noticeable gap between the previously proposed methods and an applicable real-word approach. Regarding the raised concerns about AD as an ongoing challenging problem, notably in images and videos, the time has come to argue over the pitfalls and prospects of methods have attempted to deal with visual AD tasks. Hereupon, in this survey we intend to conduct an in-depth investigation into the images/videos deep learning based AD methods. We also discuss current challenges and future research directions thoroughly. △ Less

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2009.11090 [pdf, other]

Robustification of Segmentation Models Against Adversarial Perturbations In Medical Imaging

Authors: Hanwool Park, Amirhossein Bayat, Mohammad Sabokrou, Jan S. Kirschke, Bjoern H. Menze

Abstract: This paper presents a novel yet efficient defense framework for segmentation models against adversarial attacks in medical imaging. In contrary to the defense methods against adversarial attacks for classification models which widely are investigated, such defense methods for segmentation models has been less explored. Our proposed method can be used for any deep learning models without revising t… ▽ More This paper presents a novel yet efficient defense framework for segmentation models against adversarial attacks in medical imaging. In contrary to the defense methods against adversarial attacks for classification models which widely are investigated, such defense methods for segmentation models has been less explored. Our proposed method can be used for any deep learning models without revising the target deep learning models, as well as can be independent of adversarial attacks. Our framework consists of a frequency domain converter, a detector, and a reformer. The frequency domain converter helps the detector detects adversarial examples by using a frame domain of an image. The reformer helps target models to predict more precisely. We have experiments to empirically show that our proposed method has a better performance compared to the existing defense method. △ Less

Submitted 23 September, 2020; originally announced September 2020.

arXiv:2006.11629 [pdf, other]

G2D: Generate to Detect Anomaly

Authors: Masoud Pourreza, Bahram Mohammadi, Mostafa Khaki, Samir Bouindour, Hichem Snoussi, Mohammad Sabokrou

Abstract: In this paper, we propose a novel method for irregularity detection. Previous researches solve this problem as a One-Class Classification (OCC) task where they train a reference model on all of the available samples. Then, they consider a test sample as an anomaly if it has a diversion from the reference model. Generative Adversarial Networks (GANs) have achieved the most promising results for OCC… ▽ More In this paper, we propose a novel method for irregularity detection. Previous researches solve this problem as a One-Class Classification (OCC) task where they train a reference model on all of the available samples. Then, they consider a test sample as an anomaly if it has a diversion from the reference model. Generative Adversarial Networks (GANs) have achieved the most promising results for OCC while implementing and training such networks, especially for the OCC task, is a cumbersome and computationally expensive procedure. To cope with the mentioned challenges, we present a simple but effective method to solve the irregularity detection as a binary classification task in order to make the implementation easier along with improving the detection performance. We learn two deep neural networks (generator and discriminator) in a GAN-style setting on merely the normal samples. During training, the generator gradually becomes an expert to generate samples which are similar to the normal ones. In the training phase, when the generator fails to produce normal data (in the early stages of learning and also prior to the complete convergence), it can be considered as an irregularity generator. In this way, we simultaneously generate the irregular samples. Afterward, we train a binary classifier on the generated anomalous samples along with the normal instances in order to be capable of detecting irregularities. The proposed framework applies to different related applications of outlier and anomaly detection in images and videos, respectively. The results confirm that our proposed method is superior to the baseline and state-of-the-art solutions. △ Less

Submitted 27 June, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

arXiv:2002.04821 [pdf, other]

Deep-HR: Fast Heart Rate Estimation from Face Video Under Realistic Conditions

Authors: Mohammad Sabokrou, Masoud Pourreza, Xiaobai Li, Mahmood Fathy, Guoying Zhao

Abstract: This paper presents a novel method for remote heart rate (HR) estimation. Recent studies have proved that blood pum** by the heart is highly correlated to the intense color of face pixels, and surprisingly can be utilized for remote HR estimation. Researchers successfully proposed several methods for this task, but making it work in realistic situations is still a challenging problem in computer… ▽ More This paper presents a novel method for remote heart rate (HR) estimation. Recent studies have proved that blood pum** by the heart is highly correlated to the intense color of face pixels, and surprisingly can be utilized for remote HR estimation. Researchers successfully proposed several methods for this task, but making it work in realistic situations is still a challenging problem in computer vision community. Furthermore, learning to solve such a complex task on a dataset with very limited annotated samples is not reasonable. Consequently, researchers do not prefer to use the deep learning approaches for this problem. In this paper, we propose a simple yet efficient approach to benefit the advantages of the Deep Neural Network (DNN) by simplifying HR estimation from a complex task to learning from very correlated representation to HR. Inspired by previous work, we learn a component called Front-End (FE) to provide a discriminative representation of face videos, afterward a light deep regression auto-encoder as Back-End (BE) is learned to map the FE representation to HR. Regression task on the informative representation is simple and could be learned efficiently on limited training samples. Beside of this, to be more accurate and work well on low-quality videos, two deep encoder-decoder networks are trained to refine the output of FE. We also introduce a challenging dataset (HR-D) to show that our method can efficiently work in realistic conditions. Experimental results on HR-D and MAHNOB datasets confirm that our method could run as a real-time method and estimate the average HR better than state-of-the-art ones. △ Less

Submitted 12 February, 2020; originally announced February 2020.

arXiv:2001.06099 [pdf, other]

Code-Bridged Classifier (CBC): A Low or Negative Overhead Defense for Making a CNN Classifier Robust Against Adversarial Attacks

Authors: Farnaz Behnia, Ali Mirzaeian, Mohammad Sabokrou, Sai Manoj, Tinoosh Mohsenin, Khaled N. Khasawneh, Liang Zhao, Houman Homayoun, Avesta Sasan

Abstract: In this paper, we propose Code-Bridged Classifier (CBC), a framework for making a Convolutional Neural Network (CNNs) robust against adversarial attacks without increasing or even by decreasing the overall models' computational complexity. More specifically, we propose a stacked encoder-convolutional model, in which the input image is first encoded by the encoder module of a denoising auto-encoder… ▽ More In this paper, we propose Code-Bridged Classifier (CBC), a framework for making a Convolutional Neural Network (CNNs) robust against adversarial attacks without increasing or even by decreasing the overall models' computational complexity. More specifically, we propose a stacked encoder-convolutional model, in which the input image is first encoded by the encoder module of a denoising auto-encoder, and then the resulting latent representation (without being decoded) is fed to a reduced complexity CNN for image classification. We illustrate that this network not only is more robust to adversarial examples but also has a significantly lower computational complexity when compared to the prior art defenses. △ Less

Submitted 16 January, 2020; originally announced January 2020.

Comments: 6 pages, Accepted and to appear in ISQED 2020

arXiv:1911.03306 [pdf, other]

AutoIDS: Auto-encoder Based Method for Intrusion Detection System

Authors: Mohammed Gharib, Bahram Mohammadi, Shadi Hejareh Dastgerdi, Mohammad Sabokrou

Abstract: Intrusion Detection System (IDS) is one of the most effective solutions for providing primary security services. IDSs are generally working based on attack signatures or by detecting anomalies. In this paper, we have presented AutoIDS, a novel yet efficient solution for IDS, based on a semi-supervised machine learning technique. AutoIDS can distinguish abnormal packet flows from normal ones by tak… ▽ More Intrusion Detection System (IDS) is one of the most effective solutions for providing primary security services. IDSs are generally working based on attack signatures or by detecting anomalies. In this paper, we have presented AutoIDS, a novel yet efficient solution for IDS, based on a semi-supervised machine learning technique. AutoIDS can distinguish abnormal packet flows from normal ones by taking advantage of cascading two efficient detectors. These detectors are two encoder-decoder neural networks that are forced to provide a compressed and a sparse representation from the normal flows. In the test phase, failing these neural networks on providing compressed or sparse representation from an incoming packet flow, means such flow does not comply with the normal traffic and thus it is considered as an intrusion. For lowering the computational cost along with preserving the accuracy, a large number of flows are just processed by the first detector. In fact, the second detector is only used for difficult samples which the first detector is not confident about them. We have evaluated AutoIDS on the NSL-KDD benchmark as a widely-used and well-known dataset. The accuracy of AutoIDS is 90.17\% showing its superiority compared to the other state-of-the-art methods. △ Less

Submitted 8 November, 2019; originally announced November 2019.

arXiv:1908.10455 [pdf, other]

Self-Supervised Representation Learning via Neighborhood-Relational Encoding

Authors: Mohammad Sabokrou, Mohammad Khalooei, Ehsan Adeli

Abstract: In this paper, we propose a novel self-supervised representation learning by taking advantage of a neighborhood-relational encoding (NRE) among the training data. Conventional unsupervised learning methods only focused on training deep networks to understand the primitive characteristics of the visual data, mainly to be able to reconstruct the data from a latent space. They often neglected the rel… ▽ More In this paper, we propose a novel self-supervised representation learning by taking advantage of a neighborhood-relational encoding (NRE) among the training data. Conventional unsupervised learning methods only focused on training deep networks to understand the primitive characteristics of the visual data, mainly to be able to reconstruct the data from a latent space. They often neglected the relation among the samples, which can serve as an important metric for self-supervision. Different from the previous work, NRE aims at preserving the local neighborhood structure on the data manifold. Therefore, it is less sensitive to outliers. We integrate our NRE component with an encoder-decoder structure for learning to represent samples considering their local neighborhood information. Such discriminative and unsupervised representation learning scheme is adaptable to different computer vision tasks due to its independence from intense annotation requirements. We evaluate our proposed method for different tasks, including classification, detection, and segmentation based on the learned latent representations. In addition, we adopt the auto-encoding capability of our proposed method for applications like defense against adversarial example attacks and video anomaly detection. Results confirm the performance of our method is better or at least comparable with the state-of-the-art for each specific application, but with a generic and self-supervised approach. △ Less

Submitted 27 August, 2019; originally announced August 2019.

Comments: Accepted in International Conference on Computer Vision (ICCV) 2019

arXiv:1907.05640 [pdf, other]

AVD: Adversarial Video Distillation

Authors: Mohammad Tavakolian, Mohammad Sabokrou, Abdenour Hadid

Abstract: In this paper, we present a simple yet efficient approach for video representation, called Adversarial Video Distillation (AVD). The key idea is to represent videos by compressing them in the form of realistic images, which can be used in a variety of video-based scene analysis applications. Representing a video as a single image enables us to address the problem of video analysis by image analysi… ▽ More In this paper, we present a simple yet efficient approach for video representation, called Adversarial Video Distillation (AVD). The key idea is to represent videos by compressing them in the form of realistic images, which can be used in a variety of video-based scene analysis applications. Representing a video as a single image enables us to address the problem of video analysis by image analysis techniques. To this end, we exploit a 3D convolutional encoder-decoder network to encode the input video as an image by minimizing the reconstruction error. Furthermore, weak supervision by an adversarial training procedure is imposed on the output of the encoder to generate semantically realistic images. The encoder learns to extract semantically meaningful representations from a given input video by map** the 3D input into a 2D latent representation. The obtained representation can be simply used as the input of deep models pre-trained on images for video classification. We evaluated the effectiveness of our proposed method for video-based activity recognition on three standard and challenging benchmark datasets, i.e. UCF101, HMDB51, and Kinetics. The experimental results demonstrate that AVD achieves interesting performance, outperforming the state-of-the-art methods for video classification. △ Less

Submitted 12 July, 2019; originally announced July 2019.

arXiv:1904.11577 [pdf, other]

End-to-End Adversarial Learning for Intrusion Detection in Computer Networks

Authors: Bahram Mohammadi, Mohammad Sabokrou

Abstract: This paper presents a simple yet efficient method for an anomaly-based Intrusion Detection System (IDS). In reality, IDSs can be defined as a one-class classification system, where the normal traffic is the target class. The high diversity of network attacks in addition to the need for generalization, motivate us to propose a semi-supervised method. Inspired by the successes of Generative Adversar… ▽ More This paper presents a simple yet efficient method for an anomaly-based Intrusion Detection System (IDS). In reality, IDSs can be defined as a one-class classification system, where the normal traffic is the target class. The high diversity of network attacks in addition to the need for generalization, motivate us to propose a semi-supervised method. Inspired by the successes of Generative Adversarial Networks (GANs) for training deep models in semi-unsupervised setting, we have proposed an end-to-end deep architecture for IDS. The proposed architecture is composed of two deep networks, each of which trained by competing with each other to understand the underlying concept of the normal traffic class. The key idea of this paper is to compensate the lack of anomalous traffic by approximately obtain them from normal flows. In this case, our method is not biased towards the available intrusions in the training set leading to more accurate detection. The proposed method has been evaluated on NSL-KDD dataset. The results confirm that our method outperforms the other state-of-the-art approaches. △ Less

Submitted 25 April, 2019; originally announced April 2019.

arXiv:1806.09986 [pdf]

Online Signature Verification using Deep Representation: A new Descriptor

Authors: Mohammad Hajizadeh Saffar, Mohsen Fayyaz, Mohammad Sabokrou, Mahmood Fathy

Abstract: This paper presents an accurate method for verifying online signatures. The main difficulty of signature verification come from: (1) Lacking enough training samples (2) The methods must be spatial change invariant. To deal with these difficulties and modeling the signatures efficiently, we propose a method that a one-class classifier per each user is built on discriminative features. First, we pre… ▽ More This paper presents an accurate method for verifying online signatures. The main difficulty of signature verification come from: (1) Lacking enough training samples (2) The methods must be spatial change invariant. To deal with these difficulties and modeling the signatures efficiently, we propose a method that a one-class classifier per each user is built on discriminative features. First, we pre-train a sparse auto-encoder using a large number of unlabeled signatures, then we applied the discriminative features, which are learned by auto-encoder to represent the training and testing signatures as a self-thought learning method (i.e. we have introduced a signature descriptor). Finally, user's signatures are modeled and classified using a one-class classifier. The proposed method is independent on signature datasets thanks to self-taught learning. The experimental results indicate significant error reduction and accuracy enhancement in comparison with state-of-the-art methods on SVC2004 and SUSIG datasets. △ Less

Submitted 23 June, 2018; originally announced June 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1505.08153

arXiv:1806.06172 [pdf]

Semantic Video Segmentation: A Review on Recent Approaches

Authors: Mohammad Hajizadeh Saffar, Mohsen Fayyaz, Mohammad Sabokrou, Mahmood Fathy

Abstract: This paper gives an overview on semantic segmentation consists of an explanation of this field, it's status and relation with other vision fundamental tasks, different datasets and common evaluation parameters that have been used by researchers. This survey also includes an overall review on a variety of recent approaches (RDF, MRF, CRF, etc.) and their advantages and challenges and shows the supe… ▽ More This paper gives an overview on semantic segmentation consists of an explanation of this field, it's status and relation with other vision fundamental tasks, different datasets and common evaluation parameters that have been used by researchers. This survey also includes an overall review on a variety of recent approaches (RDF, MRF, CRF, etc.) and their advantages and challenges and shows the superiority of CNN-based semantic segmentation systems on CamVid and NYUDv2 datasets. In addition, some areas that is ideal for future work have mentioned. △ Less

Submitted 15 June, 2018; originally announced June 2018.

arXiv:1805.09521 [pdf, other]

AVID: Adversarial Visual Irregularity Detection

Authors: Mohammad Sabokrou, Masoud Pourreza, Mohsen Fayyaz, Rahim Entezari, Mahmood Fathy, Jürgen Gall, Ehsan Adeli

Abstract: Real-time detection of irregularities in visual data is very invaluable and useful in many prospective applications including surveillance, patient monitoring systems, etc. With the surge of deep learning methods in the recent years, researchers have tried a wide spectrum of methods for different applications. However, for the case of irregularity or anomaly detection in videos, training an end-to… ▽ More Real-time detection of irregularities in visual data is very invaluable and useful in many prospective applications including surveillance, patient monitoring systems, etc. With the surge of deep learning methods in the recent years, researchers have tried a wide spectrum of methods for different applications. However, for the case of irregularity or anomaly detection in videos, training an end-to-end model is still an open challenge, since often irregularity is not well-defined and there are not enough irregular samples to use during training. In this paper, inspired by the success of generative adversarial networks (GANs) for training deep models in unsupervised or self-supervised settings, we propose an end-to-end deep network for detection and fine localization of irregularities in videos (and images). Our proposed architecture is composed of two networks, which are trained in competing with each other while collaborating to find the irregularity. One network works as a pixel-level irregularity Inpainter, and the other works as a patch-level Detector. After an adversarial self-supervised training, in which I tries to fool D into accepting its inpainted output as regular (normal), the two networks collaborate to detect and fine-segment the irregularity in any given testing video. Our results on three different datasets show that our method can outperform the state-of-the-art and fine-segment the irregularity. △ Less

Submitted 17 July, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

arXiv:1802.09088 [pdf, other]

Adversarially Learned One-Class Classifier for Novelty Detection

Authors: Mohammad Sabokrou, Mohammad Khalooei, Mahmood Fathy, Ehsan Adeli

Abstract: Novelty detection is the process of identifying the observation(s) that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end-to… ▽ More Novelty detection is the process of identifying the observation(s) that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end-to-end deep network is a cumbersome task. In this paper, inspired by the success of generative adversarial networks for training deep models in unsupervised and semi-supervised settings, we propose an end-to-end architecture for one-class classification. Our architecture is composed of two deep networks, each of which trained by competing with each other while collaborating to understand the underlying concept in the target class, and then classify the testing samples. One network works as the novelty detector, while the other supports it by enhancing the inlier samples and distorting the outliers. The intuition is that the separability of the enhanced inliers and distorted outliers is much better than deciding on the original samples. The proposed framework applies to different related applications of anomaly and outlier detection in images and videos. The results on MNIST and Caltech-256 image datasets, along with the challenging UCSD Ped2 dataset for video anomaly detection illustrate that our proposed method learns the target class effectively and is superior to the baseline and state-of-the-art methods. △ Less

Submitted 24 May, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

Comments: CVPR 2018 Paper

arXiv:1802.06205 [pdf, other]

Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet

Authors: Seyyed Hossein Hasanpour, Mohammad Rouhani, Mohsen Fayyaz, Mohammad Sabokrou, Ehsan Adeli

Abstract: Major winning Convolutional Neural Networks (CNNs), such as VGGNet, ResNet, DenseNet, \etc, include tens to hundreds of millions of parameters, which impose considerable computation and memory overheads. This limits their practical usage in training and optimizing for real-world applications. On the contrary, light-weight architectures, such as SqueezeNet, are being proposed to address this issue.… ▽ More Major winning Convolutional Neural Networks (CNNs), such as VGGNet, ResNet, DenseNet, \etc, include tens to hundreds of millions of parameters, which impose considerable computation and memory overheads. This limits their practical usage in training and optimizing for real-world applications. On the contrary, light-weight architectures, such as SqueezeNet, are being proposed to address this issue. However, they mainly suffer from low accuracy, as they have compromised between the processing power and efficiency. These inefficiencies mostly stem from following an ad-hoc designing procedure. In this work, we discuss and propose several crucial design principles for an efficient architecture design and elaborate intuitions concerning different aspects of the design procedure. Furthermore, we introduce a new layer called {\it SAF-pooling} to improve the generalization power of the network while kee** it simple by choosing best features. Based on such principles, we propose a simple architecture called {\it SimpNet}. We empirically show that SimpNet provides a good trade-off between the computation/memory efficiency and the accuracy solely based on these primitive but crucial principles. SimpNet outperforms the deeper and more complex architectures such as VGGNet, ResNet, WideResidualNet \etc, on several well-known benchmarks, while having 2 to 25 times fewer number of parameters and operations. We obtain state-of-the-art results (in terms of a balance between the accuracy and the number of involved parameters) on standard datasets, such as CIFAR10, CIFAR100, MNIST and SVHN. The implementations are available at \href{url}{https://github.com/Coderx7/SimpNet}. △ Less

Submitted 17 February, 2018; originally announced February 2018.

Comments: The Submitted version to the IEEE TIP on December 2017, replaced high resolution images with low-res counterparts due to arXiv size limitation, 19 pages

arXiv:1609.00866 [pdf, other]

Deep-Anomaly: Fully Convolutional Neural Network for Fast Anomaly Detection in Crowded Scenes

Authors: Mohammad Sabokrou, Mohsen Fayyaz, Mahmood Fathy, Zahra Moayedd, Reinhard klette

Abstract: The detection of abnormal behaviours in crowded scenes has to deal with many challenges. This paper presents an efficient method for detection and localization of anomalies in videos. Using fully convolutional neural networks (FCNs) and temporal data, a pre-trained supervised FCN is transferred into an unsupervised FCN ensuring the detection of (global) anomalies in scenes. High performance in ter… ▽ More The detection of abnormal behaviours in crowded scenes has to deal with many challenges. This paper presents an efficient method for detection and localization of anomalies in videos. Using fully convolutional neural networks (FCNs) and temporal data, a pre-trained supervised FCN is transferred into an unsupervised FCN ensuring the detection of (global) anomalies in scenes. High performance in terms of speed and accuracy is achieved by investigating the cascaded detection as a result of reducing computation complexities. This FCN-based architecture addresses two main tasks, feature representation and cascaded outlier detection. Experimental results on two benchmarks suggest that detection and localization of the proposed method outperforms existing methods in terms of accuracy. △ Less

Submitted 30 April, 2017; v1 submitted 3 September, 2016; originally announced September 2016.

arXiv:1608.06037 [pdf, other]

Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures

Authors: Seyyed Hossein Hasanpour, Mohammad Rouhani, Mohsen Fayyaz, Mohammad Sabokrou

Abstract: Major winning Convolutional Neural Networks (CNNs), such as AlexNet, VGGNet, ResNet, GoogleNet, include tens to hundreds of millions of parameters, which impose considerable computation and memory overhead. This limits their practical use for training, optimization and memory efficiency. On the contrary, light-weight architectures, being proposed to address this issue, mainly suffer from low accur… ▽ More Major winning Convolutional Neural Networks (CNNs), such as AlexNet, VGGNet, ResNet, GoogleNet, include tens to hundreds of millions of parameters, which impose considerable computation and memory overhead. This limits their practical use for training, optimization and memory efficiency. On the contrary, light-weight architectures, being proposed to address this issue, mainly suffer from low accuracy. These inefficiencies mostly stem from following an ad hoc procedure. We propose a simple architecture, called SimpleNet, based on a set of designing principles, with which we empirically show, a well-crafted yet simple and reasonably deep architecture can perform on par with deeper and more complex architectures. SimpleNet provides a good tradeoff between the computation/memory efficiency and the accuracy. Our simple 13-layer architecture outperforms most of the deeper and complex architectures to date such as VGGNet, ResNet, and GoogleNet on several well-known benchmarks while having 2 to 25 times fewer number of parameters and operations. This makes it very handy for embedded systems or systems with computational and memory limitations. We achieved state-of-the-art result on CIFAR10 outperforming several heavier architectures, near state of the art on MNIST and competitive results on CIFAR100 and SVHN. We also outperformed the much larger and deeper architectures such as VGGNet and popular variants of ResNets among others on the ImageNet dataset. Models are made available at: https://github.com/Coderx7/SimpleNet △ Less

Submitted 27 April, 2023; v1 submitted 21 August, 2016; originally announced August 2016.

Comments: Added the long-overdue ImageNet results and updated the missed cifar10/100 results from 2018

arXiv:1608.05971 [pdf, other]

STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

Authors: Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang

Abstract: This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks(CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features als… ▽ More This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks(CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also has a good effect on segmenting video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video our method combines the use of three components: First, the regional spatial features of frames are extracted using a CNN; then, using LSTM the temporal features are added; finally, by deconvolving the spatio-temporal features we produce pixel-wise predictions. Our key insight is to build spatio-temporal convolutional networks (spatio-temporal CNNs) that have an end-to-end architecture for semantic video segmentation. We adapted fully some known convolutional network architectures (such as FCN-AlexNet and FCN-VGG16), and dilated convolution into our spatio-temporal CNNs. Our spatio-temporal CNNs achieve state-of-the-art semantic segmentation, as demonstrated for the Camvid and NYUDv2 datasets. △ Less

Submitted 2 September, 2016; v1 submitted 21 August, 2016; originally announced August 2016.

arXiv:1603.02419 [pdf]

doi 10.1109/ICCKE.2012.6395359

An Evolvable Fuzzy Logic System for handoff management in heterogeneous Wireless Networks

Authors: Hossein Fayyazi, Mohammad Sabokrou

Abstract: One of the features of the Next Generation Wireless Networks (NGWNs) is its heterogeneous communication environment. Heterogeneous networks are ranging from wireless WAN, LAN, MAN and PAN. The most important parameters in this regard are different data rates and coverage of these networks. In this paper, we propose an evolvable fuzzy logic-based algorithm for managing handover process in heterogen… ▽ More One of the features of the Next Generation Wireless Networks (NGWNs) is its heterogeneous communication environment. Heterogeneous networks are ranging from wireless WAN, LAN, MAN and PAN. The most important parameters in this regard are different data rates and coverage of these networks. In this paper, we propose an evolvable fuzzy logic-based algorithm for managing handover process in heterogeneous wireless networks. We use fuzzy logic system to determine the right time of initiating the handoff procedure and utilize Genetic Algorithm (GA) to predict the best form of rules of the fuzzy module. We implement our proposed scheme on a special simulation model and discuss about the impact of evolution on the algorithm. △ Less

Submitted 8 March, 2016; originally announced March 2016.

arXiv:1511.07425

Real-Time Anomalous Behavior Detection and Localization in Crowded Scenes

Authors: Mohammad Sabokrou, Mahmood Fathy, Mojtaba Hosseini

Abstract: In this paper, we propose an accurate and real-time anomaly detection and localization in crowded scenes, and two descriptors for representing anomalous behavior in video are proposed. We consider a video as being a set of cubic patches. Based on the low likelihood of an anomaly occurrence, and the redundancy of structures in normal patches in videos, two (global and local) views are considered fo… ▽ More In this paper, we propose an accurate and real-time anomaly detection and localization in crowded scenes, and two descriptors for representing anomalous behavior in video are proposed. We consider a video as being a set of cubic patches. Based on the low likelihood of an anomaly occurrence, and the redundancy of structures in normal patches in videos, two (global and local) views are considered for modeling the video. Our algorithm has two components, for (1) representing the patches using local and global descriptors, and for (2) modeling the training patches using a new representation. We have two Gaussian models for all training patches respect to global and local descriptors. The local and global features are based on structure similarity between adjacent patches and the features that are learned in an unsupervised way. We propose a fusion strategy to combine the two descriptors as the output of our system. Experimental results show that our algorithm performs like a state-of-the-art method on several standard datasets, but even is more time-efficient. △ Less

Submitted 2 January, 2016; v1 submitted 21 November, 2015; originally announced November 2015.

Comments: This paper has been withdrawn by the author due to some error in experimental result. There are some mistakes

arXiv:1511.06936 [pdf, other]

Real-Time Anomaly Detection and Localization in Crowded Scenes

Authors: Mohammad Sabokrou, Mahmood Fathy, Mojtaba Hosseini, Reinhard Klette

Abstract: In this paper, we propose a method for real-time anomaly detection and localization in crowded scenes. Each video is defined as a set of non-overlap** cubic patches, and is described using two local and global descriptors. These descriptors capture the video properties from different aspects. By incorporating simple and cost-effective Gaussian classifiers, we can distinguish normal activities an… ▽ More In this paper, we propose a method for real-time anomaly detection and localization in crowded scenes. Each video is defined as a set of non-overlap** cubic patches, and is described using two local and global descriptors. These descriptors capture the video properties from different aspects. By incorporating simple and cost-effective Gaussian classifiers, we can distinguish normal activities and anomalies in videos. The local and global features are based on structure similarity between adjacent patches and the features learned in an unsupervised way, using a sparse auto- encoder. Experimental results show that our algorithm is comparable to a state-of-the-art procedure on UCSD ped2 and UMN benchmarks, but even more time-efficient. The experiments confirm that our system can reliably detect and localize anomalies as soon as they happen in a video. △ Less

Submitted 21 November, 2015; originally announced November 2015.

Comments: CVPRw 2015

arXiv:1508.03710 [pdf]

A Novel Approach For Finger Vein Verification Based on Self-Taught Learning

Authors: Mohsen Fayyaz, Masoud PourReza, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy

Abstract: In this paper, we propose a method for user Finger Vein Authentication (FVA) as a biometric system. Using the discriminative features for classifying theses finger veins is one of the main tips that make difference in related works, Thus we propose to learn a set of representative features, based on autoencoders. We model the user finger vein using a Gaussian distribution. Experimental results sho… ▽ More In this paper, we propose a method for user Finger Vein Authentication (FVA) as a biometric system. Using the discriminative features for classifying theses finger veins is one of the main tips that make difference in related works, Thus we propose to learn a set of representative features, based on autoencoders. We model the user finger vein using a Gaussian distribution. Experimental results show that our algorithm perform like a state-of-the-art on SDUMLA-HMT benchmark. △ Less

Submitted 15 August, 2015; originally announced August 2015.

Comments: 4 pages, 4 figures, Submitted Iranian Conference on Machine Vision and Image Processing

arXiv:1506.00122

IDSA: Intelligent Distributed Sensor Activation Algorithm For Target Tracking With Wireless Sensor Network

Authors: Mohammad Sabokrou, Mahmood Fathy, Mojtaba Hoseini

Abstract: One important application of the Wireless Sensor Network(WSN) is target tracking, the aim of this application is converging to an event or object in an area. In this paper, we propose an energy-efficient distributed sensor activation protocol based on predicted location technique, called Intelligent Distributed Sensor Activation Algorithm (IDSA). The proposed algorithm predicts the location of tar… ▽ More One important application of the Wireless Sensor Network(WSN) is target tracking, the aim of this application is converging to an event or object in an area. In this paper, we propose an energy-efficient distributed sensor activation protocol based on predicted location technique, called Intelligent Distributed Sensor Activation Algorithm (IDSA). The proposed algorithm predicts the location of target in the next time interval, by analyzing current location and movement history of the target, this prediction is done by computational intelligence. The fewest essential number of sensor nodes within the predicted location will be activated to cover the target. The results show that the proposed method outperforms the existing methods such as Naïve and DSA in terms of energy consumption and the number of nodes that was involved in tracking the target. △ Less

Submitted 15 February, 2017; v1 submitted 30 May, 2015; originally announced June 2015.

Comments: We have found several mistake in this our paper. (1) The English writing of paper must improved (2) the experimental results especially Figure 9, 10 and 12 are wrong (3) Some references must revised. We are providing new version of paper. We are concern about, this version make confusion for readers

arXiv:1505.08153 [pdf]

doi 10.1109/AISP.2015.7123528

Feature Representation for Online Signature Verification

Authors: Mohsen Fayyaz, Mohammad Hajizadeh_Saffar, Mohammad Sabokrou, Mahmood Fathy

Abstract: Biometrics systems have been used in a wide range of applications and have improved people authentication. Signature verification is one of the most common biometric methods with techniques that employ various specifications of a signature. Recently, deep learning has achieved great success in many fields, such as image, sounds and text processing. In this paper, deep learning method has been used… ▽ More Biometrics systems have been used in a wide range of applications and have improved people authentication. Signature verification is one of the most common biometric methods with techniques that employ various specifications of a signature. Recently, deep learning has achieved great success in many fields, such as image, sounds and text processing. In this paper, deep learning method has been used for feature extraction and feature selection. △ Less

Submitted 29 May, 2015; originally announced May 2015.

Comments: 10 pages, 10 figures, Submitted to IEEE Transactions on Information Forensics and Security

Showing 1–44 of 44 results for author: Sabokrou, M