Search | arXiv e-print repository

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Authors: Cristian Rodriguez-Opazo, Ehsan Abbasnejad, Damien Teney, Edison Marrese-Taylor, Hamed Damirchi, Anton van den Hengel

Abstract: Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various architectures, from vision transformers (ViTs) to convolutional networks (ResNets) have been trained with CLIP to serve as general solutions to diverse vision tasks. This paper explores the differences across various CLIP-trained vision backbones. Despite using the same data an… ▽ More Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various architectures, from vision transformers (ViTs) to convolutional networks (ResNets) have been trained with CLIP to serve as general solutions to diverse vision tasks. This paper explores the differences across various CLIP-trained vision backbones. Despite using the same data and training objective, we find that these architectures have notably different representations, different classification performance across datasets, and different robustness properties to certain types of image perturbations. Our findings indicate a remarkable possible synergy across backbones by leveraging their respective strengths. In principle, classification accuracy could be improved by over 40 percentage with an informed selection of the optimal backbone per test example.Using this insight, we develop a straightforward yet powerful approach to adaptively ensemble multiple backbones. The approach uses as few as one labeled example per class to tune the adaptive combination of backbones. On a large collection of datasets, the method achieves a remarkable increase in accuracy of up to 39.1% over the best single backbone, well beyond traditional ensembles △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2312.14400

arXiv:2404.05311 [pdf, other]

BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Authors: Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract: We study the unique, less-well understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries. Sparse attacks aim to discover a minimum number-the l0 bounded-perturbations to model inputs to craft adversarial examples and misguide model decisions. But, in contrast to query-based dense attack counterparts against black-box models, constructi… ▽ More We study the unique, less-well understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries. Sparse attacks aim to discover a minimum number-the l0 bounded-perturbations to model inputs to craft adversarial examples and misguide model decisions. But, in contrast to query-based dense attack counterparts against black-box models, constructing sparse adversarial perturbations, even when models serve confidence score information to queries in a score-based setting, is non-trivial. Because, such an attack leads to i) an NP-hard problem; and ii) a non-differentiable search space. We develop the BruSLeAttack-a new, faster (more query-efficient) Bayesian algorithm for the problem. We conduct extensive attack evaluations including an attack demonstration against a Machine Learning as a Service (MLaaS) offering exemplified by Google Cloud Vision and robustness testing of adversarial training regimes and a recent defense against black-box attacks. The proposed attack scales to achieve state-of-the-art attack success rates and query efficiency on standard computer vision tasks such as ImageNet across different model architectures. Our artefacts and DIY attack samples are available on GitHub. Importantly, our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems. △ Less

Submitted 1 June, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Published as a conference paper at the International Conference on Learning Representations (ICLR 2024). Code is available at https://brusliattack.github.io/

arXiv:2403.18309 [pdf, other]

Bayesian Learned Models Can Detect Adversarial Malware For Free

Authors: Bao Gia Doan, Dang Quang Nguyen, Paul Montague, Tamas Abraham, Olivier De Vel, Seyit Camtepe, Salil S. Kanhere, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract: The vulnerability of machine learning-based malware detectors to adversarial attacks has prompted the need for robust solutions. Adversarial training is an effective method but is computationally expensive to scale up to large datasets and comes at the cost of sacrificing model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of models and can… ▽ More The vulnerability of machine learning-based malware detectors to adversarial attacks has prompted the need for robust solutions. Adversarial training is an effective method but is computationally expensive to scale up to large datasets and comes at the cost of sacrificing model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of models and can be identified using epistemic uncertainty of ML approaches -- epistemic uncertainty in a machine learning-based malware detector is a result of a lack of similar training samples in regions of the problem space. In particular, a Bayesian formulation can capture the model parameters' distribution and quantify epistemic uncertainty without sacrificing model performance. To verify our hypothesis, we consider Bayesian learning approaches with a mutual information-based formulation to quantify uncertainty and detect adversarial malware in Android, Windows domains and PDF malware. We found, quantifying uncertainty through Bayesian learning methods can defend against adversarial malware. In particular, Bayesian models: (1) are generally capable of identifying adversarial malware in both feature and problem space, (2) can detect concept drift by measuring uncertainty, and (3) with a diversity-promoting approach (or better posterior approximations) lead to parameter instances from the posterior to significantly enhance a detectors' ability. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to the 29th European Symposium on Research in Computer Security (ESORICS) 2024 Conference

arXiv:2403.07968 [pdf, other]

Do Deep Neural Network Solutions Form a Star Domain?

Authors: Ankit Sonthalia, Alexander Rubinstein, Ehsan Abbasnejad, Seong Joon Oh

Abstract: It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances (Entezari et al., 2022). This means that a linear path can connect two independent solutions with low loss, given the weights of one of the models are appropriately permuted. However, current methods to test this theory often require ver… ▽ More It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances (Entezari et al., 2022). This means that a linear path can connect two independent solutions with low loss, given the weights of one of the models are appropriately permuted. However, current methods to test this theory often require very wide networks to succeed. In this work, we conjecture that more generally, the SGD solution set is a "star domain" that contains a "star model" that is linearly connected to all the other solutions via paths with low loss values, modulo permutations. We propose the Starlight algorithm that finds a star model of a given learning task. We validate our claim by showing that this star model is linearly connected with other independently found solutions. As an additional benefit of our study, we demonstrate better uncertainty estimates on the Bayesian Model Averaging over the obtained star domain. Further, we demonstrate star models as potential substitutes for model ensembles. Our code is available at https://github.com/aktsonthalia/starlight. △ Less

Submitted 9 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07356 [pdf, other]

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Authors: Mark D. McDonnell, Dong Gong, Ehsan Abbasnejad, Anton van den Hengel

Abstract: Continual learning requires a model to adapt to ongoing changes in the data distribution, and often to the set of tasks to be performed. It is rare, however, that the data and task changes are completely unpredictable. Given a description of an overarching goal or data theme, which we call a realm, humans can often guess what concepts are associated with it. We show here that the combination of a… ▽ More Continual learning requires a model to adapt to ongoing changes in the data distribution, and often to the set of tasks to be performed. It is rare, however, that the data and task changes are completely unpredictable. Given a description of an overarching goal or data theme, which we call a realm, humans can often guess what concepts are associated with it. We show here that the combination of a large language model and an image generation model can similarly provide useful premonitions as to how a continual learning challenge might develop over time. We use the large language model to generate text descriptions of semantically related classes that might potentially appear in the data stream in future. These descriptions are then rendered using Stable Diffusion to generate new labelled image samples. The resulting synthetic dataset is employed for supervised pre-training, but is discarded prior to commencing continual learning, along with the pre-training classification head. We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem, thus becoming a valuable input to any existing continual learning method. Although there are complexities arising from the domain gap between real and synthetic images, we show that pre-training models in this manner improves multiple Class Incremenal Learning (CIL) methods on fine-grained image classification benchmarks. Supporting code can be found at https://github.com/cl-premonition/premonition. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 31 pages total (14 main paper, 5 references, 12 appendices)

arXiv:2403.02241 [pdf, other]

Neural Redshift: Random Networks are not Random Functions

Authors: Damien Teney, Armand Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad

Abstract: Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but they cannot account for the capabilities of models from gradient-free methods nor the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. Findings. To unde… ▽ More Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but they cannot account for the capabilities of models from gradient-free methods nor the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. Findings. To understand the inductive biases provided by architectures independently from GD, we examine untrained, random-weight networks. Even simple MLPs show strong inductive biases: uniform sampling in weight space yields a very biased distribution of functions in terms of complexity. But unlike common wisdom, NNs do not have an inherent "simplicity bias". This property depends on components such as ReLUs, residual connections, and layer normalizations. Alternative architectures can be built with a bias for any level of complexity. Transformers also inherit all these properties from their building blocks. Implications. We provide a fresh explanation for the success of deep learning independent from gradient-based training. It points at promising avenues for controlling the solutions implemented by trained models. △ Less

Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2312.14400 [pdf, other]

Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances

Authors: Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Ehsan Abbasnejad, Hamed Damirchi, Ignacio M. Jara, Felipe Bravo-Marquez, Anton van den Hengel

Abstract: Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various neural architectures, spanning Transformer-based models like Vision Transformers (ViTs) to Convolutional Networks (ConvNets) like ResNets, are trained with CLIP and serve as universal backbones across diverse vision tasks. Despite utilizing the same data and training objectives… ▽ More Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various neural architectures, spanning Transformer-based models like Vision Transformers (ViTs) to Convolutional Networks (ConvNets) like ResNets, are trained with CLIP and serve as universal backbones across diverse vision tasks. Despite utilizing the same data and training objectives, the effectiveness of representations learned by these architectures raises a critical question. Our investigation explores the differences in CLIP performance among these backbone architectures, revealing significant disparities in their classifications. Notably, normalizing these representations results in substantial performance variations. Our findings showcase a remarkable possible synergy between backbone predictions that could reach an improvement of over 20% through informed selection of the appropriate backbone. Moreover, we propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34\%. We will release the code for reproducing the results. △ Less

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2311.17949 [pdf, other]

Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines

Authors: Hamed Damirchi, Cristian Rodríguez-Opazo, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Stephen Gould, Anton van den Hengel

Abstract: Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. The Web likely contains the information necessary to excel on any specific application, but identifying the right data a priori is challenging. This paper shows how to leverage recent advances in NLP and multi-modal le… ▽ More Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. The Web likely contains the information necessary to excel on any specific application, but identifying the right data a priori is challenging. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval. We propose to retrieve useful data from the Web at test time based on test cases that the model is uncertain about. Different from existing retrieval-augmented approaches, we then update the model to address this underlying uncertainty. We demonstrate substantial improvements in zero-shot performance, e.g. a remarkable increase of 15 percentage points in accuracy on the Stanford Cars and Flowers datasets. We also present extensive experiments that explore the impact of noisy retrieval and different learning strategies. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.03866 [pdf, other]

SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation

Authors: Iman Abbasnejad, Fabio Zambetta, Flora Salim, Timothy Wiley, Jeffrey Chan, Russell Gallagher, Ehsan Abbasnejad

Abstract: SCONE-GAN presents an end-to-end image translation, which is shown to be effective for learning to generate realistic and diverse scenery images. Most current image-to-image translation approaches are devised as two map**s: a translation from the source to target domain and another to represent its inverse. While successful in many applications, these approaches may suffer from generating trivia… ▽ More SCONE-GAN presents an end-to-end image translation, which is shown to be effective for learning to generate realistic and diverse scenery images. Most current image-to-image translation approaches are devised as two map**s: a translation from the source to target domain and another to represent its inverse. While successful in many applications, these approaches may suffer from generating trivial solutions with limited diversity. That is because these methods learn more frequent associations rather than the scene structures. To mitigate the problem, we propose SCONE-GAN that utilises graph convolutional networks to learn the objects dependencies, maintain the image structure and preserve its semantics while transferring images into the target domain. For more realistic and diverse image generation we introduce style reference image. We enforce the model to maximize the mutual information between the style image and output. The proposed method explicitly maximizes the mutual information between the related patches, thus encouraging the generator to produce more diverse images. We validate the proposed algorithm for image-to-image translation and stylizing outdoor images. Both qualitative and quantitative results demonstrate the effectiveness of our approach on four dataset. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: 9 pages, 5 figures

arXiv:2309.04659 [pdf, other]

Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models

Authors: Hai-Ming Xu, Lingqiao Liu, Hao Chen, Ehsan Abbasnejad, Rafael Felix

Abstract: As an effective way to alleviate the burden of data annotation, semi-supervised learning (SSL) provides an attractive solution due to its ability to leverage both labeled and unlabeled data to build a predictive model. While significant progress has been made recently, SSL algorithms are often evaluated and developed under the assumption that the network is randomly initialized. This is in sharp c… ▽ More As an effective way to alleviate the burden of data annotation, semi-supervised learning (SSL) provides an attractive solution due to its ability to leverage both labeled and unlabeled data to build a predictive model. While significant progress has been made recently, SSL algorithms are often evaluated and developed under the assumption that the network is randomly initialized. This is in sharp contrast to most vision recognition systems that are built from fine-tuning a pretrained network for better performance. While the marriage of SSL and a pretrained model seems to be straightforward, recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data. In this paper, we postulate the underlying reason is that the pretrained feature representation could bring a bias inherited from the source data, and the bias tends to be magnified through the self-training process in a typical SSL algorithm. To overcome this issue, we propose to use pseudo-labels from the unlabelled data to update the feature extractor that is less sensitive to incorrect labels and only allow the classifier to be trained from the labeled data. More specifically, we progressively adjust the feature extractor to ensure its induced feature distribution maintains a good class separability even under strong input perturbation. Through extensive experimental studies, we show that the proposed approach achieves superior performance over existing solutions. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: to appear at ICCVW2023 (Workshop on Visual Continual Learning)

arXiv:2307.02251 [pdf, other]

RanPAC: Random Projections and Pre-trained Models for Continual Learning

Authors: Mark D. McDonnell, Dong Gong, Amin Parveneh, Ehsan Abbasnejad, Anton van den Hengel

Abstract: Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for v… ▽ More Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we contemplate an alternative approach that exploits training-free random projectors and class-prototype accumulation, which thus bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head, which captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning. Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 20% and 62% on seven class-incremental benchmarks, despite not using any rehearsal memory. We conclude that the full potential of pre-trained models for simple, effective, and fast CL has not hitherto been fully tapped. Code is at github.com/RanPAC/RanPAC. △ Less

Submitted 15 January, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: 32 pages, 11 figures

Journal ref: 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023), Dec 2023, New Orleans, United States

arXiv:2305.18026 [pdf, other]

Semantic Role Labeling Guided Out-of-distribution Detection

Authors: **an Zou, Maihao Guo, Yu Tian, Yuhao Lin, Haiyao Cao, Lingqiao Liu, Ehsan Abbasnejad, Javen Qinfeng Shi

Abstract: Identifying unexpected domain-shifted instances in natural language processing is crucial in real-world applications. Previous works identify the out-of-distribution (OOD) instance by leveraging a single global feature embedding to represent the sentence, which cannot characterize subtle OOD patterns well. Another major challenge current OOD methods face is learning effective low-dimensional sente… ▽ More Identifying unexpected domain-shifted instances in natural language processing is crucial in real-world applications. Previous works identify the out-of-distribution (OOD) instance by leveraging a single global feature embedding to represent the sentence, which cannot characterize subtle OOD patterns well. Another major challenge current OOD methods face is learning effective low-dimensional sentence representations to identify the hard OOD instances that are semantically similar to the in-distribution (ID) data. In this paper, we propose a new unsupervised OOD detection method, namely Semantic Role Labeling Guided Out-of-distribution Detection (SRLOOD), that separates, extracts, and learns the semantic role labeling (SRL) guided fine-grained local feature representations from different arguments of a sentence and the global feature representations of the full sentence using a margin-based contrastive loss. A novel self-supervised approach is also introduced to enhance such global-local feature learning by predicting the SRL extracted role. The resulting model achieves SOTA performance on four OOD benchmarks, indicating the effectiveness of our approach. The code is publicly accessible via \url{https://github.com/cytai/SRLOOD}. △ Less

Submitted 18 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: accepted by COLING 2024

arXiv:2305.16817 [pdf, other]

Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup

Authors: Damien Teney, **dong Wang, Ehsan Abbasnejad

Abstract: Mixup is a highly successful technique to improve generalization of neural networks by augmenting the training data with combinations of random pairs. Selective mixup is a family of methods that apply mixup to specific pairs, e.g. only combining examples across classes or domains. These methods have claimed remarkable improvements on benchmarks with distribution shifts, but their mechanisms and li… ▽ More Mixup is a highly successful technique to improve generalization of neural networks by augmenting the training data with combinations of random pairs. Selective mixup is a family of methods that apply mixup to specific pairs, e.g. only combining examples across classes or domains. These methods have claimed remarkable improvements on benchmarks with distribution shifts, but their mechanisms and limitations remain poorly understood. We examine an overlooked aspect of selective mixup that explains its success in a completely new light. We find that the non-random selection of pairs affects the training distribution and improve generalization by means completely unrelated to the mixing. For example in binary classification, mixup across classes implicitly resamples the data for a uniform class distribution - a classical solution to label shift. We show empirically that this implicit resampling explains much of the improvements in prior work. Theoretically, these results rely on a regression toward the mean, an accidental property that we identify in several datasets. We have found a new equivalence between two successful methods: selective mixup and resampling. We identify limits of the former, confirm the effectiveness of the latter, and find better combinations of their respective benefits. △ Less

Submitted 2 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2303.13556 [pdf, other]

ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning

Authors: Islam Nassar, Munawar Hayat, Ehsan Abbasnejad, Hamid Rezatofighi, Gholamreza Haffari

Abstract: Confidence-based pseudo-labeling is among the dominant approaches in semi-supervised learning (SSL). It relies on including high-confidence predictions made on unlabeled data as additional targets to train the model. We propose ProtoCon, a novel SSL method aimed at the less-explored label-scarce SSL where such methods usually underperform. ProtoCon refines the pseudo-labels by leveraging their nea… ▽ More Confidence-based pseudo-labeling is among the dominant approaches in semi-supervised learning (SSL). It relies on including high-confidence predictions made on unlabeled data as additional targets to train the model. We propose ProtoCon, a novel SSL method aimed at the less-explored label-scarce SSL where such methods usually underperform. ProtoCon refines the pseudo-labels by leveraging their nearest neighbours' information. The neighbours are identified as the training proceeds using an online clustering approach operating in an embedding space trained via a prototypical loss to encourage well-formed clusters. The online nature of ProtoCon allows it to utilise the label history of the entire dataset in one training cycle to refine labels in the following cycle without the need to store image embeddings. Hence, it can seamlessly scale to larger datasets at a low cost. Finally, ProtoCon addresses the poor training signal in the initial phase of training (due to fewer confident predictions) by introducing an auxiliary self-supervised loss. It delivers significant gains and faster convergence over state-of-the-art across 5 datasets, including CIFARs, ImageNet and DomainNet. △ Less

Submitted 22 March, 2023; originally announced March 2023.

Comments: Accepted in CVPR2023 (highlight)

arXiv:2301.12680 [pdf, other]

Feature-Space Bayesian Adversarial Learning Improved Malware Detector Robustness

Authors: Bao Gia Doan, Shuiqiao Yang, Paul Montague, Olivier De Vel, Tamas Abraham, Seyit Camtepe, Salil S. Kanhere, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract: We present a new algorithm to train a robust malware detector. Modern malware detectors rely on machine learning algorithms. Now, the adversarial objective is to devise alterations to the malware code to decrease the chance of being detected whilst preserving the functionality and realism of the malware. Adversarial learning is effective in improving robustness but generating functional and realis… ▽ More We present a new algorithm to train a robust malware detector. Modern malware detectors rely on machine learning algorithms. Now, the adversarial objective is to devise alterations to the malware code to decrease the chance of being detected whilst preserving the functionality and realism of the malware. Adversarial learning is effective in improving robustness but generating functional and realistic adversarial malware samples is non-trivial. Because: i) in contrast to tasks capable of using gradient-based feedback, adversarial learning in a domain without a differentiable map** function from the problem space (malware code inputs) to the feature space is hard; and ii) it is difficult to ensure the adversarial malware is realistic and functional. This presents a challenge for develo** scalable adversarial machine learning algorithms for large datasets at a production or commercial scale to realize robust malware detectors. We propose an alternative; perform adversarial learning in the feature space in contrast to the problem space. We prove the projection of perturbed, yet valid malware, in the problem space into feature space will always be a subset of adversarials generated in the feature space. Hence, by generating a robust network against feature-space adversarial examples, we inherently achieve robustness against problem-space adversarial examples. We formulate a Bayesian adversarial learning objective that captures the distribution of models for improved robustness. We prove that our learning method bounds the difference between the adversarial risk and empirical risk explaining the improved robustness. We show that adversarially trained BNNs achieve state-of-the-art robustness. Notably, adversarially trained BNNs are robust against stronger attacks with larger attack budgets by a margin of up to 15% on a recent production-scale malware dataset of more than 20 million samples. △ Less

Submitted 30 January, 2023; originally announced January 2023.

Comments: Accepted to AAAI 2023 conference

arXiv:2212.02003 [pdf, other]

Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense

Authors: Bao Gia Doan, Ehsan Abbasnejad, Javen Qinfeng Shi, Damith C. Ranasinghe

Abstract: We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achiev… ▽ More We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly. we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrate significantly improved robustness--up to 20%--compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both CIFAR-10 and STL-10 datasets. △ Less

Submitted 1 December, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

Comments: Published at ICML 2022. Code is available at https://github.com/baogiadoan/IG-BNN

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:5309-5323, 2022

arXiv:2210.10317 [pdf, other]

LAVA: Label-efficient Visual Learning and Adaptation

Authors: Islam Nassar, Munawar Hayat, Ehsan Abbasnejad, Hamid Rezatofighi, Mehrtash Harandi, Gholamreza Haffari

Abstract: We present LAVA, a simple yet effective method for multi-domain visual transfer learning with limited data. LAVA builds on a few recent innovations to enable adapting to partially labelled datasets with class and domain shifts. First, LAVA learns self-supervised visual representations on the source dataset and ground them using class label semantics to overcome transfer collapse problems associate… ▽ More We present LAVA, a simple yet effective method for multi-domain visual transfer learning with limited data. LAVA builds on a few recent innovations to enable adapting to partially labelled datasets with class and domain shifts. First, LAVA learns self-supervised visual representations on the source dataset and ground them using class label semantics to overcome transfer collapse problems associated with supervised pretraining. Secondly, LAVA maximises the gains from unlabelled target data via a novel method which uses multi-crop augmentations to obtain highly robust pseudo-labels. By combining these ingredients, LAVA achieves a new state-of-the-art on ImageNet semi-supervised protocol, as well as on 7 out of 10 datasets in multi-domain few-shot learning on the Meta-dataset. Code and models are made available. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted in WACV2023

arXiv:2209.00613 [pdf, other]

ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

Authors: Damien Teney, Yong Lin, Seong Joon Oh, Ehsan Abbasnejad

Abstract: Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation and some surprisingly never even observe an inverse correlation indicative of a necessary trade-off. The possibility of inverse patterns is important to determine whether ID performance can serve as a proxy for OOD genera… ▽ More Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation and some surprisingly never even observe an inverse correlation indicative of a necessary trade-off. The possibility of inverse patterns is important to determine whether ID performance can serve as a proxy for OOD generalization capabilities. This paper shows with multiple datasets that inverse correlations between ID and OOD performance do happen in real-world data - not only in theoretical worst-case settings. We also explain theoretically how these cases can arise even in a minimal linear setting, and why past studies could miss such cases due to a biased selection of models. Our observations lead to recommendations that contradict those found in much of the current literature. - High OOD performance sometimes requires trading off ID performance. - Focusing on ID performance alone may not lead to optimal OOD performance. It may produce diminishing (eventually negative) returns in OOD performance. - In these cases, studies on OOD generalization that use ID performance for model selection (a common recommended practice) will necessarily miss the best-performing models, making these studies blind to a whole range of phenomena. △ Less

Submitted 19 May, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

arXiv:2208.14571 [pdf, other]

Truncated Matrix Power Iteration for Differentiable DAG Learning

Authors: Zhen Zhang, Ignavier Ng, Dong Gong, Yuhang Liu, Ehsan M Abbasnejad, Mingming Gong, Kun Zhang, Javen Qinfeng Shi

Abstract: Recovering underlying Directed Acyclic Graph (DAG) structures from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality one, generally based on polynomials over adjacency matrices. Existing methods place… ▽ More Recovering underlying Directed Acyclic Graph (DAG) structures from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality one, generally based on polynomials over adjacency matrices. Existing methods place very small coefficients on high-order polynomial terms for stabilization, since they argue that large coefficients on the higher-order terms are harmful due to numeric exploding. On the contrary, we discover that large coefficients on higher-order terms are beneficial for DAG learning, when the spectral radiuses of the adjacency matrices are small, and that larger coefficients for higher-order terms can approximate the DAG constraints much better than the small counterparts. Based on this, we propose a novel DAG learning method with efficient truncated matrix power iteration to approximate geometric series based DAG constraints. Empirically, our DAG learning method outperforms the previous state-of-the-arts in various settings, often by a factor of $3$ or more in terms of structural Hamming distance. △ Less

Submitted 20 December, 2022; v1 submitted 30 August, 2022; originally announced August 2022.

Comments: Published in NeurIPS 2022

arXiv:2207.02598 [pdf, other]

Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning

Authors: Damien Teney, Maxime Peyrard, Ehsan Abbasnejad

Abstract: Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert's understanding of a task. Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ i… ▽ More Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert's understanding of a task. Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ in other desirable properties such as out-of-distribution (OOD) performance. Identifying these situations is critical for assessing the reliability of ML models. We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve for OOD model selection without additional assumptions. △ Less

Submitted 6 July, 2022; originally announced July 2022.

Comments: Long version of a paper accepted at the 2022 European Conference on Computer Vision (ECCV)

arXiv:2206.14355 [pdf, other]

EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering

Authors: Violetta Shevchenko, Ehsan Abbasnejad, Anthony Dick, Anton van den Hengel, Damien Teney

Abstract: The availability of clean and diverse labeled data is a major roadblock for training models on complex tasks such as visual question answering (VQA). The extensive work on large vision-and-language models has shown that self-supervised learning is effective for pretraining multimodal interactions. In this technical report, we focus on visual representations. We review and evaluate self-supervised… ▽ More The availability of clean and diverse labeled data is a major roadblock for training models on complex tasks such as visual question answering (VQA). The extensive work on large vision-and-language models has shown that self-supervised learning is effective for pretraining multimodal interactions. In this technical report, we focus on visual representations. We review and evaluate self-supervised methods to leverage unlabeled images and pretrain a model, which we then fine-tune on a custom VQA task that allows controlled evaluation and diagnosis. We compare energy-based models (EBMs) with contrastive learning (CL). While EBMs are growing in popularity, they lack an evaluation on downstream tasks. We find that both EBMs and CL can learn representations from unlabeled images that enable training a VQA model on very little annotated data. In a simple setting similar to CLEVR, we find that CL representations also improve systematic generalization, and even match the performance of representations from a larger, supervised, ImageNet-pretrained model. However, we find EBMs to be difficult to train because of instabilities and high variability in their results. Although EBMs prove useful for OOD detection, other results on supervised energy-based training and uncertainty calibration are largely negative. Overall, CL currently seems a preferable option over EBMs. △ Less

Submitted 28 June, 2022; originally announced June 2022.

arXiv:2206.06606 [pdf, other]

Astock: A New Dataset and Automated Stock Trading based on Stock-specific News Analyzing Model

Authors: **an Zou, Haiyao Cao, Lingqiao Liu, Yuhao Lin, Ehsan Abbasnejad, Javen Qinfeng Shi

Abstract: Natural Language Processing(NLP) demonstrates a great potential to support financial decision-making by analyzing the text from social media or news outlets. In this work, we build a platform to study the NLP-aided stock auto-trading algorithms systematically. In contrast to the previous work, our platform is characterized by three features: (1) We provide financial news for each specific stock. (… ▽ More Natural Language Processing(NLP) demonstrates a great potential to support financial decision-making by analyzing the text from social media or news outlets. In this work, we build a platform to study the NLP-aided stock auto-trading algorithms systematically. In contrast to the previous work, our platform is characterized by three features: (1) We provide financial news for each specific stock. (2) We provide various stock factors for each stock. (3) We evaluate performance from more financial-relevant metrics. Such a design allows us to develop and evaluate NLP-aided stock auto-trading algorithms in a more realistic setting. In addition to designing an evaluation platform and dataset collection, we also made a technical contribution by proposing a system to automatically learn a good feature representation from various input information. The key to our algorithm is a method called semantic role labeling Pooling (SRLP), which leverages Semantic Role Labeling (SRL) to create a compact representation of each news paragraph. Based on SRLP, we further incorporate other stock factors to make the final prediction. In addition, we propose a self-supervised learning strategy based on SRLP to enhance the out-of-distribution generalization performance of our system. Through our experimental study, we show that the proposed method achieves better performance and outperforms all the baselines' annualized rate of return as well as the maximum drawdown of the CSI300 index and XIN9 index on real trading. Our Astock dataset and code are available at https://github.com/**anZou/Astock. △ Less

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2205.10189 [pdf, other]

Progressive Class Semantic Matching for Semi-supervised Text Classification

Authors: Hai-Ming Xu, Lingqiao Liu, Ehsan Abbasnejad

Abstract: Semi-supervised learning is a promising way to reduce the annotation cost for text-classification. Combining with pre-trained language models (PLMs), e.g., BERT, recent semi-supervised learning methods achieved impressive performance. In this work, we further investigate the marriage between semi-supervised learning and a pre-trained language model. Unlike existing approaches that utilize PLMs onl… ▽ More Semi-supervised learning is a promising way to reduce the annotation cost for text-classification. Combining with pre-trained language models (PLMs), e.g., BERT, recent semi-supervised learning methods achieved impressive performance. In this work, we further investigate the marriage between semi-supervised learning and a pre-trained language model. Unlike existing approaches that utilize PLMs only for model parameter initialization, we explore the inherent topic matching capability inside PLMs for building a more powerful semi-supervised learning approach. Specifically, we propose a joint semi-supervised learning process that can progressively build a standard $K$-way classifier and a matching network for the input text and the Class Semantic Representation (CSR). The CSR will be initialized from the given labeled sentences and progressively updated through the training process. By means of extensive experiments, we show that our method can not only bring remarkable improvement to baselines, but also overall be more stable, and achieves state-of-the-art performance in semi-supervised text classification. △ Less

Submitted 20 May, 2022; originally announced May 2022.

Comments: NAACL2022 (oral)

arXiv:2203.07034 [pdf, other]

Active Learning by Feature Mixing

Authors: Amin Parvaneh, Ehsan Abbasnejad, Damien Teney, Reza Haffari, Anton van den Hengel, Javen Qinfeng Shi

Abstract: The promise of active learning (AL) is to reduce labelling costs by selecting the most valuable examples to annotate from a pool of unlabelled data. Identifying these examples is especially challenging with high-dimensional data (e.g. images, videos) and in low-data regimes. In this paper, we propose a novel method for batch AL called ALFA-Mix. We identify unlabelled instances with sufficiently-di… ▽ More The promise of active learning (AL) is to reduce labelling costs by selecting the most valuable examples to annotate from a pool of unlabelled data. Identifying these examples is especially challenging with high-dimensional data (e.g. images, videos) and in low-data regimes. In this paper, we propose a novel method for batch AL called ALFA-Mix. We identify unlabelled instances with sufficiently-distinct features by seeking inconsistencies in predictions resulting from interventions on their representations. We construct interpolations between representations of labelled and unlabelled instances then examine the predicted labels. We show that inconsistencies in these predictions help discovering features that the model is unable to recognise in the unlabelled instances. We derive an efficient implementation based on a closed-form solution to the optimal interpolation causing changes in predictions. Our method outperforms all recent AL approaches in 30 different settings on 12 benchmarks of images, videos, and non-visual data. The improvements are especially significant in low-data regimes and on self-trained vision transformers, where ALFA-Mix outperforms the state-of-the-art in 59% and 43% of the experiments respectively. △ Less

Submitted 14 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2202.00091 [pdf, other]

Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models

Authors: Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract: Despite our best efforts, deep learning models remain highly vulnerable to even tiny adversarial perturbations applied to the inputs. The ability to extract information from solely the output of a machine learning model to craft adversarial perturbations to black-box models is a practical threat against real-world systems, such as autonomous cars or machine learning models exposed as a service (ML… ▽ More Despite our best efforts, deep learning models remain highly vulnerable to even tiny adversarial perturbations applied to the inputs. The ability to extract information from solely the output of a machine learning model to craft adversarial perturbations to black-box models is a practical threat against real-world systems, such as autonomous cars or machine learning models exposed as a service (MLaaS). Of particular interest are sparse attacks. The realization of sparse attacks in black-box models demonstrates that machine learning models are more vulnerable than we believe. Because these attacks aim to minimize the number of perturbed pixels measured by l_0 norm-required to mislead a model by solely observing the decision (the predicted label) returned to a model query; the so-called decision-based attack setting. But, such an attack leads to an NP-hard optimization problem. We develop an evolution-based algorithm-SparseEvo-for the problem and evaluate against both convolutional deep neural networks and vision transformers. Notably, vision transformers are yet to be investigated under a decision-based attack setting. SparseEvo requires significantly fewer model queries than the state-of-the-art sparse attack Pointwise for both untargeted and targeted attacks. The attack algorithm, although conceptually simple, is also competitive with only a limited query budget against the state-of-the-art gradient-based whitebox attacks in standard computer vision tasks such as ImageNet. Importantly, the query efficient SparseEvo, along with decision-based attacks, in general, raise new questions regarding the safety of deployed systems and poses new directions to study and understand the robustness of machine learning models. △ Less

Submitted 23 March, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: Published as a conference paper at the International Conference on Learning Representations (ICLR 2022). Code is available at https://sparseevoattack.github.io/

arXiv:2112.05282 [pdf, other]

doi 10.14722/ndss.2022.24200

RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit

Authors: Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract: Machine learning models are critically susceptible to evasion attacks from adversarial examples. Generally, adversarial examples, modified inputs deceptively similar to the original input, are constructed under whitebox settings by adversaries with full access to the model. However, recent attacks have shown a remarkable reduction in query numbers to craft adversarial examples using blackbox attac… ▽ More Machine learning models are critically susceptible to evasion attacks from adversarial examples. Generally, adversarial examples, modified inputs deceptively similar to the original input, are constructed under whitebox settings by adversaries with full access to the model. However, recent attacks have shown a remarkable reduction in query numbers to craft adversarial examples using blackbox attacks. Particularly, alarming is the ability to exploit the classification decision from the access interface of a trained model provided by a growing number of Machine Learning as a Service providers including Google, Microsoft, IBM and used by a plethora of applications incorporating these models. The ability of an adversary to exploit only the predicted label from a model to craft adversarial examples is distinguished as a decision-based attack. In our study, we first deep dive into recent state-of-the-art decision-based attacks in ICLR and SP to highlight the costly nature of discovering low distortion adversarial employing gradient estimation methods. We develop a robust query efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients seen in gradient estimation methods. The attack method we propose, RamBoAttack, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations to manipulate only localized input features to address the issues of gradient estimation methods. Importantly, the RamBoAttack is more robust to the different sample inputs available to an adversary and the targeted class. Overall, for a given target class, RamBoAttack is demonstrated to be more robust at achieving a lower distortion within a given query budget. We curate our extensive results using the large-scale high-resolution ImageNet dataset and open-source our attack, test samples and artifacts on GitHub. △ Less

Submitted 23 March, 2023; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: Published in Network and Distributed System Security (NDSS) Symposium 2022. Code is available at https://ramboattack.github.io/

arXiv:2111.09999 [pdf, other]

TnT Attacks! Universal Naturalistic Adversarial Patches Against Deep Neural Network Systems

Authors: Bao Gia Doan, Minhui Xue, Shiqing Ma, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract: Deep neural networks are vulnerable to attacks from adversarial inputs and, more recently, Trojans to misguide or hijack the model's decision. We expose the existence of an intriguing class of spatially bounded, physically realizable, adversarial examples -- Universal NaTuralistic adversarial paTches -- we call TnTs, by exploring the superset of the spatially bounded adversarial example space and… ▽ More Deep neural networks are vulnerable to attacks from adversarial inputs and, more recently, Trojans to misguide or hijack the model's decision. We expose the existence of an intriguing class of spatially bounded, physically realizable, adversarial examples -- Universal NaTuralistic adversarial paTches -- we call TnTs, by exploring the superset of the spatially bounded adversarial example space and the natural input space within generative adversarial networks. Now, an adversary can arm themselves with a patch that is naturalistic, less malicious-looking, physically realizable, highly effective achieving high attack success rates, and universal. A TnT is universal because any input image captured with a TnT in the scene will: i) misguide a network (untargeted attack); or ii) force the network to make a malicious decision (targeted attack). Interestingly, now, an adversarial patch attacker has the potential to exert a greater level of control -- the ability to choose a location-independent, natural-looking patch as a trigger in contrast to being constrained to noisy perturbations -- an ability is thus far shown to be only possible with Trojan attack methods needing to interfere with the model building processes to embed a backdoor at the risk discovery; but, still realize a patch deployable in the physical world. Through extensive experiments on the large-scale visual classification task, ImageNet with evaluations across its entire validation set of 50,000 images, we demonstrate the realistic threat from TnTs and the robustness of the attack. We show a generalization of the attack to create patches achieving higher attack success rates than existing state-of-the-art methods. Our results show the generalizability of the attack to different visual classification tasks (CIFAR-10, GTSRB, PubFig) and multiple state-of-the-art deep neural networks such as WideResnet50, Inception-V3 and VGG-16. △ Less

Submitted 25 July, 2022; v1 submitted 18 November, 2021; originally announced November 2021.

Comments: Accepted for publication in the IEEE Transactions on Information Forensics & Security (TIFS)

arXiv:2106.10784 [pdf, other]

iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients

Authors: Miao Zhang, Steven Su, Shirui Pan, Xiaojun Chang, Ehsan Abbasnejad, Reza Haffari

Abstract: \textit{Differentiable ARchiTecture Search} (DARTS) has recently become the mainstream of neural architecture search (NAS) due to its efficiency and simplicity. With a gradient-based bi-level optimization, DARTS alternately optimizes the inner model weights and the outer architecture parameter in a weight-sharing supernet. A key challenge to the scalability and quality of the learned architectures… ▽ More \textit{Differentiable ARchiTecture Search} (DARTS) has recently become the mainstream of neural architecture search (NAS) due to its efficiency and simplicity. With a gradient-based bi-level optimization, DARTS alternately optimizes the inner model weights and the outer architecture parameter in a weight-sharing supernet. A key challenge to the scalability and quality of the learned architectures is the need for differentiating through the inner-loop optimisation. While much has been discussed about several potentially fatal factors in DARTS, the architecture gradient, a.k.a. hypergradient, has received less attention. In this paper, we tackle the hypergradient computation in DARTS based on the implicit function theorem, making it only depends on the obtained solution to the inner-loop optimization and agnostic to the optimization path. To further reduce the computational requirements, we formulate a stochastic hypergradient approximation for differentiable NAS, and theoretically show that the architecture optimization with the proposed method, named iDARTS, is expected to converge to a stationary point. Comprehensive experiments on two NAS benchmark search spaces and the common NAS search space verify the effectiveness of our proposed method. It leads to architectures outperforming, with large margins, those learned by the baseline methods. △ Less

Submitted 20 June, 2021; originally announced June 2021.

Comments: ICML2021

arXiv:2105.05612 [pdf, other]

Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization

Authors: Damien Teney, Ehsan Abbasnejad, Simon Lucey, Anton van den Hengel

Abstract: Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mecha… ▽ More Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns. OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side information. Our approach shows that we can defer this requirement to an independent model selection stage. We obtain SOTA results in visual recognition on biased data and generalization across visual domains. The method - the first to evade the simplicity bias - highlights the need for a better understanding and control of inductive biases in deep learning. △ Less

Submitted 11 September, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: CVPR 2022

arXiv:2104.05248 [pdf, other]

All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grou** and Co-training

Authors: Islam Nassar, Samitha Herath, Ehsan Abbasnejad, Wray Buntine, Gholamreza Haffari

Abstract: Pseudo-labeling is a key component in semi-supervised learning (SSL). It relies on iteratively using the model to generate artificial labels for the unlabeled data to train against. A common property among its various methods is that they only rely on the model's prediction to make labeling decisions without considering any prior knowledge about the visual similarity among the classes. In this pap… ▽ More Pseudo-labeling is a key component in semi-supervised learning (SSL). It relies on iteratively using the model to generate artificial labels for the unlabeled data to train against. A common property among its various methods is that they only rely on the model's prediction to make labeling decisions without considering any prior knowledge about the visual similarity among the classes. In this paper, we demonstrate that this degrades the quality of pseudo-labeling as it poorly represents visually similar classes in the pool of pseudo-labeled data. We propose SemCo, a method which leverages label semantics and co-training to address this problem. We train two classifiers with two different views of the class labels: one classifier uses the one-hot view of the labels and disregards any potential similarity among the classes, while the other uses a distributed view of the labels and groups potentially similar classes together. We then co-train the two classifiers to learn based on their disagreements. We show that our method achieves state-of-the-art performance across various SSL tasks including 5.6% accuracy improvement on Mini-ImageNet dataset with 1000 labeled examples. We also show that our method requires smaller batch size and fewer training iterations to reach its best performance. We make our code available at https://github.com/islam-nassar/semco. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: Accepted in CVPR2021

arXiv:2103.00446 [pdf, other]

Learning for Visual Navigation by Imagining the Success

Authors: Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel

Abstract: Visual navigation is often cast as a reinforcement learning (RL) problem. Current methods typically result in a suboptimal policy that learns general obstacle avoidance and search behaviours. For example, in the target-object navigation setting, the policies learnt by traditional methods often fail to complete the task, even when the target is clearly within reach from a human perspective. In orde… ▽ More Visual navigation is often cast as a reinforcement learning (RL) problem. Current methods typically result in a suboptimal policy that learns general obstacle avoidance and search behaviours. For example, in the target-object navigation setting, the policies learnt by traditional methods often fail to complete the task, even when the target is clearly within reach from a human perspective. In order to address this issue, we propose to learn to imagine a latent representation of the successful (sub-)goal state. To do so, we have developed a module which we call Foresight Imagination (ForeSIT). ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success, e.g. either a sub-goal state that is important to reach before the target, or the goal state itself. By conditioning the policy on the generated imagination during training, our agent learns how to use this imagination to achieve its goal robustly. Our agent is able to imagine what the (sub-)goal state may look like (in the latent space) and can learn to navigate towards that state. We develop an efficient learning algorithm to train ForeSIT in an on-policy manner and integrate it into our RL objective. The integration is not trivial due to the constantly evolving state representation shared between both the imagination and the policy. We, empirically, observe that our method outperforms the state-of-the-art methods by a large margin in the commonly accepted benchmark AI2THOR environment. Our method can be readily integrated or added to other model-free RL navigation frameworks. △ Less

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2005.09241 [pdf, other]

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

Authors: Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

Abstract: Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices… ▽ More Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on ``inverting'' the distribution of labels, e.g. answering mostly 'yes' when the common training answer is 'no'. Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization, and put into question the value of methods specifically designed for this dataset. We show that embarrassingly-simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation. △ Less

Submitted 19 May, 2020; originally announced May 2020.

arXiv:2004.12794 [pdf, other]

Hybrid Neuro-Evolutionary Method for Predicting Wind Turbine Power Output

Authors: Mehdi Neshat, Meysam Majidi Nezhad, Ehsan Abbasnejad, Daniele Groppi, Azim Heydari, Lina Bertling Tjernberg, Davide Astiaso Garcia, Bradley Alexander, Markus Wagner

Abstract: Reliable wind turbine power prediction is imperative to the planning, scheduling and control of wind energy farms for stable power production. In recent years Machine Learning (ML) methods have been successfully applied in a wide range of domains, including renewable energy. However, due to the challenging nature of power prediction in wind farms, current models are far short of the accuracy requi… ▽ More Reliable wind turbine power prediction is imperative to the planning, scheduling and control of wind energy farms for stable power production. In recent years Machine Learning (ML) methods have been successfully applied in a wide range of domains, including renewable energy. However, due to the challenging nature of power prediction in wind farms, current models are far short of the accuracy required by industry. In this paper, we deploy a composite ML approach--namely a hybrid neuro-evolutionary algorithm--for accurate forecasting of the power output in wind-turbine farms. We use historical data in the supervisory control and data acquisition (SCADA) systems as input to estimate the power output from an onshore wind farm in Sweden. At the beginning stage, the k-means clustering method and an Autoencoder are employed, respectively, to detect and filter noise in the SCADA measurements. Next, with the prior knowledge that the underlying wind patterns are highly non-linear and diverse, we combine a self-adaptive differential evolution (SaDE) algorithm as a hyper-parameter optimizer, and a recurrent neural network (RNN) called Long Short-term memory (LSTM) to model the power curve of a wind turbine in a farm. Two short time forecasting horizons, including ten-minutes ahead and one-hour ahead, are considered in our experiments. We show that our approach outperforms its counterparts. △ Less

Submitted 2 April, 2020; originally announced April 2020.

arXiv:2004.03222 [pdf, other]

Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation

Authors: Mahdi Kazemi Moghaddam, Qi Wu, Ehsan Abbasnejad, Javen Qinfeng Shi

Abstract: We humans can impeccably search for a target object, given its name only, even in an unseen environment. We argue that this ability is largely due to three main reasons: the incorporation of prior knowledge (or experience), the adaptation of it to the new environment using the observed visual cues and most importantly optimistically searching without giving up early. This is currently missing in t… ▽ More We humans can impeccably search for a target object, given its name only, even in an unseen environment. We argue that this ability is largely due to three main reasons: the incorporation of prior knowledge (or experience), the adaptation of it to the new environment using the observed visual cues and most importantly optimistically searching without giving up early. This is currently missing in the state-of-the-art visual navigation methods based on Reinforcement Learning (RL). In this paper, we propose to use externally learned prior knowledge of the relative object locations and integrate it into our model by constructing a neural graph. In order to efficiently incorporate the graph without increasing the state-space complexity, we propose our Graph-based Value Estimation (GVE) module. GVE provides a more accurate baseline for estimating the Advantage function in actor-critic RL algorithm. This results in reduced value estimation error and, consequently, convergence to a more optimal policy. Through empirical studies, we show that our agent, dubbed as the optimistic agent, has a more realistic estimate of the state value during a navigation episode which leads to a higher success rate. Our extensive ablation studies show the efficacy of our simple method which achieves the state-of-the-art results measured by the conventional visual navigation metrics, e.g. Success Rate (SR) and Success weighted by Path Length (SPL), in AI2THOR environment. △ Less

Submitted 6 December, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Accepted for publication at WACV 2021

arXiv:2002.11894 [pdf, other]

Unshuffling Data for Improved Generalization

Authors: Damien Teney, Ehsan Abbasnejad, Anton van den Hengel

Abstract: Generalization beyond the training distribution is a core challenge in machine learning. The common practice of mixing and shuffling examples when training neural networks may not be optimal in this regard. We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization… ▽ More Generalization beyond the training distribution is a core challenge in machine learning. The common practice of mixing and shuffling examples when training neural networks may not be optimal in this regard. We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization. We describe a training procedure to capture the patterns that are stable across environments while discarding spurious ones. The method makes a step beyond correlation-based learning: the choice of the partitioning allows injecting information about the task that cannot be otherwise recovered from the joint distribution of the training data. We demonstrate multiple use cases with the task of visual question answering, which is notorious for dataset biases. We obtain significant improvements on VQA-CP, using environments built from prior knowledge, existing meta data, or unsupervised clustering. We also get improvements on GQA using annotations of "equivalent questions", and on multi-dataset training (VQA v2 / Visual Genome) by treating them as distinct environments. △ Less

Submitted 20 November, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

arXiv:2002.09106 [pdf, other]

An Evolutionary Deep Learning Method for Short-term Wind Speed Prediction: A Case Study of the Lillgrund Offshore Wind Farm

Authors: Mehdi Neshat, Meysam Majidi Nezhad, Ehsan Abbasnejad, Lina Bertling Tjernberg, Davide Astiaso Garcia, Bradley Alexander, Markus Wagner

Abstract: Accurate short-term wind speed forecasting is essential for large-scale integration of wind power generation. However, the seasonal and stochastic characteristics of wind speed make forecasting a challenging task. This study uses a new hybrid evolutionary approach that uses a popular evolutionary search algorithm, CMA-ES, to tune the hyper-parameters of two Long short-term memory(LSTM) ANN models… ▽ More Accurate short-term wind speed forecasting is essential for large-scale integration of wind power generation. However, the seasonal and stochastic characteristics of wind speed make forecasting a challenging task. This study uses a new hybrid evolutionary approach that uses a popular evolutionary search algorithm, CMA-ES, to tune the hyper-parameters of two Long short-term memory(LSTM) ANN models for wind prediction. The proposed hybrid approach is trained on data gathered from an offshore wind turbine installed in a Swedish wind farm located in the Baltic Sea. Two forecasting horizons including ten-minutes ahead (absolute short term) and one-hour ahead (short term) are considered in our experiments. Our experimental results indicate that the new approach is superior to five other applied machine learning models, i.e., polynomial neural network (PNN), feed-forward neural network (FNN), nonlinear autoregressive neural network (NAR) and adaptive neuro-fuzzy inference system (ANFIS), as measured by five performance criteria. △ Less

Submitted 20 February, 2020; originally announced February 2020.

arXiv:1909.13471 [pdf, other]

On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints

Authors: Damien Teney, Ehsan Abbasnejad, Anton van den Hengel

Abstract: The knowledge that humans hold about a problem often extends far beyond a set of training data and output labels. While the success of deep learning mostly relies on supervised training, important properties cannot be inferred efficiently from end-to-end annotations alone, for example causal relations or domain-specific invariances. We present a general technique to supplement supervised training… ▽ More The knowledge that humans hold about a problem often extends far beyond a set of training data and output labels. While the success of deep learning mostly relies on supervised training, important properties cannot be inferred efficiently from end-to-end annotations alone, for example causal relations or domain-specific invariances. We present a general technique to supplement supervised training with prior knowledge expressed as relations between training instances. We illustrate the method on the task of visual question answering to exploit various auxiliary annotations, including relations of equivalence and of logical entailment between questions. Existing methods to use these annotations, including auxiliary losses and data augmentation, cannot guarantee the strict inclusion of these relations into the model since they require a careful balancing against the end-to-end objective. Our method uses these relations to shape the embedding space of the model, and treats them as strict constraints on its learned representations. In the context of VQA, this approach brings significant improvements in accuracy and robustness, in particular over the common practice of incorporating the constraints as a soft regularizer. We also show that incorporating this type of prior knowledge with our method brings consistent improvements, independently from the amount of supervised data used. It demonstrates the value of an additional training signal that is otherwise difficult to extract from end-to-end annotations alone. △ Less

Submitted 16 November, 2019; v1 submitted 30 September, 2019; originally announced September 2019.

arXiv:1908.03369 [pdf, other]

doi 10.1145/3427228.3427264

Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems

Authors: Bao Gia Doan, Ehsan Abbasnejad, Damith C. Ranasinghe

Abstract: We propose Februus; a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction---a target determined by and only known to the attacker. Febru… ▽ More We propose Februus; a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction---a target determined by and only known to the attacker. Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance for sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. We dramatically reduced attack success rates from 100% to near 0% for all cases (achieving 0% on multiple cases) and evaluated the generalizability of Februus to defend against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first backdoor defense method for operation at run-time capable of sanitizing Trojaned inputs without requiring anomaly detection methods, model retraining or costly labeled data. △ Less

Submitted 28 September, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

Comments: 16 pages, to appear in the 36th Annual Computer Security Applications Conference (ACSAC 2020)

Journal ref: In the 36th Annual Computer Security Applications Conference (ACSAC 2020)

arXiv:1907.03076 [pdf, other]

doi 10.1007/978-3-030-36711-4_30

Adaptive Neuro-Surrogate-Based Optimisation Method for Wave Energy Converters Placement Optimisation

Authors: Mehdi Neshat, Ehsan Abbasnejad, Qinfeng Shi, Bradley Alexander, Markus Wagner

Abstract: The installed amount of renewable energy has expanded massively in recent years. Wave energy, with its high capacity factors has great potential to complement established sources of solar and wind energy. This study explores the problem of optimising the layout of advanced, three-tether wave energy converters in a size-constrained farm in a numerically modelled ocean environment. Simulating and co… ▽ More The installed amount of renewable energy has expanded massively in recent years. Wave energy, with its high capacity factors has great potential to complement established sources of solar and wind energy. This study explores the problem of optimising the layout of advanced, three-tether wave energy converters in a size-constrained farm in a numerically modelled ocean environment. Simulating and computing the complicated hydrodynamic interactions in wave farms can be computationally costly, which limits optimisation methods to have just a few thousand evaluations. For dealing with this expensive optimisation problem, an adaptive neuro-surrogate optimisation (ANSO) method is proposed that consists of a surrogate Recurrent Neural Network (RNN) model trained with a very limited number of observations. This model is coupled with a fast meta-heuristic optimiser for adjusting the model's hyper-parameters. The trained model is applied using a greedy local search with a backtracking optimisation strategy. For evaluating the performance of the proposed approach, some of the more popular and successful Evolutionary Algorithms (EAs) are compared in four real wave scenarios (Sydney, Perth, Adelaide and Tasmania). Experimental results show that the adaptive neuro model is competitive with other optimisation methods in terms of total harnessed power output and faster in terms of total computational costs. △ Less

Submitted 8 July, 2019; v1 submitted 6 July, 2019; originally announced July 2019.

Comments: 12 pages, 2 tables, 6 figures

Journal ref: International Conference on Neural Information Processing,2019,pp. 353-366. Springer

arXiv:1906.09433 [pdf, other]

Deep Single Image Deraining Via Estimating Transmission and Atmospheric Light in rainy Scenes

Authors: Yinglong Wang, Qinfeng Shi, Ehsan Abbasnejad, Chao Ma, ** Ma, Bing Zeng

Abstract: Rain removal in images/videos is still an important task in computer vision field and attracting attentions of more and more people. Traditional methods always utilize some incomplete priors or filters (e.g. guided filter) to remove rain effect. Deep learning gives more probabilities to better solve this task. However, they remove rain either by evaluating background from rainy image directly or l… ▽ More Rain removal in images/videos is still an important task in computer vision field and attracting attentions of more and more people. Traditional methods always utilize some incomplete priors or filters (e.g. guided filter) to remove rain effect. Deep learning gives more probabilities to better solve this task. However, they remove rain either by evaluating background from rainy image directly or learning a rain residual first then subtracting the residual to obtain a clear background. No other models are used in deep learning based de-raining methods to remove rain and obtain other information about rainy scenes. In this paper, we utilize an extensively-used image degradation model which is derived from atmospheric scattering principles to model the formation of rainy images and try to learn the transmission, atmospheric light in rainy scenes and remove rain further. To reach this goal, we propose a robust evaluation method of global atmospheric light in a rainy scene. Instead of using the estimated atmospheric light directly to learn a network to calculate transmission, we utilize it as ground truth and design a simple but novel triangle-shaped network structure to learn atmospheric light for every rainy image, then fine-tune the network to obtain a better estimation of atmospheric light during the training of transmission network. Furthermore, more efficient ShuffleNet Units are utilized in transmission network to learn transmission map and the de-raining image is then obtained by the image degradation model. By subjective and objective comparisons, our method outperforms the selected state-of-the-art works. △ Less

Submitted 22 June, 2019; originally announced June 2019.

Comments: 10 pages, 9 figures, 3 tables

arXiv:1905.04430 [pdf, other]

Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection

Authors: Mohammad Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Javen Shi

Abstract: Retailers have long been searching for ways to effectively understand their customers' behaviour in order to provide a smooth and pleasant shop** experience that attracts more customers everyday and maximises their revenue, consequently. Humans can flawlessly understand others' behaviour by combining different visual cues from activity to gestures and facial expressions. Empowering the computer… ▽ More Retailers have long been searching for ways to effectively understand their customers' behaviour in order to provide a smooth and pleasant shop** experience that attracts more customers everyday and maximises their revenue, consequently. Humans can flawlessly understand others' behaviour by combining different visual cues from activity to gestures and facial expressions. Empowering the computer vision systems to do so, however, is still an open problem due to its intrinsic challenges as well as extrinsic enforced difficulties like lack of publicly available data and unique environment conditions (wild). In this work, We emphasise on detecting the first and by far the most crucial cue in behaviour analysis; that is human activity detection in computer vision. To do so, we introduce a framework for integrating human pose and object motion to both temporally detect and classify the activities in a fine-grained manner (very short and similar activities). We incorporate partial human pose and interaction with the objects in a multi-stream neural network architecture to guide the spatiotemporal attention mechanism for more efficient activity recognition. To this end, in the absence of pose supervision, we propose to use the Generative Adversarial Network (GAN) to generate exact joint locations from noisy probability heat maps. Additionally, based on the intuition that complex actions demand more than one source of information to be identified even by humans, we integrate the second stream of object motion to our network as a prior knowledge that we quantitatively show improves the recognition results. We empirically show the capability of our approach by achieving state-of-the-art results on MERL shop** dataset. We further investigate the effectiveness of this approach on a new shop** dataset that we have collected to address existing shortcomings. △ Less

Submitted 26 June, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

arXiv:1905.03721 [pdf, other]

doi 10.1109/TMM.2021.3065169

Show, Price and Negotiate: A Negotiator with Online Value Look-Ahead

Authors: Amin Parvaneh, Ehsan Abbasnejad, Qi Wu, Javen Qinfeng Shi, Anton van den Hengel

Abstract: Negotiation, as an essential and complicated aspect of online shop**, is still challenging for an intelligent agent. To that end, we propose the Price Negotiator, a modular deep neural network that addresses the unsolved problems in recent studies by (1) considering images of the items as a crucial, though neglected, source of information in a negotiation, (2) heuristically finding the most simi… ▽ More Negotiation, as an essential and complicated aspect of online shop**, is still challenging for an intelligent agent. To that end, we propose the Price Negotiator, a modular deep neural network that addresses the unsolved problems in recent studies by (1) considering images of the items as a crucial, though neglected, source of information in a negotiation, (2) heuristically finding the most similar items from an external online source to predict the potential value and an acceptable agreement price, (3) predicting a general price-based action at each turn which is fed into the language generator to output the supporting natural language, and (4) adjusting the prices based on the predicted actions. Empirically, we show that our model, that is trained in both supervised and reinforcement learning setting, significantly improves negotiation on the CraigslistBargain dataset, in terms of the agreement price, price consistency, and dialogue quality. △ Less

Submitted 12 March, 2021; v1 submitted 7 May, 2019; originally announced May 2019.

Comments: published in IEEE Transactions on Multimedia

arXiv:1904.03367 [pdf, other]

Reinforcement Learning with Attention that Works: A Self-Supervised Approach

Authors: Anthony Manchin, Ehsan Abbasnejad, Anton van den Hengel

Abstract: Attention models have had a significant positive impact on deep learning across a range of tasks. However previous attempts at integrating attention with reinforcement learning have failed to produce significant improvements. We propose the first combination of self attention and reinforcement learning that is capable of producing significant improvements, including new state of the art results in… ▽ More Attention models have had a significant positive impact on deep learning across a range of tasks. However previous attempts at integrating attention with reinforcement learning have failed to produce significant improvements. We propose the first combination of self attention and reinforcement learning that is capable of producing significant improvements, including new state of the art results in the Arcade Learning Environment. Unlike the selective attention models used in previous attempts, which constrain the attention via preconceived notions of importance, our implementation utilises the Markovian properties inherent in the state input. Our method produces a faithful visualisation of the policy, focusing on the behaviour of the agent. Our experiments demonstrate that the trained policies use multiple simultaneous foci of attention, and are able to modulate attention over time to deal with situations of partial observability. △ Less

Submitted 6 April, 2019; originally announced April 2019.

arXiv:1812.06401 [pdf, other]

What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions

Authors: Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton van den Hengel

Abstract: One of the core challenges in Visual Dialogue problems is asking the question that will provide the most useful information towards achieving the required objective. Encouraging an agent to ask the right questions is difficult because we don't know a-priori what information the agent will need to achieve its task, and we don't have an explicit model of what it knows already. We propose a solution… ▽ More One of the core challenges in Visual Dialogue problems is asking the question that will provide the most useful information towards achieving the required objective. Encouraging an agent to ask the right questions is difficult because we don't know a-priori what information the agent will need to achieve its task, and we don't have an explicit model of what it knows already. We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output. By selecting the question that minimises the predicted regret with respect to this implicit model the agent actively reduces ambiguity. The Bayesian model of uncertainty also enables a principled method for identifying when enough information has been acquired, and an action should be selected. We evaluate our approach on two goal-oriented dialogue datasets, one for visual-based collaboration task and the other for a negotiation-based task. Our uncertainty-aware information-seeking model outperforms its counterparts in these two challenging problems. △ Less

Submitted 16 December, 2018; originally announced December 2018.

arXiv:1812.06398 [pdf, other]

Gold Seeker: Information Gain from Policy Distributions for Goal-oriented Vision-and-Langauge Reasoning

Authors: Ehsan Abbasnejad, Iman Abbasnejad, Qi Wu, Javen Shi, Anton van den Hengel

Abstract: As Computer Vision moves from a passive analysis of pixels to active analysis of semantics, the breadth of information algorithms need to reason over has expanded significantly. One of the key challenges in this vein is the ability to identify the information required to make a decision, and select an action that will recover it. We propose a reinforcement-learning approach that maintains a distri… ▽ More As Computer Vision moves from a passive analysis of pixels to active analysis of semantics, the breadth of information algorithms need to reason over has expanded significantly. One of the key challenges in this vein is the ability to identify the information required to make a decision, and select an action that will recover it. We propose a reinforcement-learning approach that maintains a distribution over its internal information, thus explicitly representing the ambiguity in what it knows, and needs to know, towards achieving its goal. Potential actions are then generated according to this distribution. For each potential action a distribution of the expected outcomes is calculated, and the value of the potential information gain assessed. The action taken is that which maximizes the potential information gain. We demonstrate this approach applied to two vision-and-language problems that have attracted significant recent interest, visual dialog and visual query generation. In both cases, the method actively selects actions that will best reduce its internal uncertainty and outperforms its competitors in achieving the goal of the challenge. △ Less

Submitted 29 March, 2020; v1 submitted 16 December, 2018; originally announced December 2018.

arXiv:1811.08127 [pdf, other]

Deep Auto-Set: A Deep Auto-Encoder-Set Network for Activity Recognition Using Wearables

Authors: Alireza Abedin Varamin, Ehsan Abbasnejad, Qinfeng Shi, Damith Ranasinghe, Hamid Rezatofighi

Abstract: Automatic recognition of human activities from time-series sensor data (referred to as HAR) is a growing area of research in ubiquitous computing. Most recent research in the field adopts supervised deep learning paradigms to automate extraction of intrinsic features from raw signal inputs and addresses HAR as a multi-class classification problem where detecting a single activity class within the… ▽ More Automatic recognition of human activities from time-series sensor data (referred to as HAR) is a growing area of research in ubiquitous computing. Most recent research in the field adopts supervised deep learning paradigms to automate extraction of intrinsic features from raw signal inputs and addresses HAR as a multi-class classification problem where detecting a single activity class within the duration of a sensory data segment suffices. However, due to the innate diversity of human activities and their corresponding duration, no data segment is guaranteed to contain sensor recordings of a single activity type. In this paper, we express HAR more naturally as a set prediction problem where the predictions are sets of ongoing activity elements with unfixed and unknown cardinality. For the first time, we address this problem by presenting a novel HAR approach that learns to output activity sets using deep neural networks. Moreover, motivated by the limited availability of annotated HAR datasets as well as the unfortunate immaturity of existing unsupervised systems, we complement our supervised set learning scheme with a prior unsupervised feature learning process that adopts convolutional auto-encoders to exploit unlabeled data. The empirical experiments on two widely adopted HAR datasets demonstrate the substantial improvement of our proposed methodology over the baseline models. △ Less

Submitted 20 November, 2018; originally announced November 2018.

Comments: Accepted at MobiQuitous 2018

arXiv:1807.05380 [pdf, other]

3D Hand Pose Estimation using Simulation and Partial-Supervision with a Shared Latent Space

Authors: Masoud Abdi, Ehsan Abbasnejad, Chee Peng Lim, Saeid Nahavandi

Abstract: Tremendous amounts of expensive annotated data are a vital ingredient for state-of-the-art 3d hand pose estimation. Therefore, synthetic data has been popularized as annotations are automatically available. However, models trained only with synthetic samples do not generalize to real data, mainly due to the gap between the distribution of synthetic and real data. In this paper, we propose a novel… ▽ More Tremendous amounts of expensive annotated data are a vital ingredient for state-of-the-art 3d hand pose estimation. Therefore, synthetic data has been popularized as annotations are automatically available. However, models trained only with synthetic samples do not generalize to real data, mainly due to the gap between the distribution of synthetic and real data. In this paper, we propose a novel method that seeks to predict the 3d position of the hand using both synthetic and partially-labeled real data. Accordingly, we form a shared latent space between three modalities: synthetic depth image, real depth image, and pose. We demonstrate that by carefully learning the shared latent space, we can find a regression model that is able to generalize to real data. As such, we show that our method produces accurate predictions in both semi-supervised and unsupervised settings. Additionally, the proposed model is capable of generating novel, meaningful, and consistent samples from all of the three domains. We evaluate our method qualitatively and quantitively on two highly competitive benchmarks (i.e., NYU and ICVL) and demonstrate its superiority over the state-of-the-art methods. The source code will be made available at https://github.com/masabdi/LSPS. △ Less

Submitted 14 July, 2018; originally announced July 2018.

Comments: Oral presentation at British Machine Vision Conference (BMVC) 2018

arXiv:1706.05477 [pdf, other]

Bayesian Conditional Generative Adverserial Networks

Authors: M. Ehsan Abbasnejad, Qinfeng Shi, Iman Abbasnejad, Anton van den Hengel, Anthony Dick

Abstract: Traditional GANs use a deterministic generator function (typically a neural network) to transform a random noise input $z$ to a sample $\mathbf{x}$ that the discriminator seeks to distinguish. We propose a new GAN called Bayesian Conditional Generative Adversarial Networks (BC-GANs) that use a random generator function to transform a deterministic input $y'$ to a sample $\mathbf{x}$. Our BC-GANs e… ▽ More Traditional GANs use a deterministic generator function (typically a neural network) to transform a random noise input $z$ to a sample $\mathbf{x}$ that the discriminator seeks to distinguish. We propose a new GAN called Bayesian Conditional Generative Adversarial Networks (BC-GANs) that use a random generator function to transform a deterministic input $y'$ to a sample $\mathbf{x}$. Our BC-GANs extend traditional GANs to a Bayesian framework, and naturally handle unsupervised learning, supervised learning, and semi-supervised learning problems. Experiments show that the proposed BC-GANs outperforms the state-of-the-arts. △ Less

Submitted 17 June, 2017; originally announced June 2017.

arXiv:1611.08998 [pdf, other]

DeepSetNet: Predicting Sets with Deep Neural Networks

Authors: S. Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick, Ian Reid

Abstract: This paper addresses the task of set prediction using deep learning. This is important because the output of many computer vision tasks, including image tagging and object detection, are naturally expressed as sets of entities rather than vectors. As opposed to a vector, the size of a set is not fixed in advance, and it is invariant to the ordering of entities within it. We define a likelihood for… ▽ More This paper addresses the task of set prediction using deep learning. This is important because the output of many computer vision tasks, including image tagging and object detection, are naturally expressed as sets of entities rather than vectors. As opposed to a vector, the size of a set is not fixed in advance, and it is invariant to the ordering of entities within it. We define a likelihood for a set distribution and learn its parameters using a deep neural network. We also derive a loss for predicting a discrete distribution corresponding to set cardinality. Set prediction is demonstrated on the problem of multi-class image classification. Moreover, we show that the proposed cardinality loss can also trivially be applied to the tasks of object counting and pedestrian detection. Our approach outperforms existing methods in all three cases on standard datasets. △ Less

Submitted 10 August, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

Comments: Accepted in IEEE International Conference on Computer Vision (ICCV), Venice, 2017, (Spotlight)

arXiv:1611.07800 [pdf, other]

Infinite Variational Autoencoder for Semi-Supervised Learning

Authors: Ehsan Abbasnejad, Anthony Dick, Anton van den Hengel

Abstract: This paper presents an infinite variational autoencoder (VAE) whose capacity adapts to suit the input data. This is achieved using a mixture model where the mixing coefficients are modeled by a Dirichlet process, allowing us to integrate over the coefficients when performing inference. Critically, this then allows us to automatically vary the number of autoencoders in the mixture based on the data… ▽ More This paper presents an infinite variational autoencoder (VAE) whose capacity adapts to suit the input data. This is achieved using a mixture model where the mixing coefficients are modeled by a Dirichlet process, allowing us to integrate over the coefficients when performing inference. Critically, this then allows us to automatically vary the number of autoencoders in the mixture based on the data. Experiments show the flexibility of our method, particularly for semi-supervised learning, where only a small number of training samples are available. △ Less

Submitted 23 November, 2016; v1 submitted 23 November, 2016; originally announced November 2016.

Showing 1–50 of 51 results for author: Abbasnejad, E