Search | arXiv e-print repository

arXiv:2406.19675 [pdf]

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

Authors: Uchitha Rajapaksha, Ferdous Sohel, Hamid Laga, Dean Diepeveen, Mohammed Bennamoun

Abstract: Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper presents a comprehensive survey of the existing deep l… ▽ More Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper presents a comprehensive survey of the existing deep learning-based methods, the challenges they address, and how they have evolved in their architecture and supervision methods. It provides a taxonomy for classifying the current work based on their input and output modalities, network architectures, and learning methods. It also discusses the major milestones in the history of monocular depth estimation, and different pipelines, datasets, and evaluation metrics used in existing methods. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: 46 pages, 10 figures, The paper has been accepted for publication in ACM Computing Surveys 2024

ACM Class: I.2.10; I.4; I.5.1; I.4.8

arXiv:2403.01156 [pdf, other]

Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation

Authors: Lian Xu, Mohammed Bennamoun, Farid Boussaid, Wanli Ouyang, Ferdous Sohel, Dan Xu

Abstract: Most existing weakly supervised semantic segmentation (WSSS) methods rely on Class Activation Map** (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pre-trained saliency model to produce more accurate pseudo-… ▽ More Most existing weakly supervised semantic segmentation (WSSS) methods rely on Class Activation Map** (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pre-trained saliency model to produce more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation. In the proposed AuxSegNet+, saliency detection and multi-label image classification are used as auxiliary tasks to improve the primary task of semantic segmentation with only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module to learn both pairwise and unary affinities, which are used to enhance the task-specific features and predictions by aggregating both query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate CAM maps to provide better pseudo labels for both tasks. Iterative improvement of segmentation performance is enabled by cross-task affinity learning and pseudo-label updating. Extensive experiments demonstrate the effectiveness of the proposed approach with new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks. △ Less

Submitted 2 March, 2024; originally announced March 2024.

Comments: Accepted at IEEE Transactions on Neural Networks and Learning Systems. arXiv admin note: substantial text overlap with arXiv:2107.11787

arXiv:2402.01271 [pdf, other]

An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

Authors: Lin** Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

Abstract: Recently, neural networks have proven to be effective in performing speech coding task at low bitrates. However, under-utilization of intra-frame correlations and the error of quantizer specifically degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleave… ▽ More Recently, neural networks have proven to be effective in performing speech coding task at low bitrates. However, under-utilization of intra-frame correlations and the error of quantizer specifically degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20ms with no additional latency, which is suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3kbps with Opus at 12kbps. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: INTERSPEECH 2023

arXiv:2202.01975 [pdf]

Performance of multilabel machine learning models and risk stratification schemas for predicting stroke and bleeding risk in patients with non-valvular atrial fibrillation

Authors: Juan Lu, Rebecca Hutchens, Joseph Hung, Mohammed Bennamoun, Brendan McQuillan, Tom Briffa, Ferdous Sohel, Kevin Murray, Jonathon Stewart, Benjamin Chow, Frank Sanfilippo, Girish Dwivedi

Abstract: Appropriate antithrombotic therapy for patients with atrial fibrillation (AF) requires assessment of ischemic stroke and bleeding risks. However, risk stratification schemas such as CHA2DS2-VASc and HAS-BLED have modest predictive capacity for patients with AF. Machine learning (ML) techniques may improve predictive performance and support decision-making for appropriate antithrombotic therapy. We… ▽ More Appropriate antithrombotic therapy for patients with atrial fibrillation (AF) requires assessment of ischemic stroke and bleeding risks. However, risk stratification schemas such as CHA2DS2-VASc and HAS-BLED have modest predictive capacity for patients with AF. Machine learning (ML) techniques may improve predictive performance and support decision-making for appropriate antithrombotic therapy. We compared the performance of multilabel ML models with the currently used risk scores for predicting outcomes in AF patients. Materials and Methods This was a retrospective cohort study of 9670 patients, mean age 76.9 years, 46% women, who were hospitalized with non-valvular AF, and had 1-year follow-up. The primary outcome was ischemic stroke and major bleeding admission. The secondary outcomes were all-cause death and event-free survival. The discriminant power of ML models was compared with clinical risk scores by the area under the curve (AUC). Risk stratification was assessed using the net reclassification index. Results Multilabel gradient boosting machine provided the best discriminant power for stroke, major bleeding, and death (AUC = 0.685, 0.709, and 0.765 respectively) compared to other ML models. It provided modest performance improvement for stroke compared to CHA2DS2-VASc (AUC = 0.652), but significantly improved major bleeding prediction compared to HAS-BLED (AUC = 0.522). It also had a much greater discriminant power for death compared with CHA2DS2-VASc (AUC = 0.606). Also, models identified additional risk features (such as hemoglobin level, renal function, etc.) for each outcome. Conclusions Multilabel ML models can outperform clinical risk stratification scores for predicting the risk of major bleeding and death in non-valvular AF patients. △ Less

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2112.07819 [pdf, other]

Weed Recognition using Deep Learning Techniques on Class-imbalanced Imagery

Authors: A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G. K. Jones

Abstract: Most weed species can adversely impact agricultural productivity by competing for nutrients required by high-value crops. Manual weeding is not practical for large crop** areas. Many studies have been undertaken to develop automatic weed management systems for agricultural crops. In this process, one of the major tasks is to recognise the weeds from images. However, weed recognition is a challen… ▽ More Most weed species can adversely impact agricultural productivity by competing for nutrients required by high-value crops. Manual weeding is not practical for large crop** areas. Many studies have been undertaken to develop automatic weed management systems for agricultural crops. In this process, one of the major tasks is to recognise the weeds from images. However, weed recognition is a challenging task. It is because weed and crop plants can be similar in colour, texture and shape which can be exacerbated further by the imaging conditions, geographic or weather conditions when the images are recorded. Advanced machine learning techniques can be used to recognise weeds from imagery. In this paper, we have investigated five state-of-the-art deep neural networks, namely VGG16, ResNet-50, Inception-V3, Inception-ResNet-v2 and MobileNetV2, and evaluated their performance for weed recognition. We have used several experimental settings and multiple dataset combinations. In particular, we constructed a large weed-crop dataset by combining several smaller datasets, mitigating class imbalance by data augmentation, and using this dataset in benchmarking the deep neural networks. We investigated the use of transfer learning techniques by preserving the pre-trained weights for extracting the features and fine-tuning them using the images of crop and weed datasets. We found that VGG16 performed better than others on small-scale datasets, while ResNet-50 performed better than other deep networks on the large combined dataset. △ Less

Submitted 14 December, 2021; originally announced December 2021.

Comments: The paper is accepted by Crop and Pasture Science journal (https://www.publish.csiro.au/CP/justaccepted/CP21626)

arXiv:2110.05906 [pdf, other]

Energy-cost aware off-grid base stations with IoT devices for develo** a green heterogeneous network

Authors: Khondoker Ziaul Islam, MD. Sanwar Hossain, B. M. Ruhul Amin, Ferdous Sohel

Abstract: Heterogeneous network (HetNet) is a specified cellular platform to tackle the rapidly growing anticipated data traffic. From communications perspective, data loads can be mapped to energy loads that are generally placed on the operator networks. Meanwhile, renewable energy aided networks offer to curtail fossil fuel consumption, so to reduce environmental pollution. This paper proposes a renewable… ▽ More Heterogeneous network (HetNet) is a specified cellular platform to tackle the rapidly growing anticipated data traffic. From communications perspective, data loads can be mapped to energy loads that are generally placed on the operator networks. Meanwhile, renewable energy aided networks offer to curtail fossil fuel consumption, so to reduce environmental pollution. This paper proposes a renewable energy based power supply architecture for off-grid HetNet using a novel energy sharing model. Solar photovoltaic (PV) along with sufficient energy storage devices are used for each macro, micro, pico, or femto base station (BS). Additionally, biomass generator (BG) is used for macro and micro BSs. The collocated macro and micro BSs are connected through end-to-end resistive lines. A novel weighted proportional-fair resource-scheduling algorithm with sleep mechanisms is proposed for non-real time (NRT) applications by trading-off the power consumption and communication delays. Furthermore, the proposed algorithm with extended discontinuous reception (eDRX) and power saving mode (PSM) for narrowband internet of things (IoT) applications extends battery lifetime for IoT devices. HOMER optimization software is used to perform optimal system architecture, economic, and carbon footprint analyses while Monte-Carlo simulation tool is used for evaluating the throughput and energy efficiency performances. The proposed algorithms are valid for the practical data of the rural areas. We demonstrate the proposed power supply architecture is energy-efficient, cost-effective, reliable, and eco-friendly. △ Less

Submitted 12 October, 2021; originally announced October 2021.

arXiv:2110.00899 [pdf, other]

Anti-aliasing Deep Image Classifiers using Novel Depth Adaptive Blurring and Activation Function

Authors: Md Tahmid Hossain, Shyh Wei Teng, Ferdous Sohel, Guojun Lu

Abstract: Deep convolutional networks are vulnerable to image translation or shift, partly due to common down-sampling layers, e.g., max-pooling and strided convolution. These operations violate the Nyquist sampling rate and cause aliasing. The textbook solution is low-pass filtering (blurring) before down-sampling, which can benefit deep networks as well. Even so, non-linearity units, such as ReLU, often r… ▽ More Deep convolutional networks are vulnerable to image translation or shift, partly due to common down-sampling layers, e.g., max-pooling and strided convolution. These operations violate the Nyquist sampling rate and cause aliasing. The textbook solution is low-pass filtering (blurring) before down-sampling, which can benefit deep networks as well. Even so, non-linearity units, such as ReLU, often re-introduce the problem, suggesting that blurring alone may not suffice. In this work, first, we analyse deep features with Fourier transform and show that Depth Adaptive Blurring is more effective, as opposed to monotonic blurring. To this end, we outline how this can replace existing down-sampling methods. Second, we introduce a novel activation function -- with a built-in low pass filter, to keep the problem from reappearing. From experiments, we observe generalisation on other forms of transformations and corruptions as well, e.g., rotation, scale, and noise. We evaluate our method under three challenging settings: (1) a variety of image translations; (2) adversarial attacks -- both $\ell_{p}$ bounded and unbounded; and (3) data corruptions and perturbations. In each setting, our method achieves state-of-the-art results and improves clean accuracy on various benchmark datasets. △ Less

Submitted 2 October, 2021; originally announced October 2021.

arXiv:2109.12756 [pdf, other]

A novel network training approach for open set image recognition

Authors: Md Tahmid Hossain, Shyh Wei Teng, Guojun Lu, Ferdous Sohel

Abstract: Convolutional Neural Networks (CNNs) are commonly designed for closed set arrangements, where test instances only belong to some "Known Known" (KK) classes used in training. As such, they predict a class label for a test sample based on the distribution of the KK classes. However, when used under the Open Set Recognition (OSR) setup (where an input may belong to an "Unknown Unknown" or UU class),… ▽ More Convolutional Neural Networks (CNNs) are commonly designed for closed set arrangements, where test instances only belong to some "Known Known" (KK) classes used in training. As such, they predict a class label for a test sample based on the distribution of the KK classes. However, when used under the Open Set Recognition (OSR) setup (where an input may belong to an "Unknown Unknown" or UU class), such a network will always classify a test instance as one of the KK classes even if it is from a UU class. As a solution, recently, data augmentation based on Generative Adversarial Networks(GAN) has been used. In this work, we propose a novel approach for mining a "Known UnknownTrainer" or KUT set and design a deep OSR Network (OSRNet) to harness this dataset. The goal isto teach OSRNet the essence of the UUs through KUT set, which is effectively a collection of mined "hard Known Unknown negatives". Once trained, OSRNet can detect the UUs while maintaining high classification accuracy on KKs. We evaluate OSRNet on six benchmark datasets and demonstrate it outperforms contemporary OSR methods. △ Less

Submitted 26 September, 2021; originally announced September 2021.

arXiv:2107.11787 [pdf, other]

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

Authors: Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel, Dan Xu

Abstract: Semantic segmentation is a challenging task in the absence of densely labelled data. Only relying on class activation maps (CAM) with image-level labels provides deficient segmentation supervision. Prior works thus consider pre-trained models to produce coarse saliency maps to guide the generation of pseudo segmentation labels. However, the commonly used off-line heuristic generation process canno… ▽ More Semantic segmentation is a challenging task in the absence of densely labelled data. Only relying on class activation maps (CAM) with image-level labels provides deficient segmentation supervision. Prior works thus consider pre-trained models to produce coarse saliency maps to guide the generation of pseudo segmentation labels. However, the commonly used off-line heuristic generation process cannot fully exploit the benefits of these coarse saliency maps. Motivated by the significant inter-task correlation, we propose a novel weakly supervised multi-task framework termed as AuxSegNet, to leverage saliency detection and multi-label image classification as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels. Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations. The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks. The mutual boost between pseudo label updating and cross-task affinity learning enables iterative improvements on segmentation performance. Extensive experiments demonstrate the effectiveness of the proposed auxiliary learning network structure and the cross-task affinity learning method. The proposed approach achieves state-of-the-art weakly supervised segmentation performance on the challenging PASCAL VOC 2012 and MS COCO benchmarks. △ Less

Submitted 26 July, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: Accepted at ICCV 2021

arXiv:2103.01415 [pdf, other]

A Survey of Deep Learning Techniques for Weed Detection from Images

Authors: A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G. K. Jones

Abstract: The rapid advances in Deep Learning (DL) techniques have enabled rapid detection, localisation, and recognition of objects from images or videos. DL techniques are now being used in many applications related to agriculture and farming. Automatic detection and classification of weeds can play an important role in weed management and so contribute to higher yields. Weed detection in crops from image… ▽ More The rapid advances in Deep Learning (DL) techniques have enabled rapid detection, localisation, and recognition of objects from images or videos. DL techniques are now being used in many applications related to agriculture and farming. Automatic detection and classification of weeds can play an important role in weed management and so contribute to higher yields. Weed detection in crops from imagery is inherently a challenging problem because both weeds and crops have similar colours ('green-on-green'), and their shapes and texture can be very similar at the growth phase. Also, a crop in one setting can be considered a weed in another. In addition to their detection, the recognition of specific weed species is essential so that targeted controlling mechanisms (e.g. appropriate herbicides and correct doses) can be applied. In this paper, we review existing deep learning-based weed detection and classification techniques. We cover the detailed literature on four main procedures, i.e., data acquisition, dataset preparation, DL techniques employed for detection, location and classification of weeds in crops, and evaluation metrics approaches. We found that most studies applied supervised learning techniques, they achieved high classification accuracy by fine-tuning pre-trained models on any plant dataset, and past experiments have already achieved high accuracy when a large amount of labelled data is available. △ Less

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2101.02141 [pdf, other]

Integrated Generalized Zero-Shot Learning for Fine-Grained Classification

Authors: Tasfia Shermin, Shyh Wei Teng, Ferdous Sohel, Manzur Murshed, Guojun Lu

Abstract: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper,… ▽ More Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and a FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. △ Less

Submitted 15 August, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

Comments: Accepted in Pattern Recognition

Journal ref: Pattern Recognition, 2021

arXiv:2012.15054 [pdf, other]

Bidirectional Map** Coupled GAN for Generalized Zero-Shot Learning

Authors: Tasfia Shermin, Shyh Wei Teng, Ferdous Sohel, Manzur Murshed, Guojun Lu

Abstract: Bidirectional map**-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen domains and preserving domain distinction is crucial for these methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are avai… ▽ More Bidirectional map**-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen domains and preserving domain distinction is crucial for these methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are available in the GZSL problem setting. Most methods neglect retaining domain distinction and use the learned distribution to recognize seen and unseen data. Consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling. We propose a bidirectional map** coupled generative adversarial network (BMCoGAN) by extending the coupled generative adversarial network into a dual-domain learning bidirectional map** model. We further integrate a Wasserstein generative adversarial optimization to supervise the joint distribution learning. We design a loss optimization for retaining domain distinctive information in the synthesized features and reducing bias towards seen classes, which pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods. △ Less

Submitted 19 February, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

arXiv:2012.00220 [pdf, other]

Imputation of Missing Data with Class Imbalance using Conditional Generative Adversarial Networks

Authors: Saqib Ejaz Awan, Mohammed Bennamoun, Ferdous Sohel, Frank M Sanfilippo, Girish Dwivedi

Abstract: Missing data is a common problem faced with real-world datasets. Imputation is a widely used technique to estimate the missing data. State-of-the-art imputation approaches, such as Generative Adversarial Imputation Nets (GAIN), model the distribution of observed data to approximate the missing values. Such an approach usually models a single distribution for the entire dataset, which overlooks the… ▽ More Missing data is a common problem faced with real-world datasets. Imputation is a widely used technique to estimate the missing data. State-of-the-art imputation approaches, such as Generative Adversarial Imputation Nets (GAIN), model the distribution of observed data to approximate the missing values. Such an approach usually models a single distribution for the entire dataset, which overlooks the class-specific characteristics of the data. Class-specific characteristics are especially useful when there is a class imbalance. We propose a new method for imputing missing data based on its class-specific characteristics by adapting the popular Conditional Generative Adversarial Networks (CGAN). Our Conditional Generative Adversarial Imputation Network (CGAIN) imputes the missing data using class-specific distributions, which can produce the best estimates for the missing values. We tested our approach on benchmark datasets and achieved superior performance compared with the state-of-the-art and popular imputation approaches. △ Less

Submitted 30 November, 2020; originally announced December 2020.

arXiv:2009.07532 [pdf]

RCNN for Region of Interest Detection in Whole Slide Images

Authors: A Nugaliyadde, Kok Wai Wong, Jeremy Parry, Ferdous Sohel, Hamid Laga, Upeka V. Somaratne, Chris Yeomans, Orchid Foster

Abstract: Digital pathology has attracted significant attention in recent years. Analysis of Whole Slide Images (WSIs) is challenging because they are very large, i.e., of Giga-pixel resolution. Identifying Regions of Interest (ROIs) is the first step for pathologists to analyse further the regions of diagnostic interest for cancer detection and other anomalies. In this paper, we investigate the use of RCNN… ▽ More Digital pathology has attracted significant attention in recent years. Analysis of Whole Slide Images (WSIs) is challenging because they are very large, i.e., of Giga-pixel resolution. Identifying Regions of Interest (ROIs) is the first step for pathologists to analyse further the regions of diagnostic interest for cancer detection and other anomalies. In this paper, we investigate the use of RCNN, which is a deep machine learning technique, for detecting such ROIs only using a small number of labelled WSIs for training. For experimentation, we used real WSIs from a public hospital pathology service in Western Australia. We used 60 WSIs for training the RCNN model and another 12 WSIs for testing. The model was further tested on a new set of unseen WSIs. The results show that RCNN can be effectively used for ROI detection from WSIs. △ Less

Submitted 17 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: This paper was accepted to the 27th International Conference on Neural Information Processing (ICONIP 2020) and will be published in the Springer CCIS Series

arXiv:2007.09453 [pdf, other]

doi 10.1109/ACCESS.2021.3089598

Robust Image Classification Using A Low-Pass Activation Function and DCT Augmentation

Authors: Md Tahmid Hossain, Shyh Wei Teng, Ferdous Sohel, Guojun Lu

Abstract: Convolutional Neural Network's (CNN's) performance disparity on clean and corrupted datasets has recently come under scrutiny. In this work, we analyse common corruptions in the frequency domain, i.e., High Frequency corruptions (HFc, e.g., noise) and Low Frequency corruptions (LFc, e.g., blur). Although a simple solution to HFc is low-pass filtering, ReLU -- a widely used Activation Function (AF)… ▽ More Convolutional Neural Network's (CNN's) performance disparity on clean and corrupted datasets has recently come under scrutiny. In this work, we analyse common corruptions in the frequency domain, i.e., High Frequency corruptions (HFc, e.g., noise) and Low Frequency corruptions (LFc, e.g., blur). Although a simple solution to HFc is low-pass filtering, ReLU -- a widely used Activation Function (AF), does not have any filtering mechanism. In this work, we instill low-pass filtering into the AF (LP-ReLU) to improve robustness against HFc. To deal with LFc, we complement LP-ReLU with Discrete Cosine Transform based augmentation. LP-ReLU, coupled with DCT augmentation, enables a deep network to tackle the entire spectrum of corruption. We use CIFAR-10-C and Tiny ImageNet-C for evaluation and demonstrate improvements of 5% and 7.3% in accuracy respectively, compared to the State-Of-The-Art (SOTA). We further evaluate our method's stability on a variety of perturbations in CIFAR-10-P and Tiny ImageNet-P, achieving new SOTA in these experiments as well. To further strengthen our understanding regarding CNN's lack of robustness, a decision space visualisation process is proposed and presented in this work. △ Less

Submitted 12 June, 2021; v1 submitted 18 July, 2020; originally announced July 2020.

arXiv:2007.00384 [pdf, other]

doi 10.1109/TMM.2020.3016126

Adversarial Network with Multiple Classifiers for Open Set Domain Adaptation

Authors: Tasfia Shermin, Guojun Lu, Shyh Wei Teng, Manzur Murshed, Ferdous Sohel

Abstract: Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings in the literature to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has… ▽ More Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings in the literature to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has both private ('unknown classes') label space and the shared ('known classes') label space. However, the source domain only has the 'known classes' label space. Prevalent distribution-matching domain adaptation methods are inadequate in such a setting that demands adaptation from a smaller source domain to a larger and diverse target domain with more classes. For addressing this specific open set domain adaptation setting, prior research introduces a domain adversarial model that uses a fixed threshold for distinguishing known from unknown target samples and lacks at handling negative transfers. We extend their adversarial model and propose a novel adversarial domain adaptation model with multiple auxiliary classifiers. The proposed multi-classifier structure introduces a weighting module that evaluates distinctive domain characteristics for assigning the target samples with weights which are more representative to whether they are likely to belong to the known and unknown classes to encourage positive transfers during adversarial training and simultaneously reduces the domain gap between the shared classes of the source and target domains. A thorough experimental investigation shows that our proposed method outperforms existing domain adaptation methods on a number of domain adaptation datasets. △ Less

Submitted 7 August, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: Accepted in IEEE Transactions on Multimedia (in press), 2020

Journal ref: IEEE Transactions on Multimedia, 2020 (CODE: https://github.com/tasfia/DAMC)

arXiv:2005.14502 [pdf, other]

Unconstrained Matching of 2D and 3D Descriptors for 6-DOF Pose Estimation

Authors: Uzair Nadeem, Mohammed Bennamoun, Roberto Togneri, Ferdous Sohel

Abstract: This paper proposes a novel concept to directly match feature descriptors extracted from 2D images with feature descriptors extracted from 3D point clouds. We use this concept to directly localize images in a 3D point cloud. We generate a dataset of matching 2D and 3D points and their corresponding feature descriptors, which is used to learn a Descriptor-Matcher classifier. To localize the pose of… ▽ More This paper proposes a novel concept to directly match feature descriptors extracted from 2D images with feature descriptors extracted from 3D point clouds. We use this concept to directly localize images in a 3D point cloud. We generate a dataset of matching 2D and 3D points and their corresponding feature descriptors, which is used to learn a Descriptor-Matcher classifier. To localize the pose of an image at test time, we extract keypoints and feature descriptors from the query image. The trained Descriptor-Matcher is then used to match the features from the image and the point cloud. The locations of the matched features are used in a robust pose estimation algorithm to predict the location and orientation of the query image. We carried out an extensive evaluation of the proposed method for indoor and outdoor scenarios and with different types of point clouds to verify the feasibility of our approach. Experimental results demonstrate that direct matching of feature descriptors from images and point clouds is not only a viable idea but can also be reliably used to estimate the 6-DOF poses of query cameras in any type of 3D point cloud in an unconstrained manner with high precision. △ Less

Submitted 29 May, 2020; originally announced May 2020.

arXiv:1912.10373 [pdf]

Robust Pose Invariant Shape and Texture based Hand Recognition

Authors: F. Sohel, A. El-Sallam, M. Bennamoun

Abstract: This paper presents a novel personal identification and verification system using information extracted from the hand shape and texture. The system has two major constituent modules: a fully automatic and robust peg free segmentation and pose normalisation module, and a recognition module. In the first module, the hand is segmented from its background using a thresholding technique based on Otsu`s… ▽ More This paper presents a novel personal identification and verification system using information extracted from the hand shape and texture. The system has two major constituent modules: a fully automatic and robust peg free segmentation and pose normalisation module, and a recognition module. In the first module, the hand is segmented from its background using a thresholding technique based on Otsu`s method combined with a skin colour detector. A set of fully automatic algorithms are then proposed to segment the palm and fingers. In these algorithms, the skeleton and the contour of the hand and fingers are estimated and used to determine the global pose of the hand and the pose of each individual finger. Finally the palm and fingers are cropped, pose corrected and normalised. In the recognition module, various shape and texture based features are extracted and used for matching purposes. The modified Hausdorff distance, the Iterative Closest Point (ICP) and Independent Component Analysis (ICA) algorithms are used for shape and texture features of the fingers. For the palmprints, we use the Discrete Cosine Transform (DCT), directional line features and ICA. Recognition (identification and verification) tests were performed using fusion strategies based on the similarity scores of the fingers and the palm. Experimental results show that the proposed system exhibits a superior performance over existing systems with an accuracy of over 98\% for hand identification and verification (at equal error rate) in a database of 560 different subjects. △ Less

Submitted 21 December, 2019; originally announced December 2019.

arXiv:1906.10881 [pdf, other]

doi 10.3390/s20020447

Automatic Hierarchical Classification of Kelps using Deep Residual Features

Authors: Ammar Mahmood, Ana Giraldo Ospina, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid, Renae Hovey, Robert B. Fisher, Gary Kendrick

Abstract: Across the globe, remote image data is rapidly being collected for the assessment of benthic communities from shallow to extremely deep waters on continental slopes to the abyssal seas. Exploiting this data is presently limited by the time it takes for experts to identify organisms found in these images. With this limitation in mind, a large effort has been made globally to introduce automation an… ▽ More Across the globe, remote image data is rapidly being collected for the assessment of benthic communities from shallow to extremely deep waters on continental slopes to the abyssal seas. Exploiting this data is presently limited by the time it takes for experts to identify organisms found in these images. With this limitation in mind, a large effort has been made globally to introduce automation and machine learning algorithms to accelerate both classification and assessment of marine benthic biota. One major issue lies with organisms that move with swell and currents, like kelps. This paper presents an automatic hierarchical classification method (local binary classification as opposed to the conventional flat classification) to classify kelps in images collected by autonomous underwater vehicles. The proposed kelp classification approach exploits learned feature representations extracted from deep residual networks. We show that these generic features outperform the traditional off-the-shelf CNN features and the conventional hand-crafted features. Experiments also demonstrate that the hierarchical classification method outperforms the traditional parallel multi-class classifications by a significant margin (90.0% vs 57.6% and 77.2% vs 59.0%) on Benthoz15 and Rottnest datasets respectively. Furthermore, we compare different hierarchical classification approaches and experimentally show that the sibling hierarchical training approach outperforms the inclusive hierarchical approach by a significant margin. We also report an application of our proposed method to study the change in kelp cover over time for annually repeated AUV surveys. △ Less

Submitted 23 January, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

Comments: MDPI Sensors

Journal ref: Sensors 2020, 20, 447

arXiv:1906.06064 [pdf, other]

Direct Image to Point Cloud Descriptors Matching for 6-DOF Camera Localization in Dense 3D Point Cloud

Authors: Uzair Nadeem, Mohammad A. A. K. Jalwana, Mohammed Bennamoun, Roberto Togneri, Ferdous Sohel

Abstract: We propose a novel concept to directly match feature descriptors extracted from RGB images, with feature descriptors extracted from 3D point clouds. We use this concept to localize the position and orientation (pose) of the camera of a query image in dense point clouds. We generate a dataset of matching 2D and 3D descriptors, and use it to train a proposed Descriptor-Matcher algorithm. To localize… ▽ More We propose a novel concept to directly match feature descriptors extracted from RGB images, with feature descriptors extracted from 3D point clouds. We use this concept to localize the position and orientation (pose) of the camera of a query image in dense point clouds. We generate a dataset of matching 2D and 3D descriptors, and use it to train a proposed Descriptor-Matcher algorithm. To localize a query image in a point cloud, we extract 2D keypoints and descriptors from the query image. Then the Descriptor-Matcher is used to find the corresponding pairs 2D and 3D keypoints by matching the 2D descriptors with the pre-extracted 3D descriptors of the point cloud. This information is used in a robust pose estimation algorithm to localize the query image in the 3D point cloud. Experiments demonstrate that directly matching 2D and 3D descriptors is not only a viable idea but also achieves competitive accuracy compared to other state-of-the-art approaches for camera pose localization. △ Less

Submitted 14 June, 2019; originally announced June 2019.

arXiv:1904.08936 [pdf, other]

Language Modeling through Long Term Memory Network

Authors: Anupiya Nugaliyadde, Kok Wai Wong, Ferdous Sohel, Hong Xie

Abstract: Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Memory Networks which contain memory are popularly used to learn patterns in sequential data. Sequential data has long sequences that hold relationships. RNN can handle long sequences but suffers from the vanishing and exploding gradient problems. While LSTM and other memory networks address this problem, they are not cap… ▽ More Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Memory Networks which contain memory are popularly used to learn patterns in sequential data. Sequential data has long sequences that hold relationships. RNN can handle long sequences but suffers from the vanishing and exploding gradient problems. While LSTM and other memory networks address this problem, they are not capable of handling long sequences (50 or more data points long sequence patterns). Language modelling requiring learning from longer sequences are affected by the need for more information in memory. This paper introduces Long Term Memory network (LTM), which can tackle the exploding and vanishing gradient problems and handles long sequences without forgetting. LTM is designed to scale data in the memory and gives a higher weight to the input in the sequence. LTM avoid overfitting by scaling the cell state after achieving the optimal results. The LTM is tested on Penn treebank dataset, and Text8 dataset and LTM achieves test perplexities of 83 and 82 respectively. 650 LTM cells achieved a test perplexity of 67 for Penn treebank, and 600 cells achieved a test perplexity of 77 for Text8. LTM achieves state of the art results by only using ten hidden LTM cells for both datasets. △ Less

Submitted 18 April, 2019; originally announced April 2019.

Comments: The paper is accepted to be published in IJCNN 2019

arXiv:1903.10150 [pdf, other]

Enhanced Transfer Learning with ImageNet Trained Classification Layer

Authors: Tasfia Shermin, Shyh Wei Teng, Manzur Murshed, Guojun Lu, Ferdous Sohel, Manoranjan Paul

Abstract: Parameter fine tuning is a transfer learning approach whereby learned parameters from pre-trained source network are transferred to the target network followed by fine-tuning. Prior research has shown that this approach is capable of improving task performance. However, the impact of the ImageNet pre-trained classification layer in parameter fine-tuning is mostly unexplored in the literature. In t… ▽ More Parameter fine tuning is a transfer learning approach whereby learned parameters from pre-trained source network are transferred to the target network followed by fine-tuning. Prior research has shown that this approach is capable of improving task performance. However, the impact of the ImageNet pre-trained classification layer in parameter fine-tuning is mostly unexplored in the literature. In this paper, we propose a fine-tuning approach with the pre-trained classification layer. We employ layer-wise fine-tuning to determine which layers should be frozen for optimal performance. Our empirical analysis demonstrates that the proposed fine-tuning performs better than traditional fine-tuning. This finding indicates that the pre-trained classification layer holds less category-specific or more global information than believed earlier. Thus, we hypothesize that the presence of this layer is crucial for growing network depth to adapt better to a new task. Our study manifests that careful normalization and scaling are essential for creating harmony between the pre-trained and new layers for target domain adaptation. We evaluate the proposed depth augmented networks for fine-tuning on several challenging benchmark datasets and show that they can achieve higher classification accuracy than contemporary transfer learning approaches. △ Less

Submitted 19 September, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

Comments: 14 pages

arXiv:1901.07176 [pdf]

Enhancing Semantic Word Representations by Embedding Deeper Word Relationships

Authors: Anupiya Nugaliyadde, Kok Wai Wong, Ferdous Sohel, Hong Xie

Abstract: Word representations are created using analogy context-based statistics and lexical relations on words. Word representations are inputs for the learning models in Natural Language Understanding (NLU) tasks. However, to understand language, knowing only the context is not sufficient. Reading between the lines is a key component of NLU. Embedding deeper word relationships which are not represented i… ▽ More Word representations are created using analogy context-based statistics and lexical relations on words. Word representations are inputs for the learning models in Natural Language Understanding (NLU) tasks. However, to understand language, knowing only the context is not sufficient. Reading between the lines is a key component of NLU. Embedding deeper word relationships which are not represented in the context enhances the word representation. This paper presents a word embedding which combines an analogy, context-based statistics using Word2Vec, and deeper word relationships using Conceptnet, to create an expanded word representation. In order to fine-tune the word representation, Self-Organizing Map is used to optimize it. The proposed word representation is compared with semantic word representations using Simlex 999. Furthermore, the use of 3D visual representations has shown to be capable of representing the similarity and association between words. The proposed word representation shows a Spearman correlation score of 0.886 and provided the best results when compared to the current state-of-the-art methods, and exceed the human performance of 0.78. △ Less

Submitted 22 January, 2019; originally announced January 2019.

Comments: Accepted for the International Conference on Computer and Automation Engineering (ICCAE) 2019

arXiv:1810.04020 [pdf, other]

A Comprehensive Survey of Deep Learning for Image Captioning

Authors: Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga

Abstract: Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to pre… ▽ More Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to present a comprehensive review of existing deep learning-based image captioning techniques. We discuss the foundation of the techniques to analyze their performances, strengths and limitations. We also discuss the datasets and the evaluation metrics popularly used in deep learning based automatic image captioning. △ Less

Submitted 14 October, 2018; v1 submitted 6 October, 2018; originally announced October 2018.

Comments: 36 Pages, Accepted as a Journal Paper in ACM Computing Surveys (October 2018)

arXiv:1803.09470 [pdf, other]

Real Time Surveillance for Low Resolution and Limited-Data Scenarios: An Image Set Classification Approach

Authors: Uzair Nadeem, Syed Afaq Ali Shah, Mohammed Bennamoun, Roberto Togneri, Ferdous Sohel

Abstract: This paper proposes a novel image set classification technique based on the concept of linear regression. Unlike most other approaches, the proposed technique does not involve any training or feature extraction. The gallery image sets are represented as subspaces in a high dimensional space. Class specific gallery subspaces are used to estimate regression models for each image of the test image se… ▽ More This paper proposes a novel image set classification technique based on the concept of linear regression. Unlike most other approaches, the proposed technique does not involve any training or feature extraction. The gallery image sets are represented as subspaces in a high dimensional space. Class specific gallery subspaces are used to estimate regression models for each image of the test image set. Images of the test set are then projected on the gallery subspaces. Residuals, calculated using the Euclidean distance between the original and the projected test images, are used as the distance metric. Three different strategies are devised to decide on the final class of the test image set. We performed extensive evaluations of the proposed technique under the challenges of low resolution, noise and less gallery data for the tasks of surveillance, video-based face recognition and object recognition. Experiments show that the proposed technique achieves a better classification accuracy and a faster execution time compared to existing techniques especially under the challenging conditions of low resolution and small gallery and test data. △ Less

Submitted 3 March, 2019; v1 submitted 26 March, 2018; originally announced March 2018.

arXiv:1711.05627 [pdf, ps, other]

Exploiting Layerwise Convexity of Rectifier Networks with Sign Constrained Weights

Authors: Senjian An, Farid Boussaid, Mohammed Bennamoun, Ferdous Sohel

Abstract: By introducing sign constraints on the weights, this paper proposes sign constrained rectifier networks (SCRNs), whose training can be solved efficiently by the well known majorization-minimization (MM) algorithms. We prove that the proposed two-hidden-layer SCRNs, which exhibit negative weights in the second hidden layer and negative weights in the output layer, are capable of separating any two… ▽ More By introducing sign constraints on the weights, this paper proposes sign constrained rectifier networks (SCRNs), whose training can be solved efficiently by the well known majorization-minimization (MM) algorithms. We prove that the proposed two-hidden-layer SCRNs, which exhibit negative weights in the second hidden layer and negative weights in the output layer, are capable of separating any two (or more) disjoint pattern sets. Furthermore, the proposed two-hidden-layer SCRNs can decompose the patterns of each class into several clusters so that each cluster is convexly separable from all the patterns from the other classes. This provides a means to learn the pattern structures and analyse the discriminant factors between different classes of patterns. △ Less

Submitted 14 November, 2017; originally announced November 2017.

Comments: 11 pages

arXiv:1703.03492 [pdf, other]

doi 10.1109/CVPR.2017.486

A New Representation of Skeleton Sequences for 3D Action Recognition

Authors: Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid

Abstract: This paper presents a new method for 3D action recognition with skeleton sequences (i.e., 3D trajectories of human skeleton joints). The proposed method first transforms each skeleton sequence into three clips each consisting of several frames for spatial temporal feature learning using deep neural networks. Each clip is generated from one channel of the cylindrical coordinates of the skeleton seq… ▽ More This paper presents a new method for 3D action recognition with skeleton sequences (i.e., 3D trajectories of human skeleton joints). The proposed method first transforms each skeleton sequence into three clips each consisting of several frames for spatial temporal feature learning using deep neural networks. Each clip is generated from one channel of the cylindrical coordinates of the skeleton sequence. Each frame of the generated clips represents the temporal information of the entire skeleton sequence, and incorporates one particular spatial relationship between the joints. The entire clips include multiple frames with different spatial relationships, which provide useful spatial structural information of the human skeleton. We propose to use deep convolutional neural networks to learn long-term temporal information of the skeleton sequence from the frames of the generated clips, and then use a Multi-Task Learning Network (MTLN) to jointly process all frames of the generated clips in parallel to incorporate spatial structural information for action recognition. Experimental results clearly show the effectiveness of the proposed new representation and feature learning method for 3D action recognition. △ Less

Submitted 4 June, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

Comments: CVPR 2017

arXiv:1701.02485 [pdf, other]

Efficient Image Set Classification using Linear Regression based Image Reconstruction

Authors: Syed Afaq Ali Shah, Uzair Nadeem, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri

Abstract: We propose a novel image set classification technique using linear regression models. Downsampled gallery image sets are interpreted as subspaces of a high dimensional space to avoid the computationally expensive training step. We estimate regression models for each test image using the class specific gallery subspaces. Images of the test set are then reconstructed using the regression models. Bas… ▽ More We propose a novel image set classification technique using linear regression models. Downsampled gallery image sets are interpreted as subspaces of a high dimensional space to avoid the computationally expensive training step. We estimate regression models for each test image using the class specific gallery subspaces. Images of the test set are then reconstructed using the regression models. Based on the minimum reconstruction error between the reconstructed and the original images, a weighted voting strategy is used to classify the test set. We performed extensive evaluation on the benchmark UCSD/Honda, CMU Mobo and YouTube Celebrity datasets for face classification, and ETH-80 dataset for object classification. The results demonstrate that by using only a small amount of training data, our technique achieved competitive classification accuracy and superior computational speed compared with the state-of-the-art methods. △ Less

Submitted 10 January, 2017; originally announced January 2017.

arXiv:1611.06656 [pdf, other]

doi 10.1109/ICIP.2017.8296551

ResFeats: Residual Network Based Features for Image Classification

Authors: Ammar Mahmood, Mohammed Bennamoun, Senjian An, Ferdous Sohel

Abstract: Deep residual networks have recently emerged as the state-of-the-art architecture in image segmentation and object detection. In this paper, we propose new image features (called ResFeats) extracted from the last convolutional layer of deep residual networks pre-trained on ImageNet. We propose to use ResFeats for diverse image classification tasks namely, object classification, scene classificatio… ▽ More Deep residual networks have recently emerged as the state-of-the-art architecture in image segmentation and object detection. In this paper, we propose new image features (called ResFeats) extracted from the last convolutional layer of deep residual networks pre-trained on ImageNet. We propose to use ResFeats for diverse image classification tasks namely, object classification, scene classification and coral classification and show that ResFeats consistently perform better than their CNN counterparts on these classification tasks. Since the ResFeats are large feature vectors, we propose to use PCA for dimensionality reduction. Experimental results are provided to show the effectiveness of ResFeats with state-of-the-art classification accuracies on Caltech-101, Caltech-256 and MLC datasets and a significant performance improvement on MIT-67 dataset compared to the widely used CNN features. △ Less

Submitted 21 November, 2016; originally announced November 2016.

Journal ref: A. Mahmood, M. Bennamoun, S. An and F. Sohel, "Resfeats: Residual network based features for image classification," 2017 IEEE International Conference on Image Processing (ICIP), Bei**g, 2017, pp. 1597-1601

arXiv:1608.05267 [pdf, other]

doi 10.1109/TMM.2017.2778559

Leveraging Structural Context Models and Ranking Score Fusion for Human Interaction Prediction

Authors: Qiuhong Ke, Mohammed Bennamoun, Senjian An, Farid Bossaid, Ferdous Sohel

Abstract: Predicting an interaction before it is fully executed is very important in applications such as human-robot interaction and video surveillance. In a two-human interaction scenario, there often contextual dependency structure between the global interaction context of the two humans and the local context of the different body parts of each human. In this paper, we propose to learn the structure of t… ▽ More Predicting an interaction before it is fully executed is very important in applications such as human-robot interaction and video surveillance. In a two-human interaction scenario, there often contextual dependency structure between the global interaction context of the two humans and the local context of the different body parts of each human. In this paper, we propose to learn the structure of the interaction contexts, and combine it with the spatial and temporal information of a video sequence for a better prediction of the interaction class. The structural models, including the spatial and the temporal models, are learned with Long Short Term Memory (LSTM) networks to capture the dependency of the global and local contexts of each RGB frame and each optical flow image, respectively. LSTM networks are also capable of detecting the key information from the global and local interaction contexts. Moreover, to effectively combine the structural models with the spatial and temporal models for interaction prediction, a ranking score fusion method is also introduced to automatically compute the optimal weight of each model for score fusion. Experimental results on the BIT Interaction and the UT-Interaction datasets clearly demonstrate the benefits of the proposed method. △ Less

Submitted 7 June, 2017; v1 submitted 18 August, 2016; originally announced August 2016.

arXiv:1606.02009 [pdf, other]

Learning deep structured network for weakly supervised change detection

Authors: Salman H Khan, Xuming He, Fatih Porikli, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri

Abstract: Conventional change detection methods require a large number of images to learn background models or depend on tedious pixel-level labeling by humans. In this paper, we present a weakly supervised approach that needs only image-level labels to simultaneously detect and localize changes in a pair of images. To this end, we employ a deep neural network with DAG topology to learn patterns of change f… ▽ More Conventional change detection methods require a large number of images to learn background models or depend on tedious pixel-level labeling by humans. In this paper, we present a weakly supervised approach that needs only image-level labels to simultaneously detect and localize changes in a pair of images. To this end, we employ a deep neural network with DAG topology to learn patterns of change from image-level labeled training data. On top of the initial CNN activations, we define a CRF model to incorporate the local differences and context with the dense connections between individual pixels. We apply a constrained mean-field algorithm to estimate the pixel-level labels, and use the estimated labels to update the parameters of the CNN in an iterative EM framework. This enables imposing global constraints on the observed foreground probability mass function. Our evaluations on four benchmark datasets demonstrate superior detection and localization performance. △ Less

Submitted 22 May, 2017; v1 submitted 6 June, 2016; originally announced June 2016.

arXiv:1508.03422 [pdf, other]

Cost Sensitive Learning of Deep Feature Representations from Imbalanced Data

Authors: Salman H. Khan, Munawar Hayat, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri

Abstract: Class imbalance is a common problem in the case of real-world object detection and classification tasks. Data of some classes is abundant making them an over-represented majority, and data of other classes is scarce, making them an under-represented minority. This imbalance makes it challenging for a classifier to appropriately learn the discriminating boundaries of the majority and minority class… ▽ More Class imbalance is a common problem in the case of real-world object detection and classification tasks. Data of some classes is abundant making them an over-represented majority, and data of other classes is scarce, making them an under-represented minority. This imbalance makes it challenging for a classifier to appropriately learn the discriminating boundaries of the majority and minority classes. In this work, we propose a cost sensitive deep neural network which can automatically learn robust feature representations for both the majority and minority classes. During training, our learning procedure jointly optimizes the class dependent costs and the neural network parameters. The proposed approach is applicable to both binary and multi-class problems without any modification. Moreover, as opposed to data level approaches, we do not alter the original data distribution which results in a lower computational cost during the training process. We report the results of our experiments on six major image classification datasets and show that the proposed approach significantly outperforms the baseline algorithms. Comparisons with popular data sampling techniques and cost sensitive classifiers demonstrate the superior performance of our proposed method. △ Less

Submitted 23 March, 2017; v1 submitted 14 August, 2015; originally announced August 2015.

arXiv:1506.05196 [pdf, other]

doi 10.1109/TIP.2016.2567076

A Discriminative Representation of Convolutional Features for Indoor Scene Recognition

Authors: Salman H. Khan, Munawar Hayat, Mohammed Bennamoun, Roberto Togneri, Ferdous Sohel

Abstract: Indoor scene recognition is a multi-faceted and challenging problem due to the diverse intra-class variations and the confusing inter-class similarities. This paper presents a novel approach which exploits rich mid-level convolutional features to categorize indoor scenes. Traditionally used convolutional features preserve the global spatial structure, which is a desirable property for general obje… ▽ More Indoor scene recognition is a multi-faceted and challenging problem due to the diverse intra-class variations and the confusing inter-class similarities. This paper presents a novel approach which exploits rich mid-level convolutional features to categorize indoor scenes. Traditionally used convolutional features preserve the global spatial structure, which is a desirable property for general object recognition. However, we argue that this structuredness is not much helpful when we have large variations in scene layouts, e.g., in indoor scenes. We propose to transform the structured convolutional activations to another highly discriminative feature space. The representation in the transformed space not only incorporates the discriminative aspects of the target dataset, but it also encodes the features in terms of the general object categories that are present in indoor scenes. To this end, we introduce a new large-scale dataset of 1300 object categories which are commonly present in indoor scenes. Our proposed approach achieves a significant performance boost over previous state of the art approaches on five major scene classification datasets. △ Less

Submitted 16 June, 2015; originally announced June 2015.

arXiv:1304.3192 [pdf, ps, other]

doi 10.1007/s11263-013-0627-y

Rotational Projection Statistics for 3D Local Surface Description and Object Recognition

Authors: Yulan Guo, Ferdous Sohel, Mohammed Bennamoun, Min Lu, Jianwei Wan

Abstract: Recognizing 3D objects in the presence of noise, varying mesh resolution, occlusion and clutter is a very challenging task. This paper presents a novel method named Rotational Projection Statistics (RoPS). It has three major modules: Local Reference Frame (LRF) definition, RoPS feature description and 3D object recognition. We propose a novel technique to define the LRF by calculating the scatter… ▽ More Recognizing 3D objects in the presence of noise, varying mesh resolution, occlusion and clutter is a very challenging task. This paper presents a novel method named Rotational Projection Statistics (RoPS). It has three major modules: Local Reference Frame (LRF) definition, RoPS feature description and 3D object recognition. We propose a novel technique to define the LRF by calculating the scatter matrix of all points lying on the local surface. RoPS feature descriptors are obtained by rotationally projecting the neighboring points of a feature point onto 2D planes and calculating a set of statistics (including low-order central moments and entropy) of the distribution of these projected points. Using the proposed LRF and RoPS descriptor, we present a hierarchical 3D object recognition algorithm. The performance of the proposed LRF, RoPS descriptor and object recognition algorithm was rigorously tested on a number of popular and publicly available datasets. Our proposed techniques exhibited superior performance compared to existing techniques. We also showed that our method is robust with respect to noise and varying mesh resolution. Our RoPS based algorithm achieved recognition rates of 100%, 98.9%, 95.4% and 96.0% respectively when tested on the Bologna, UWA, Queen's and Ca' Foscari Venezia Datasets. △ Less

Submitted 11 April, 2013; originally announced April 2013.

Comments: The final publication is available at link.springer.com International Journal of Computer Vision 2013

ACM Class: I.4; I.5.4

arXiv:1009.4582 [pdf]

Text Classification using the Concept of Association Rule of Data Mining

Authors: Chowdhury Mofizur Rahman, Ferdous Ahmed Sohel, Parvez Naushad, S. M. Kamruzzaman

Abstract: As the amount of online text increases, the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from t… ▽ More As the amount of online text increases, the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. In this paper we will discuss a procedure of classifying text using the concept of association rule of data mining. Association rule mining technique has been used to derive feature set from pre-classified text documents. Naive Bayes classifier is then used on derived features for final classification. △ Less

Submitted 23 September, 2010; originally announced September 2010.

Comments: 8 Pages, International Conference

Journal ref: Proc. International Conference on Information Technology, Kathmandu, Nepal, pp. 234-241, May. 2003

Showing 1–35 of 35 results for author: Sohel, F