Search | arXiv e-print repository

Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling

Authors: Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: In this paper, we introduce Attention Prompt Tuning (APT) - a computationally efficient variant of prompt tuning for video-based applications such as action recognition. Prompt tuning approaches involve injecting a set of learnable prompts along with data tokens during fine-tuning while kee** the backbone frozen. This approach greatly reduces the number of learnable parameters compared to full t… ▽ More In this paper, we introduce Attention Prompt Tuning (APT) - a computationally efficient variant of prompt tuning for video-based applications such as action recognition. Prompt tuning approaches involve injecting a set of learnable prompts along with data tokens during fine-tuning while kee** the backbone frozen. This approach greatly reduces the number of learnable parameters compared to full tuning. For image-based downstream tasks, normally a couple of learnable prompts achieve results close to those of full tuning. However, videos, which contain more complex spatiotemporal information, require hundreds of tunable prompts to achieve reasonably good results. This reduces the parameter efficiency observed in images and significantly increases latency and the number of floating-point operations (FLOPs) during inference. To tackle these issues, we directly inject the prompts into the keys and values of the non-local attention mechanism within the transformer block. Additionally, we introduce a novel prompt reparameterization technique to make APT more robust against hyperparameter selection. The proposed APT approach greatly reduces the number of FLOPs and latency while achieving a significant performance boost over the existing parameter-efficient tuning methods on UCF101, HMDB51, and SSv2 datasets for action recognition. The code and pre-trained models are available at https://github.com/wgcban/apt △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: Accepted at 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG'24) Code available at: https://github.com/wgcban/apt 12 pages, 8 figures, 6 tables

arXiv:2312.02151 [pdf, other]

Guarding Barlow Twins Against Overfitting with Mixed Samples

Authors: Wele Gedara Chaminda Bandara, Celso M. De Melo, Vishal M. Patel

Abstract: Self-supervised Learning (SSL) aims to learn transferable feature representations for downstream applications without relying on labeled data. The Barlow Twins algorithm, renowned for its widespread adoption and straightforward implementation compared to its counterparts like contrastive learning methods, minimizes feature redundancy while maximizing invariance to common corruptions. Optimizing fo… ▽ More Self-supervised Learning (SSL) aims to learn transferable feature representations for downstream applications without relying on labeled data. The Barlow Twins algorithm, renowned for its widespread adoption and straightforward implementation compared to its counterparts like contrastive learning methods, minimizes feature redundancy while maximizing invariance to common corruptions. Optimizing for the above objective forces the network to learn useful representations, while avoiding noisy or constant features, resulting in improved downstream task performance with limited adaptation. Despite Barlow Twins' proven effectiveness in pre-training, the underlying SSL objective can inadvertently cause feature overfitting due to the lack of strong interaction between the samples unlike the contrastive learning approaches. From our experiments, we observe that optimizing for the Barlow Twins objective doesn't necessarily guarantee sustained improvements in representation quality beyond a certain pre-training phase, and can potentially degrade downstream performance on some datasets. To address this challenge, we introduce Mixed Barlow Twins, which aims to improve sample interaction during Barlow Twins training via linearly interpolated samples. This results in an additional regularization term to the original Barlow Twins objective, assuming linear interpolation in the input space translates to linearly interpolated features in the feature space. Pre-training with this regularization effectively mitigates feature overfitting and further enhances the downstream performance on CIFAR-10, CIFAR-100, TinyImageNet, STL-10, and ImageNet datasets. The code and checkpoints are available at: https://github.com/wgcban/mix-bt.git △ Less

Submitted 4 December, 2023; originally announced December 2023.

Comments: Code and checkpoints are available at: https://github.com/wgcban/mix-bt.git

arXiv:2303.12790 [pdf, other]

$CrowdDiff$: Multi-hypothesis Crowd Density Estimation using Diffusion Models

Authors: Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: Crowd counting is a fundamental problem in crowd analysis which is typically accomplished by estimating a crowd density map and summing over the density values. However, this approach suffers from background noise accumulation and loss of density due to the use of broad Gaussian kernels to create the ground truth density maps. This issue can be overcome by narrowing the Gaussian kernel. However, e… ▽ More Crowd counting is a fundamental problem in crowd analysis which is typically accomplished by estimating a crowd density map and summing over the density values. However, this approach suffers from background noise accumulation and loss of density due to the use of broad Gaussian kernels to create the ground truth density maps. This issue can be overcome by narrowing the Gaussian kernel. However, existing approaches perform poorly when trained with ground truth density maps with broad kernels. To deal with this limitation, we propose using conditional diffusion models to predict density maps, as diffusion models show high fidelity to training data during generation. With that, we present $CrowdDiff$ that generates the crowd density map as a reverse diffusion process. Furthermore, as the intermediate time steps of the diffusion process are noisy, we incorporate a regression branch for direct crowd estimation only during training to improve the feature learning. In addition, owing to the stochastic nature of the diffusion model, we introduce producing multiple density maps to improve the counting performance contrary to the existing crowd counting pipelines. We conduct extensive experiments on publicly available datasets to validate the effectiveness of our method. $CrowdDiff$ outperforms existing state-of-the-art crowd counting methods on several public crowd analysis benchmarks with significant improvements. △ Less

Submitted 4 April, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: Accepted at CVPR'24. The project is available at https://dylran.github.io/crowddiff.github.io

arXiv:2303.09536 [pdf, other]

Deep Metric Learning for Unsupervised Remote Sensing Change Detection

Authors: Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs), which aids in various RS applications such as land cover, land use, human development analysis, and disaster response. The performance of existing RS-CD methods is attributed to training on large annotated datasets. Furthermore, most of these models are less transferable in… ▽ More Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs), which aids in various RS applications such as land cover, land use, human development analysis, and disaster response. The performance of existing RS-CD methods is attributed to training on large annotated datasets. Furthermore, most of these models are less transferable in the sense that the trained model often performs very poorly when there is a domain gap between training and test datasets. This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues. Given an MT-RSI, the proposed method generates corresponding change probability map by iteratively optimizing an unsupervised CD loss without training it on a large dataset. Our unsupervised CD method consists of two interconnected deep networks, namely Deep-Change Probability Generator (D-CPG) and Deep-Feature Extractor (D-FE). The D-CPG is designed to predict change and no change probability maps for a given MT-RSI, while D-FE is used to extract deep features of MT-RSI that will be further used in the proposed unsupervised CD loss. We use transfer learning capability to initialize the parameters of D-FE. We iteratively optimize the parameters of D-CPG and D-FE for a given MT-RSI by minimizing the proposed unsupervised ``similarity-dissimilarity loss''. This loss is motivated by the principle of metric learning where we simultaneously maximize the distance between change pair-wise pixels while minimizing the distance between no-change pair-wise pixels in bi-temporal image domain and their deep feature domain. The experiments conducted on three CD datasets show that our unsupervised CD method achieves significant improvements over the state-of-the-art supervised and unsupervised CD methods. Code available at https://github.com/wgcban/Metric-CD △ Less

Submitted 16 March, 2023; originally announced March 2023.

Comments: Code available at https://github.com/wgcban/Metric-CD

arXiv:2212.00793 [pdf, other]

Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Authors: Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: Generating photos satisfying multiple constraints find broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to t… ▽ More Generating photos satisfying multiple constraints find broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows using different off-the-shelf diffusion models trained across various datasets during sampling time alone to guide it to the desired outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found in https://nithin-gk.github.io/projectpages/Multidiff/index.html △ Less

Submitted 20 April, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: Accepted at CVPR 2023

arXiv:2211.09120 [pdf, other]

AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders

Authors: Wele Gedara Chaminda Bandara, Naman Patel, Ali Gholami, Mehdi Nikkhah, Motilal Agrawal, Vishal M. Patel

Abstract: Masked Autoencoders (MAEs) learn generalizable representations for image, text, audio, video, etc., by reconstructing masked input data from tokens of the visible data. Current MAE approaches for videos rely on random patch, tube, or frame-based masking strategies to select these tokens. This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. Our adaptive ma… ▽ More Masked Autoencoders (MAEs) learn generalizable representations for image, text, audio, video, etc., by reconstructing masked input data from tokens of the visible data. Current MAE approaches for videos rely on random patch, tube, or frame-based masking strategies to select these tokens. This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. Our adaptive masking strategy samples visible tokens based on the semantic context using an auxiliary sampling network. This network estimates a categorical distribution over spacetime-patch tokens. The tokens that increase the expected reconstruction error are rewarded and selected as visible tokens, motivated by the policy gradient algorithm in reinforcement learning. We show that AdaMAE samples more tokens from the high spatiotemporal information regions, thereby allowing us to mask 95% of tokens, resulting in lower memory requirements and faster pre-training. We conduct ablation studies on the Something-Something v2 (SSv2) dataset to demonstrate the efficacy of our adaptive sampling approach and report state-of-the-art results of 70.0% and 81.7% in top-1 accuracy on SSv2 and Kinetics-400 action classification datasets with a ViT-Base backbone and 800 pre-training epochs. △ Less

Submitted 16 November, 2022; originally announced November 2022.

Comments: Code available at: https://github.com/wgcban/adamae

arXiv:2206.11892 [pdf, other]

DDPM-CD: Denoising Diffusion Probabilistic Models as Feature Extractors for Change Detection

Authors: Wele Gedara Chaminda Bandara, Nithin Gopalakrishnan Nair, Vishal M. Patel

Abstract: Remote sensing change detection is crucial for understanding the dynamics of our planet's surface, facilitating the monitoring of environmental changes, evaluating human impact, predicting future trends, and supporting decision-making. In this work, we introduce a novel approach for change detection that can leverage off-the-shelf, unlabeled remote sensing images in the training process by pre-tra… ▽ More Remote sensing change detection is crucial for understanding the dynamics of our planet's surface, facilitating the monitoring of environmental changes, evaluating human impact, predicting future trends, and supporting decision-making. In this work, we introduce a novel approach for change detection that can leverage off-the-shelf, unlabeled remote sensing images in the training process by pre-training a Denoising Diffusion Probabilistic Model (DDPM) - a class of generative models used in image synthesis. DDPMs learn the training data distribution by gradually converting training images into a Gaussian distribution using a Markov chain. During inference (i.e., sampling), they can generate a diverse set of samples closer to the training distribution, starting from Gaussian noise, achieving state-of-the-art image synthesis results. However, in this work, our focus is not on image synthesis but on utilizing it as a pre-trained feature extractor for the downstream application of change detection. Specifically, we fine-tune a lightweight change classifier utilizing the feature representations produced by the pre-trained DDPM alongside change labels. Experiments conducted on the LEVIR-CD, WHU-CD, DSIFN-CD, and CDD datasets demonstrate that the proposed DDPM-CD method significantly outperforms the existing state-of-the-art change detection methods in terms of F1 score, IoU, and overall accuracy, highlighting the pivotal role of pre-trained DDPM as a feature extractor for downstream applications. We have made both the code and pre-trained models available at https://github.com/wgcban/ddpm-cd △ Less

Submitted 12 January, 2024; v1 submitted 23 June, 2022; originally announced June 2022.

Comments: Code available at: https://github.com/wgcban/ddpm-cd

arXiv:2206.08481 [pdf, other]

Orientation-guided Graph Convolutional Network for Bone Surface Segmentation

Authors: Aimon Rahman, Wele Gedara Chaminda Bandara, Jeya Maria Jose Valanarasu, Ilker Hacihaliloglu, Vishal M Patel

Abstract: Due to imaging artifacts and low signal-to-noise ratio in ultrasound images, automatic bone surface segmentation networks often produce fragmented predictions that can hinder the success of ultrasound-guided computer-assisted surgical procedures. Existing pixel-wise predictions often fail to capture the accurate topology of bone tissues due to a lack of supervision to enforce connectivity. In this… ▽ More Due to imaging artifacts and low signal-to-noise ratio in ultrasound images, automatic bone surface segmentation networks often produce fragmented predictions that can hinder the success of ultrasound-guided computer-assisted surgical procedures. Existing pixel-wise predictions often fail to capture the accurate topology of bone tissues due to a lack of supervision to enforce connectivity. In this work, we propose an orientation-guided graph convolutional network to improve connectivity while segmenting the bone surface. We also propose an additional supervision on the orientation of the bone surface to further impose connectivity. We validated our approach on 1042 vivo US scans of femur, knee, spine, and distal radius. Our approach improves over the state-of-the-art methods by 5.01% in connectivity metric. △ Less

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted at MICCAI 2022

arXiv:2206.05039 [pdf, other]

Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models

Authors: Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M Patel

Abstract: Image synthesis under multi-modal priors is a useful and challenging task that has received increasing attention in recent years. A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities (i.e. priors) and corresponding outputs. In recent work, a variational auto-encoder (VAE) model was trained in a weakly supervised manner to address… ▽ More Image synthesis under multi-modal priors is a useful and challenging task that has received increasing attention in recent years. A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities (i.e. priors) and corresponding outputs. In recent work, a variational auto-encoder (VAE) model was trained in a weakly supervised manner to address this challenge. Since the generative power of VAEs is usually limited, it is difficult for this method to synthesize images belonging to complex distributions. To this end, we propose a solution based on a denoising diffusion probabilistic models to synthesise images under multi-model priors. Based on the fact that the distribution over each time step in the diffusion model is Gaussian, in this work we show that there exists a closed-form expression to the generate the image corresponds to the given modalities. The proposed solution does not require explicit retraining for all modalities and can leverage the outputs of individual modalities to generate realistic images according to different constraints. We conduct studies on two real-world datasets to demonstrate the effectiveness of our approach △ Less

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2206.04514 [pdf, ps, other]

doi 10.1109/LGRS.2023.3270799

SAR Despeckling using a Denoising Diffusion Probabilistic Model

Authors: Malsha V. Perera, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: Speckle is a multiplicative noise which affects all coherent imaging modalities including Synthetic Aperture Radar (SAR) images. The presence of speckle degrades the image quality and adversely affects the performance of SAR image understanding applications such as automatic target recognition and change detection. Thus, SAR despeckling is an important problem in remote sensing. In this paper, we… ▽ More Speckle is a multiplicative noise which affects all coherent imaging modalities including Synthetic Aperture Radar (SAR) images. The presence of speckle degrades the image quality and adversely affects the performance of SAR image understanding applications such as automatic target recognition and change detection. Thus, SAR despeckling is an important problem in remote sensing. In this paper, we introduce SAR-DDPM, a denoising diffusion probabilistic model for SAR despeckling. The proposed method comprises of a Markov chain that transforms clean images to white Gaussian noise by repeatedly adding random noise. The despeckled image is recovered by a reverse process which iteratively predicts the added noise using a noise predictor which is conditioned on the speckled image. In addition, we propose a new inference strategy based on cycle spinning to improve the despeckling performance. Our experiments on both synthetic and real SAR images demonstrate that the proposed method achieves significant improvements in both quantitative and qualitative results over the state-of-the-art despeckling methods. △ Less

Submitted 9 June, 2022; originally announced June 2022.

Comments: Our code is available at https://github.com/malshaV/SAR_DDPM

arXiv:2205.15906 [pdf, ps, other]

SAR Despeckling Using Overcomplete Convolutional Networks

Authors: Malsha V. Perera, Wele Gedara Chaminda Bandara, Jeya Maria Jose Valanarasu, Vishal M. Patel

Abstract: Synthetic Aperture Radar (SAR) despeckling is an important problem in remote sensing as speckle degrades SAR images, affecting downstream tasks like detection and segmentation. Recent studies show that convolutional neural networks(CNNs) outperform classical despeckling methods. Traditional CNNs try to increase the receptive field size as the network goes deeper, thus extracting global features. H… ▽ More Synthetic Aperture Radar (SAR) despeckling is an important problem in remote sensing as speckle degrades SAR images, affecting downstream tasks like detection and segmentation. Recent studies show that convolutional neural networks(CNNs) outperform classical despeckling methods. Traditional CNNs try to increase the receptive field size as the network goes deeper, thus extracting global features. However,speckle is relatively small, and increasing receptive field does not help in extracting speckle features. This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field. The proposed network consists of an overcomplete branch to focus on the local structures and an undercomplete branch that focuses on the global structures. We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images. △ Less

Submitted 31 May, 2022; originally announced May 2022.

Comments: Accepted to International Geoscience and Remote Sensing Symposium (IGARSS), 2022. Our code is available at https://github.com/malshaV/sar_overcomplete

arXiv:2204.08454 [pdf, other]

Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images

Authors: Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: Remote-sensing (RS) Change Detection (CD) aims to detect "changes of interest" from co-registered bi-temporal images. The performance of existing deep supervised CD methods is attributed to the large amounts of annotated data used to train the networks. However, annotating large amounts of remote sensing images is labor-intensive and expensive, particularly with bi-temporal images, as it requires… ▽ More Remote-sensing (RS) Change Detection (CD) aims to detect "changes of interest" from co-registered bi-temporal images. The performance of existing deep supervised CD methods is attributed to the large amounts of annotated data used to train the networks. However, annotating large amounts of remote sensing images is labor-intensive and expensive, particularly with bi-temporal images, as it requires pixel-wise comparisons by a human expert. On the other hand, we often have access to unlimited unlabeled multi-temporal RS imagery thanks to ever-increasing earth observation programs. In this paper, we propose a simple yet effective way to leverage the information from unlabeled bi-temporal images to improve the performance of CD approaches. More specifically, we propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss by constraining the output change probability map of a given unlabeled bi-temporal image pair to be consistent under the small random perturbations applied on the deep feature difference map that is obtained by subtracting their latent feature representations. Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can reach closer to the performance of supervised CD even with access to as little as 10% of the annotated training data. Code available at https://github.com/wgcban/SemiCD △ Less

Submitted 21 April, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

Comments: Code available at https://github.com/wgcban/SemiCD 36 pages

arXiv:2203.02503 [pdf, other]

HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening

Authors: Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: Pansharpening aims to fuse a registered high-resolution panchromatic image (PAN) with a low-resolution hyperspectral image (LR-HSI) to generate an enhanced HSI with high spectral and spatial resolution. Existing pansharpening approaches neglect using an attention mechanism to transfer HR texture features from PAN to LR-HSI features, resulting in spatial and spectral distortions. In this paper, we… ▽ More Pansharpening aims to fuse a registered high-resolution panchromatic image (PAN) with a low-resolution hyperspectral image (LR-HSI) to generate an enhanced HSI with high spectral and spatial resolution. Existing pansharpening approaches neglect using an attention mechanism to transfer HR texture features from PAN to LR-HSI features, resulting in spatial and spectral distortions. In this paper, we present a novel attention mechanism for pansharpening called HyperTransformer, in which features of LR-HSI and PAN are formulated as queries and keys in a transformer, respectively. HyperTransformer consists of three main modules, namely two separate feature extractors for PAN and HSI, a multi-head feature soft attention module, and a spatial-spectral feature fusion module. Such a network improves both spatial and spectral quality measures of the pansharpened HSI by learning cross-feature space dependencies and long-range details of PAN and LR-HSI. Furthermore, HyperTransformer can be utilized across multiple spatial scales at the backbone for obtaining improved performance. Extensive experiments conducted on three widely used datasets demonstrate that HyperTransformer achieves significant improvement over the state-of-the-art methods on both spatial and spectral quality measures. Implementation code and pre-trained weights can be accessed at https://github.com/wgcban/HyperTransformer. △ Less

Submitted 28 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Accepted at CVPR'22. Project page: https://www.wgcban.com/research#h.ar24vwqlm021 Code available at: https://github.com/wgcban/HyperTransformer

arXiv:2201.09355 [pdf, ps, other]

Transformer-based SAR Image Despeckling

Authors: Malsha V. Perera, Wele Gedara Chaminda Bandara, Jeya Maria Jose Valanarasu, Vishal M. Patel

Abstract: Synthetic Aperture Radar (SAR) images are usually degraded by a multiplicative noise known as speckle which makes processing and interpretation of SAR images difficult. In this paper, we introduce a transformer-based network for SAR image despeckling. The proposed despeckling network comprises of a transformer-based encoder which allows the network to learn global dependencies between different im… ▽ More Synthetic Aperture Radar (SAR) images are usually degraded by a multiplicative noise known as speckle which makes processing and interpretation of SAR images difficult. In this paper, we introduce a transformer-based network for SAR image despeckling. The proposed despeckling network comprises of a transformer-based encoder which allows the network to learn global dependencies between different image regions - aiding in better despeckling. The network is trained end-to-end with synthetically generated speckled images using a composite loss function. Experiments show that the proposed method achieves significant improvements over traditional and convolutional neural network-based despeckling methods on both synthetic and real SAR images. △ Less

Submitted 23 January, 2022; originally announced January 2022.

Comments: Submitted to International Geoscience and Remote Sensing Symposium (IGARSS), 2022. Our code is available at https://github.com/malshaV/sar_transformer

arXiv:2201.01293 [pdf, other]

A Transformer-Based Siamese Network for Change Detection

Authors: Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract: This paper presents a transformer-based Siamese network architecture (abbreviated by ChangeFormer) for Change Detection (CD) from a pair of co-registered remote sensing images. Different from recent CD frameworks, which are based on fully convolutional networks (ConvNets), the proposed method unifies hierarchically structured transformer encoder with Multi-Layer Perception (MLP) decoder in a Siame… ▽ More This paper presents a transformer-based Siamese network architecture (abbreviated by ChangeFormer) for Change Detection (CD) from a pair of co-registered remote sensing images. Different from recent CD frameworks, which are based on fully convolutional networks (ConvNets), the proposed method unifies hierarchically structured transformer encoder with Multi-Layer Perception (MLP) decoder in a Siamese network architecture to efficiently render multi-scale long-range details required for accurate CD. Experiments on two CD datasets show that the proposed end-to-end trainable ChangeFormer architecture achieves better CD performance than previous counterparts. Our code is available at https://github.com/wgcban/ChangeFormer. △ Less

Submitted 1 September, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

Comments: Accepted to International Geoscience and Remote Sensing Symposium (IGARSS), 2022. 4 pages, 2 figures. Code & trained models are available at https://github.com/wgcban/ChangeFormer

arXiv:2109.07701 [pdf, other]

SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving

Authors: Wele Gedara Chaminda Bandara, Jeya Maria Jose Valanarasu, Vishal M. Patel

Abstract: Road extraction is an essential step in building autonomous navigation systems. Detecting road segments is challenging as they are of varying widths, bifurcated throughout the image, and are often occluded by terrain, cloud, or other weather conditions. Using just convolution neural networks (ConvNets) for this problem is not effective as it is inefficient at capturing distant dependencies between… ▽ More Road extraction is an essential step in building autonomous navigation systems. Detecting road segments is challenging as they are of varying widths, bifurcated throughout the image, and are often occluded by terrain, cloud, or other weather conditions. Using just convolution neural networks (ConvNets) for this problem is not effective as it is inefficient at capturing distant dependencies between road segments in the image which is essential to extract road connectivity. To this end, we propose a Spatial and Interaction Space Graph Reasoning (SPIN) module which when plugged into a ConvNet performs reasoning over graphs constructed on spatial and interaction spaces projected from the feature maps. Reasoning over spatial space extracts dependencies between different spatial regions and other contextual information. Reasoning over a projected interaction space helps in appropriate delineation of roads from other topographies present in the image. Thus, SPIN extracts long-range dependencies between road segments and effectively delineates roads from other semantics. We also introduce a SPIN pyramid which performs SPIN graph reasoning across multiple scales to extract multi-scale features. We propose a network based on stacked hourglass modules and SPIN pyramid for road segmentation which achieves better performance compared to existing methods. Moreover, our method is computationally efficient and significantly boosts the convergence speed during training, making it feasible for applying on large-scale high-resolution aerial images. Code available at: https://github.com/wgcban/SPIN_RoadMapper.git. △ Less

Submitted 15 September, 2021; originally announced September 2021.

Comments: Code available at: https://github.com/wgcban/SPIN_RoadMapper.git

Journal ref: IEEE Conference of Robotics and Automation (ICRA) 2022

arXiv:2108.10387 [pdf, ps, other]

A Sensitivity Matrix Approach Using Two-Stage Optimization for Voltage Regulation of LV Networks with High PV Penetration

Authors: A. S. Jameel Hassan, Umar Marikkar, G. W. Kasun Prabhath, Aranee Balachandran, W. G. Chaminda Bandara, Parakrama B. Ekanayake, Roshan I. Godaliyadda, Janaka B. Ekanayake

Abstract: The occurrence of voltage violations are a major deterrent for absorbing more roof-top solar power to smart Low Voltage Distribution Grids (LVDG). Recent studies have focused on decentralized control methods to solve this problem due to the high computational time in performing load flows in centralized control techniques. To address this issue a novel sensitivity matrix is developed to estimate v… ▽ More The occurrence of voltage violations are a major deterrent for absorbing more roof-top solar power to smart Low Voltage Distribution Grids (LVDG). Recent studies have focused on decentralized control methods to solve this problem due to the high computational time in performing load flows in centralized control techniques. To address this issue a novel sensitivity matrix is developed to estimate voltages of the network by replacing load flow simulations. In this paper, a Centralized Active, Reactive Power Management System (CARPMS) is proposed to optimally utilize the reactive power capability of smart photo-voltaic inverters with minimal active power curtailment to mitigate the voltage violation problem. The developed sensitivity matrix is able to reduce the time consumed by 48% compared to load flow simulations, enabling near real-time control optimization. Given the large solution space of power systems, a novel two-stage optimization is proposed, where the solution space is narrowed down by a Feasible Region Search (FRS) step, followed by Particle Swarm Optimization (PSO). The performance of the proposed methodology is analyzed in comparison to the load flow method to demonstrate the accuracy and the capability of the optimization algorithm to mitigate voltage violations in near real-time. The deviation of mean voltages of the proposed methodology from load flow method was; 6.5*10^-3 p.u for reactive power control using Q-injection, 1.02*10^-2 p.u for reactive power control using Q-absorption, and 0 p.u for active power curtailment case. △ Less

Submitted 23 August, 2021; originally announced August 2021.

arXiv:2107.02630 [pdf, other]

Hyperspectral Pansharpening Based on Improved Deep Image Prior and Residual Reconstruction

Authors: Wele Gedara Chaminda Bandara, Jeya Maria Jose Valanarasu, Vishal M. Patel

Abstract: Hyperspectral pansharpening aims to synthesize a low-resolution hyperspectral image (LR-HSI) with a registered panchromatic image (PAN) to generate an enhanced HSI with high spectral and spatial resolution. Recently proposed HS pansharpening methods have obtained remarkable results using deep convolutional networks (ConvNets), which typically consist of three steps: (1) up-sampling the LR-HSI, (2)… ▽ More Hyperspectral pansharpening aims to synthesize a low-resolution hyperspectral image (LR-HSI) with a registered panchromatic image (PAN) to generate an enhanced HSI with high spectral and spatial resolution. Recently proposed HS pansharpening methods have obtained remarkable results using deep convolutional networks (ConvNets), which typically consist of three steps: (1) up-sampling the LR-HSI, (2) predicting the residual image via a ConvNet, and (3) obtaining the final fused HSI by adding the outputs from first and second steps. Recent methods have leveraged Deep Image Prior (DIP) to up-sample the LR-HSI due to its excellent ability to preserve both spatial and spectral information, without learning from large data sets. However, we observed that the quality of up-sampled HSIs can be further improved by introducing an additional spatial-domain constraint to the conventional spectral-domain energy function. We define our spatial-domain constraint as the $L_1$ distance between the predicted PAN image and the actual PAN image. To estimate the PAN image of the up-sampled HSI, we also propose a learnable spectral response function (SRF). Moreover, we noticed that the residual image between the up-sampled HSI and the reference HSI mainly consists of edge information and very fine structures. In order to accurately estimate fine information, we propose a novel over-complete network, called HyperKite, which focuses on learning high-level features by constraining the receptive from increasing in the deep layers. We perform experiments on three HSI datasets to demonstrate the superiority of our DIP-HyperKite over the state-of-the-art pansharpening methods. The deployment codes, pre-trained models, and final fusion outputs of our DIP-HyperKite and the methods used for the comparisons will be publicly made available at https://github.com/wgcban/DIP-HyperKite.git. △ Less

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2011.14644 [pdf]

Quantitative Assessment of Adulteration and Reuse of Coconut Oil Using Transmittance Multispectral Imaging

Authors: S. Herath, H. K. Weerasooriya, D. Y. L. Ranasinghe, W. G. C. Bandara, H. M. V. R. Herath, G. M. R. I. Godaliyadda, M. P. B. Ekanayake, Terrence Madhujith

Abstract: Coconut oil known for its wide range of uses is often adulterated with other edible oils. Repeated use of coconut oil in food preparation could lead to many health issues. Existing methods available for evaluating quality of oil are laborious and time consuming. Therefore, we propose an imaging system hardware and image processing-based algorithm to estimate the adulteration of coconut oil with pa… ▽ More Coconut oil known for its wide range of uses is often adulterated with other edible oils. Repeated use of coconut oil in food preparation could lead to many health issues. Existing methods available for evaluating quality of oil are laborious and time consuming. Therefore, we propose an imaging system hardware and image processing-based algorithm to estimate the adulteration of coconut oil with palm oil as the adulterant. A clear functional relationship between adulteration level and Bhattacharyya distance was observed as R2 = 0.9876 on the training samples. Thereafter, another algorithm is proposed to develop a spectral-clustering based classifier to determine the effect of reheat and reuse of coconut oil. Distinct clusters were obtained for different levels of reheated oil classes and the classification was performed with an accuracy of 0.983 on training samples. Further, the input images for the proposed algorithms were generated using an in-house developed transmittance based multispectral imaging system. △ Less

Submitted 30 November, 2020; originally announced November 2020.

Comments: 10 pages, 11 figures, journal

arXiv:2009.08260 [pdf]

doi 10.1016/j.apenergy.2020.116022

Coordinated PV re-phasing: a novel method to maximize renewable energy integration in LV networks by mitigating network unbalances

Authors: W. G. Chaminda Bandara, G. M. R. I. Godaliyadda, M. P. B. Ekanayake, J. B. Ekanayake

Abstract: As combating climate change has become a top priority and as many countries are taking steps to make their power generation sustainable, there is a marked increase in the use of renewable energy sources (RESs) for electricity generation. Among these RESs, solar photovoltaics (PV) is one of the most popular sources of energy connected to LV distribution networks. With the greater integration of sol… ▽ More As combating climate change has become a top priority and as many countries are taking steps to make their power generation sustainable, there is a marked increase in the use of renewable energy sources (RESs) for electricity generation. Among these RESs, solar photovoltaics (PV) is one of the most popular sources of energy connected to LV distribution networks. With the greater integration of solar PV into LV distribution networks, utility providers impose caps to solar penetration in order to operate their network safely and within acceptable norms. One parameter that restricts solar PV penetration is unbalances created by loads and single-phase rooftop schemes connected to LV distribution grids. In this paper, a novel method is proposed to mitigate voltage unbalance in LV distribution grids by optimally re-phasing grid-connected rooftop PV systems. A modified version of the discrete bacterial foraging optimization algorithm (DBFOA) is introduced as the optimization technique to minimize the overall voltage unbalance of the network as the objective function, subjected to various network and operating parameters. The impact of utilizing the proposed PV re-phasing technique as opposed to a fixed phase configuration are compared based on overall voltage unbalance, which was observed hourly throughout the day. The case studies show that the proposed approach can significantly mitigate the overall voltage unbalance during the daytime and can facilitate to increase the usable PV capacity of the considered network by 77%. △ Less

Submitted 17 September, 2020; originally announced September 2020.

Showing 1–20 of 20 results for author: Bandara, W G C