Search | arXiv e-print repository

Annotation Cost-Efficient Active Learning for Deep Metric Learning Driven Remote Sensing Image Retrieval

Authors: Genc Hoxha, Gencer Sumbul, Julia Henkel, Lars Möllenbrok, Begüm Demir

Abstract: Deep metric learning (DML) has shown to be very effective for content-based image retrieval (CBIR) in remote sensing (RS). Most of DML methods for CBIR rely on many annotated images to accurately learn model parameters of deep neural networks. However, gathering many image annotations is time consuming and costly. To address this, we propose an annotation cost-efficient active learning (ANNEAL) me… ▽ More Deep metric learning (DML) has shown to be very effective for content-based image retrieval (CBIR) in remote sensing (RS). Most of DML methods for CBIR rely on many annotated images to accurately learn model parameters of deep neural networks. However, gathering many image annotations is time consuming and costly. To address this, we propose an annotation cost-efficient active learning (ANNEAL) method specifically designed for DML driven CBIR in RS. ANNEAL aims to create a small but informative training set made up of similar and dissimilar image pairs to be utilized for learning a deep metric space. The informativeness of the image pairs is assessed combining uncertainty and diversity criteria. To assess the uncertainty of image pairs, we introduce two algorithms: 1) metric-guided uncertainty estimation (MGUE); and 2) binary classifier guided uncertainty estimation (BCGUE). MGUE automatically estimates a threshold value that acts as a "boundary" between similar and dissimilar image pairs based on the distances in the metric space. The closer the similarity between image pairs to the estimated threshold value the higher their uncertainty. BCGUE estimates the uncertainty of the image pairs based on the confidence of the classifier in assigning the correct similarity label. The diversity criterion is assessed through a clustering-based strategy. ANNEAL selects the most informative image pairs by combining either MGUE or BCGUE with clustering-based strategy. The selected image pairs are sent to expert annotators to be labeled as similar or dissimilar. This way of annotating images significantly reduces the annotation cost compared to the cost of annotating images with LULC labels. Experimental results carried out on two RS benchmark datasets demonstrate the effectiveness of our method. The code of the proposed method will be publicly available upon the acceptance of the paper. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE Transactions on Geoscience and Remote Sensing

arXiv:2401.07782 [pdf, other]

Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing

Authors: Jakob Hackstein, Gencer Sumbul, Kai Norman Clasen, Begüm Demir

Abstract: Self-supervised learning through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) image representation learning, and thus embodies a significant potential for content-based image retrieval (CBIR) from ever-growing RS image archives. However, the existing studies on MAEs in RS assume that the considered RS images are acquired by a single image sensor, and th… ▽ More Self-supervised learning through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) image representation learning, and thus embodies a significant potential for content-based image retrieval (CBIR) from ever-growing RS image archives. However, the existing studies on MAEs in RS assume that the considered RS images are acquired by a single image sensor, and thus are only suitable for uni-modal CBIR problems. The effectiveness of MAEs for cross-sensor CBIR, which aims to search semantically similar images across different image modalities, has not been explored yet. In this paper, we take the first step to explore the effectiveness of MAEs for sensor-agnostic CBIR in RS. To this end, we present a systematic overview on the possible adaptations of the vanilla MAE to exploit masked image modeling on multi-sensor RS image archives (denoted as cross-sensor masked autoencoders [CSMAEs]). Based on different adjustments applied to the vanilla MAE, we introduce different CSMAE models. We also provide an extensive experimental analysis of these CSMAE models. We finally derive a guideline to exploit masked image modeling for uni-modal and cross-modal CBIR problems in RS. The code of this work is publicly available at https://github.com/jakhac/CSMAE. △ Less

Submitted 11 April, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: This work has been submitted to the IEEE for possible publication. Our code is available at https://github.com/jakhac/CSMAE

arXiv:2311.06141 [pdf, other]

Federated Learning Across Decentralized and Unshared Archives for Remote Sensing Image Classification

Authors: Barış Büyüktaş, Gencer Sumbul, Begüm Demir

Abstract: Federated learning (FL) enables the collaboration of multiple deep learning models to learn from decentralized data archives (i.e., clients) without accessing data on clients. Although FL offers ample opportunities in knowledge discovery from distributed image archives, it is seldom considered in remote sensing (RS). In this paper, as a first time in RS, we present a comparative study of state-of-… ▽ More Federated learning (FL) enables the collaboration of multiple deep learning models to learn from decentralized data archives (i.e., clients) without accessing data on clients. Although FL offers ample opportunities in knowledge discovery from distributed image archives, it is seldom considered in remote sensing (RS). In this paper, as a first time in RS, we present a comparative study of state-of-the-art FL algorithms for RS image classification problems. To this end, we initially provide a systematic review of the FL algorithms presented in the computer vision and machine learning communities. Then, we select several state-of-the-art FL algorithms based on their effectiveness with respect to training data heterogeneity across clients (known as non-IID data). After presenting an extensive overview of the selected algorithms, a theoretical comparison of the algorithms is conducted based on their: 1) local training complexity; 2) aggregation complexity; 3) learning efficiency; 4) communication cost; and 5) scalability in terms of number of clients. After the theoretical comparison, experimental analyses are presented to compare them under different decentralization scenarios. For the experimental analyses, we focus our attention on multi-label image classification problems in RS. Based on our comprehensive analyses, we finally derive a guideline for selecting suitable FL algorithms in RS. The code of this work is publicly available at https://git.tu-berlin.de/rsim/FL-RS. △ Less

Submitted 14 June, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Accepted at the IEEE Geoscience and Remote Sensing Magazine

arXiv:2306.11605 [pdf, other]

Annotation Cost Efficient Active Learning for Content Based Image Retrieval

Authors: Julia Henkel, Genc Hoxha, Gencer Sumbul, Lars Möllenbrok, Begüm Demir

Abstract: Deep metric learning (DML) based methods have been found very effective for content-based image retrieval (CBIR) in remote sensing (RS). For accurately learning the model parameters of deep neural networks, most of the DML methods require a high number of annotated training images, which can be costly to gather. To address this problem, in this paper we present an annotation cost efficient active… ▽ More Deep metric learning (DML) based methods have been found very effective for content-based image retrieval (CBIR) in remote sensing (RS). For accurately learning the model parameters of deep neural networks, most of the DML methods require a high number of annotated training images, which can be costly to gather. To address this problem, in this paper we present an annotation cost efficient active learning (AL) method (denoted as ANNEAL). The proposed method aims to iteratively enrich the training set by annotating the most informative image pairs as similar or dissimilar, while accurately modelling a deep metric space. This is achieved by two consecutive steps. In the first step the pairwise image similarity is modelled based on the available training set. Then, in the second step the most uncertain and diverse (i.e., informative) image pairs are selected to be annotated. Unlike the existing AL methods for CBIR, at each AL iteration of ANNEAL a human expert is asked to annotate the most informative image pairs as similar/dissimilar. This significantly reduces the annotation cost compared to annotating images with land-use/land cover class labels. Experimental results show the effectiveness of our method. The code of ANNEAL is publicly available at https://git.tu-berlin.de/rsim/ANNEAL. △ Less

Submitted 26 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2023. Our code is available at https://git.tu-berlin.de/rsim/ANNEAL

arXiv:2306.08575 [pdf, other]

Label Noise Robust Image Representation Learning based on Supervised Variational Autoencoders in Remote Sensing

Authors: Gencer Sumbul, Begüm Demir

Abstract: Due to the publicly available thematic maps and crowd-sourced data, remote sensing (RS) image annotations can be gathered at zero cost for training deep neural networks (DNNs). However, such annotation sources may increase the risk of including noisy labels in training data, leading to inaccurate RS image representation learning (IRL). To address this issue, in this paper we propose a label noise… ▽ More Due to the publicly available thematic maps and crowd-sourced data, remote sensing (RS) image annotations can be gathered at zero cost for training deep neural networks (DNNs). However, such annotation sources may increase the risk of including noisy labels in training data, leading to inaccurate RS image representation learning (IRL). To address this issue, in this paper we propose a label noise robust IRL method that aims to prevent the interference of noisy labels on IRL, independently from the learning task being considered in RS. To this end, the proposed method combines a supervised variational autoencoder (SVAE) with any kind of DNN. This is achieved by defining variational generative process based on image features. This allows us to define the importance of each training sample for IRL based on the loss values acquired from the SVAE and the task head of the considered DNN. Then, the proposed method imposes lower importance to images with noisy labels, while giving higher importance to those with correct labels during IRL. Experimental results show the effectiveness of the proposed method when compared to well-known label noise robust IRL methods applied to RS images. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/RS-IRL-SVAE. △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2023. Our code is available at https://git.tu-berlin.de/rsim/RS-IRL-SVAE

arXiv:2306.00792 [pdf, other]

Learning Across Decentralized Multi-Modal Remote Sensing Archives with Federated Learning

Authors: Barış Büyüktaş, Gencer Sumbul, Begüm Demir

Abstract: The development of federated learning (FL) methods, which aim to learn from distributed databases (i.e., clients) without accessing data on clients, has recently attracted great attention. Most of these methods assume that the clients are associated with the same data modality. However, remote sensing (RS) images in different clients can be associated with different data modalities that can improv… ▽ More The development of federated learning (FL) methods, which aim to learn from distributed databases (i.e., clients) without accessing data on clients, has recently attracted great attention. Most of these methods assume that the clients are associated with the same data modality. However, remote sensing (RS) images in different clients can be associated with different data modalities that can improve the classification performance when jointly used. To address this problem, in this paper we introduce a novel multi-modal FL framework that aims to learn from decentralized multi-modal RS image archives for RS image classification problems. The proposed framework is made up of three modules: 1) multi-modal fusion (MF); 2) feature whitening (FW); and 3) mutual information maximization (MIM). The MF module performs iterative model averaging to learn without accessing data on clients in the case that clients are associated with different data modalities. The FW module aligns the representations learned among the different clients. The MIM module maximizes the similarity of images from different modalities. Experimental results show the effectiveness of the proposed framework compared to iterative model averaging, which is a widely used algorithm in FL. The code of the proposed framework is publicly available at https://git.tu-berlin.de/rsim/MM-FL. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2023. Our code is available at https://git.tu-berlin.de/rsim/MM-FL

arXiv:2212.01261 [pdf, other]

doi 10.1109/TIP.2023.3293776

Generative Reasoning Integrated Label Noise Robust Deep Image Representation Learning

Authors: Gencer Sumbul, Begüm Demir

Abstract: The development of deep learning based image representation learning (IRL) methods has attracted great attention for various image understanding problems. Most of these methods require the availability of a high quantity and quality of annotated training images, which can be time-consuming and costly to gather. To reduce labeling costs, crowdsourced data, automatic labeling procedures or citizen s… ▽ More The development of deep learning based image representation learning (IRL) methods has attracted great attention for various image understanding problems. Most of these methods require the availability of a high quantity and quality of annotated training images, which can be time-consuming and costly to gather. To reduce labeling costs, crowdsourced data, automatic labeling procedures or citizen science projects can be considered. However, such approaches increase the risk of including label noise in training data. It may result in overfitting on noisy labels when discriminative reasoning is employed. This leads to sub-optimal learning procedures, and thus inaccurate characterization of images. To address this, we introduce a generative reasoning integrated label noise robust deep representation learning (GRID) approach. Our approach aims to model the complementary characteristics of discriminative and generative reasoning for IRL under noisy labels. To this end, we first integrate generative reasoning into discriminative reasoning through a supervised variational autoencoder. This allows GRID to automatically detect training samples with noisy labels. Then, through our label noise robust hybrid representation learning strategy, GRID adjusts the whole learning procedure for IRL of these samples through generative reasoning and that of other samples through discriminative reasoning. Our approach learns discriminative image representations while preventing interference of noisy labels independently from the IRL method being selected. Thus, unlike the existing methods, GRID does not depend on the type of annotation, neural network architecture, loss function or learning task, and thus can be directly utilized for various problems. Experimental results show its effectiveness compared to state-of-the-art methods. The code of GRID is publicly available at https://github.com/gencersumbul/GRID. △ Less

Submitted 4 August, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted at the IEEE Transactions on Image Processing. Our code is available at https://github.com/gencersumbul/GRID

arXiv:2212.01165 [pdf, other]

Deep Active Learning for Multi-Label Classification of Remote Sensing Images

Authors: Lars Möllenbrok, Gencer Sumbul, Begüm Demir

Abstract: In this letter, we introduce deep active learning (AL) for multi-label classification (MLC) problems in remote sensing (RS). In particular, we investigate the effectiveness of several AL query functions for MLC of RS images. Unlike the existing AL query functions (which are defined for single-label classification or semantic segmentation problems), each query function in this paper is based on the… ▽ More In this letter, we introduce deep active learning (AL) for multi-label classification (MLC) problems in remote sensing (RS). In particular, we investigate the effectiveness of several AL query functions for MLC of RS images. Unlike the existing AL query functions (which are defined for single-label classification or semantic segmentation problems), each query function in this paper is based on the evaluation of two criteria: i) multi-label uncertainty; and ii) multi-label diversity. The multi-label uncertainty criterion is associated to the confidence of the deep neural networks (DNNs) in correctly assigning multi-labels to each image. To assess this criterion, we investigate three strategies: i) learning multi-label loss ordering; ii) measuring temporal discrepancy of multi-label predictions; and iii) measuring magnitude of approximated gradient embeddings. The multi-label diversity criterion is associated to the selection of a set of images that are as diverse as possible to each other that prevents redundancy among them. To assess this criterion, we exploit a clustering based strategy. We combine each of the above-mentioned uncertainty strategies with the clustering based diversity strategy, resulting in three different query functions. All the considered query functions are introduced for the first time in the framework of MLC problems in RS. Experimental results obtained on two benchmark archives show that these query functions result in the selection of a highly informative set of samples at each iteration of the AL process. △ Less

Submitted 25 September, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted to IEEE Geoscience and Remote Sensing Letters

arXiv:2202.11429 [pdf, other]

doi 10.1109/ICIP46576.2022.9897475

A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote Sensing

Authors: Gencer Sumbul, Markus Müller, Begüm Demir

Abstract: Due to the availability of multi-modal remote sensing (RS) image archives, one of the most important research topics is the development of cross-modal RS image retrieval (CM-RSIR) methods that search semantically similar images across different modalities. Existing CM-RSIR methods require the availability of a high quality and quantity of annotated training images. The collection of a sufficient n… ▽ More Due to the availability of multi-modal remote sensing (RS) image archives, one of the most important research topics is the development of cross-modal RS image retrieval (CM-RSIR) methods that search semantically similar images across different modalities. Existing CM-RSIR methods require the availability of a high quality and quantity of annotated training images. The collection of a sufficient number of reliable labeled images is time consuming, complex and costly in operational scenarios, and can significantly affect the final accuracy of CM-RSIR. In this paper, we introduce a novel self-supervised CM-RSIR method that aims to: i) model mutual-information between different modalities in a self-supervised manner; ii) retain the distributions of modal-specific feature spaces similar to each other; and iii) define the most similar images within each modality without requiring any annotated training image. To this end, we propose a novel objective including three loss functions that simultaneously: i) maximize mutual information of different modalities for inter-modal similarity preservation; ii) minimize the angular distance of multi-modal image tuples for the elimination of inter-modal discrepancies; and iii) increase cosine similarity of the most similar images within each modality for the characterization of intra-modal similarities. Experimental results show the effectiveness of the proposed method compared to state-of-the-art methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/SS-CM-RSIR. △ Less

Submitted 12 July, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted at IEEE International Conference on Image Processing (ICIP) 2022. Our code is available at https://git.tu-berlin.de/rsim/SS-CM-RSIR

arXiv:2202.11388 [pdf, ps, other]

doi 10.1109/ICIP46576.2022.9897939

Deep Metric Learning-Based Semi-Supervised Regression With Alternate Learning

Authors: Adina Zell, Gencer Sumbul, Begüm Demir

Abstract: This paper introduces a novel deep metric learning-based semi-supervised regression (DML-S2R) method for parameter estimation problems. The proposed DML-S2R method aims to mitigate the problems of insufficient amount of labeled samples without collecting any additional sample with a target value. To this end, it is made up of two main steps: i) pairwise similarity modeling with scarce labeled data… ▽ More This paper introduces a novel deep metric learning-based semi-supervised regression (DML-S2R) method for parameter estimation problems. The proposed DML-S2R method aims to mitigate the problems of insufficient amount of labeled samples without collecting any additional sample with a target value. To this end, it is made up of two main steps: i) pairwise similarity modeling with scarce labeled data; and ii) triplet-based metric learning with abundant unlabeled data. The first step aims to model pairwise sample similarities by using a small number of labeled samples. This is achieved by estimating the target value differences of labeled samples with a Siamese neural network (SNN). The second step aims to learn a triplet-based metric space (in which similar samples are close to each other and dissimilar samples are far apart from each other) when the number of labeled samples is insufficient. This is achieved by employing the SNN of the first step for triplet-based deep metric learning that exploits not only labeled samples but also unlabeled samples. For the end-to-end training of DML-S2R, we investigate an alternate learning strategy for the two steps. Due to this strategy, the encoded information in each step becomes a guidance for learning phase of the other step. The experimental results confirm the success of DML-S2R compared to the state-of-the-art semi-supervised regression methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/DML-S2R. △ Less

Submitted 12 July, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted at IEEE International Conference on Image Processing (ICIP) 2022. Our code is available at https://git.tu-berlin.de/rsim/DML-S2R

arXiv:2201.06459 [pdf, other]

doi 10.1109/IGARSS46834.2022.9884146

A Novel Framework to Jointly Compress and Index Remote Sensing Images for Efficient Content-Based Retrieval

Authors: Gencer Sumbul, Jun Xiang, Nimisha Thekke Madam, Begüm Demir

Abstract: Remote sensing (RS) images are usually stored in compressed format to reduce the storage size of the archives. Thus, existing content-based image retrieval (CBIR) systems in RS require decoding images before applying CBIR (which is computationally demanding in the case of large-scale CBIR problems). To address this problem, in this paper, we present a joint framework that simultaneously learns RS… ▽ More Remote sensing (RS) images are usually stored in compressed format to reduce the storage size of the archives. Thus, existing content-based image retrieval (CBIR) systems in RS require decoding images before applying CBIR (which is computationally demanding in the case of large-scale CBIR problems). To address this problem, in this paper, we present a joint framework that simultaneously learns RS image compression and indexing. Thus, it eliminates the need for decoding RS images before applying CBIR. The proposed framework is made up of two modules. The first module compresses RS images based on an auto-encoder architecture. The second module produces hash codes with a high discrimination capability by employing soft pairwise, bit-balancing and classification loss functions. We also introduce a two stage learning strategy with gradient manipulation techniques to obtain image representations that are compatible with both RS image indexing and compression. Experimental results show the efficacy of the proposed framework when compared to widely used approaches in RS. The code of the proposed framework is available at https://git.tu-berlin.de/rsim/RS-JCIF. △ Less

Submitted 3 June, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2022. Our code is available at https://git.tu-berlin.de/rsim/RS-JCIF

arXiv:2106.00506 [pdf, other]

doi 10.1109/IGARSS47720.2021.9554466

A Novel Graph-Theoretic Deep Representation Learning Method for Multi-Label Remote Sensing Image Retrieval

Authors: Gencer Sumbul, Begüm Demir

Abstract: This paper presents a novel graph-theoretic deep representation learning method in the framework of multi-label remote sensing (RS) image retrieval problems. The proposed method aims to extract and exploit multi-label co-occurrence relationships associated to each RS image in the archive. To this end, each training image is initially represented with a graph structure that provides region-based im… ▽ More This paper presents a novel graph-theoretic deep representation learning method in the framework of multi-label remote sensing (RS) image retrieval problems. The proposed method aims to extract and exploit multi-label co-occurrence relationships associated to each RS image in the archive. To this end, each training image is initially represented with a graph structure that provides region-based image representation combining both local information and the related spatial organization. Unlike the other graph-based methods, the proposed method contains a novel learning strategy to train a deep neural network for automatically predicting a graph structure of each RS image in the archive. This strategy employs a region representation learning loss function to characterize the image content based on its multi-label co-occurrence relationship. Experimental results show the effectiveness of the proposed method for retrieval problems in RS compared to state-of-the-art deep representation learning methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/GT-DRL-CBIR . △ Less

Submitted 1 June, 2021; originally announced June 2021.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2021. Our code is available at https://git.tu-berlin.de/rsim/GT-DRL-CBIR

arXiv:2105.07921 [pdf, other]

doi 10.1109/MGRS.2021.3089174

BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval

Authors: Gencer Sumbul, Arne de Wall, Tristan Kreuziger, Filipe Marcelino, Hugo Costa, Pedro Benevides, Mário Caetano, Begüm Demir, Volker Markl

Abstract: This paper presents the multi-modal BigEarthNet (BigEarthNet-MM) benchmark archive made up of 590,326 pairs of Sentinel-1 and Sentinel-2 image patches to support the deep learning (DL) studies in multi-modal multi-label remote sensing (RS) image retrieval and classification. Each pair of patches in BigEarthNet-MM is annotated with multi-labels provided by the CORINE Land Cover (CLC) map of 2018 ba… ▽ More This paper presents the multi-modal BigEarthNet (BigEarthNet-MM) benchmark archive made up of 590,326 pairs of Sentinel-1 and Sentinel-2 image patches to support the deep learning (DL) studies in multi-modal multi-label remote sensing (RS) image retrieval and classification. Each pair of patches in BigEarthNet-MM is annotated with multi-labels provided by the CORINE Land Cover (CLC) map of 2018 based on its thematically most detailed Level-3 class nomenclature. Our initial research demonstrates that some CLC classes are challenging to be accurately described by only considering (single-date) BigEarthNet-MM images. In this paper, we also introduce an alternative class-nomenclature as an evolution of the original CLC labels to address this problem. This is achieved by interpreting and arranging the CLC Level-3 nomenclature based on the properties of BigEarthNet-MM images in a new nomenclature of 19 classes. In our experiments, we show the potential of BigEarthNet-MM for multi-modal multi-label image retrieval and classification problems by considering several state-of-the-art DL models. We also demonstrate that the DL models trained from scratch on BigEarthNet-MM outperform those pre-trained on ImageNet, especially in relation to some complex classes, including agriculture and other vegetated and natural environments. We make all the data and the DL models publicly available at https://bigearth.net, offering an important resource to support studies on multi-modal image scene classification and retrieval problems in RS. △ Less

Submitted 17 June, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Accepted at the IEEE Geoscience and Remote Sensing Magazine. Our code is available online at https://git.tu-berlin.de/rsim/BigEarthNet-MM_19-classes_models. arXiv admin note: substantial text overlap with arXiv:2001.06372

arXiv:2105.03647 [pdf, other]

doi 10.1109/TGRS.2021.3124326

Informative and Representative Triplet Selection for Multilabel Remote Sensing Image Retrieval

Authors: Gencer Sumbul, Mahdyar Ravanbakhsh, Begüm Demir

Abstract: Learning the similarity between remote sensing (RS) images forms the foundation for content-based RS image retrieval (CBIR). Recently, deep metric learning approaches that map the semantic similarity of images into an embedding (metric) space have been found very popular in RS. A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar… ▽ More Learning the similarity between remote sensing (RS) images forms the foundation for content-based RS image retrieval (CBIR). Recently, deep metric learning approaches that map the semantic similarity of images into an embedding (metric) space have been found very popular in RS. A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar (negative) images to a reference image called as an anchor. Choosing triplets is a difficult task particularly for multi-label RS CBIR, where each training image is annotated by multiple class labels. To address this problem, in this paper we propose a novel triplet sampling method in the framework of deep neural networks (DNNs) defined for multi-label RS CBIR problems. The proposed method selects a small set of the most representative and informative triplets based on two main steps. In the first step, a set of anchors that are diverse to each other in the embedding space is selected from the current mini-batch using an iterative algorithm. In the second step, different sets of positive and negative images are chosen for each anchor by evaluating the relevancy, hardness and diversity of the images among each other based on a novel strategy. Experimental results obtained on two multi-label benchmark archives show that the selection of the most informative and representative triplets in the context of DNNs results in: i) reducing the computational complexity of the training phase of the DNNs without any significant loss on the performance; and ii) an increase in learning speed since informative triplets allow fast convergence. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/image-retrieval-from-triplets. △ Less

Submitted 9 November, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

Comments: Accepted at the IEEE Transactions on Geoscience and Remote Sensing. Our code is available online at https://git.tu-berlin.de/rsim/image-retrieval-from-triplets

arXiv:2009.13935 [pdf, other]

doi 10.1109/IGARSS39084.2020.9323583

A Comparative Study of Deep Learning Loss Functions for Multi-Label Remote Sensing Image Classification

Authors: Hichame Yessou, Gencer Sumbul, Begüm Demir

Abstract: This paper analyzes and compares different deep learning loss functions in the framework of multi-label remote sensing (RS) image scene classification problems. We consider seven loss functions: 1) cross-entropy loss; 2) focal loss; 3) weighted cross-entropy loss; 4) Hamming loss; 5) Huber loss; 6) ranking loss; and 7) sparseMax loss. All the considered loss functions are analyzed for the first ti… ▽ More This paper analyzes and compares different deep learning loss functions in the framework of multi-label remote sensing (RS) image scene classification problems. We consider seven loss functions: 1) cross-entropy loss; 2) focal loss; 3) weighted cross-entropy loss; 4) Hamming loss; 5) Huber loss; 6) ranking loss; and 7) sparseMax loss. All the considered loss functions are analyzed for the first time in RS. After a theoretical analysis, an experimental analysis is carried out to compare the considered loss functions in terms of their: 1) overall accuracy; 2) class imbalance awareness (for which the number of samples associated to each class significantly varies); 3) convexibility and differentiability; and 4) learning efficiency (i.e., convergence speed). On the basis of our analysis, some guidelines are derived for a proper selection of a loss function in multi-label RS scene classification problems. △ Less

Submitted 29 September, 2020; originally announced September 2020.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2020. For code visit: https://gitlab.tubit.tu-berlin.de/rsim/RS-MLC-Losses

arXiv:2006.11529 [pdf, other]

doi 10.1109/TGRS.2020.3007523

Remote Sensing Image Scene Classification with Deep Neural Networks in JPEG 2000 Compressed Domain

Authors: Akshara Preethy Byju, Gencer Sumbul, Begüm Demir, Lorenzo Bruzzone

Abstract: To reduce the storage requirements, remote sensing (RS) images are usually stored in compressed format. Existing scene classification approaches using deep neural networks (DNNs) require to fully decompress the images, which is a computationally demanding task in operational applications. To address this issue, in this paper we propose a novel approach to achieve scene classification in JPEG 2000… ▽ More To reduce the storage requirements, remote sensing (RS) images are usually stored in compressed format. Existing scene classification approaches using deep neural networks (DNNs) require to fully decompress the images, which is a computationally demanding task in operational applications. To address this issue, in this paper we propose a novel approach to achieve scene classification in JPEG 2000 compressed RS images. The proposed approach consists of two main steps: i) approximation of the finer resolution sub-bands of reversible biorthogonal wavelet filters used in JPEG 2000; and ii) characterization of the high-level semantic content of approximated wavelet sub-bands and scene classification based on the learnt descriptors. This is achieved by taking codestreams associated with the coarsest resolution wavelet sub-band as input to approximate finer resolution sub-bands using a number of transposed convolutional layers. Then, a series of convolutional layers models the high-level semantic content of the approximated wavelet sub-band. Thus, the proposed approach models the multiresolution paradigm given in the JPEG 2000 compression algorithm in an end-to-end trainable unified neural network. In the classification stage, the proposed approach takes only the coarsest resolution wavelet sub-bands as input, thereby reducing the time required to apply decoding. Experimental results performed on two benchmark aerial image archives demonstrate that the proposed approach significantly reduces the computational time with similar classification accuracies when compared to traditional RS scene classification approaches (which requires full image decompression). △ Less

Submitted 15 December, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

Comments: Accepted to IEEE Transactions on Geoscience and Remote Sensing

arXiv:2006.08432 [pdf, other]

doi 10.1109/TGRS.2020.3031111

SD-RSIC: Summarization Driven Deep Remote Sensing Image Captioning

Authors: Gencer Sumbul, Sonali Nayak, Begüm Demir

Abstract: Deep neural networks (DNNs) have been recently found popular for image captioning problems in remote sensing (RS). Existing DNN based approaches rely on the availability of a training set made up of a high number of RS images with their captions. However, captions of training images may contain redundant information (they can be repetitive or semantically similar to each other), resulting in infor… ▽ More Deep neural networks (DNNs) have been recently found popular for image captioning problems in remote sensing (RS). Existing DNN based approaches rely on the availability of a training set made up of a high number of RS images with their captions. However, captions of training images may contain redundant information (they can be repetitive or semantically similar to each other), resulting in information deficiency while learning a map** from the image domain to the language domain. To overcome this limitation, in this paper, we present a novel Summarization Driven Remote Sensing Image Captioning (SD-RSIC) approach. The proposed approach consists of three main steps. The first step obtains the standard image captions by jointly exploiting convolutional neural networks (CNNs) with long short-term memory (LSTM) networks. The second step, unlike the existing RS image captioning methods, summarizes the ground-truth captions of each training image into a single caption by exploiting sequence to sequence neural networks and eliminates the redundancy present in the training set. The third step automatically defines the adaptive weights associated to each RS image to combine the standard captions with the summarized captions based on the semantic content of the image. This is achieved by a novel adaptive weighting strategy defined in the context of LSTM networks. Experimental results obtained on the RSCID, UCM-Captions and Sydney-Captions datasets show the effectiveness of the proposed approach compared to the state-of-the-art RS image captioning approaches. The code of the proposed approach is publicly available at https://gitlab.tubit.tu-berlin.de/rsim/SD-RSIC. △ Less

Submitted 13 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: Accepted in the IEEE Transactions on Geoscience and Remote Sensing. For code visit: https://gitlab.tubit.tu-berlin.de/rsim/SD-RSIC

arXiv:2004.01613 [pdf, other]

Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

Authors: Gencer Sumbul, Jian Kang, Begüm Demir

Abstract: This chapter presents recent advances in content based image search and retrieval (CBIR) systems in remote sensing (RS) for fast and accurate information discovery from massive data archives. Initially, we analyze the limitations of the traditional CBIR systems that rely on the hand-crafted RS image descriptors. Then, we focus our attention on the advances in RS CBIR systems for which deep learnin… ▽ More This chapter presents recent advances in content based image search and retrieval (CBIR) systems in remote sensing (RS) for fast and accurate information discovery from massive data archives. Initially, we analyze the limitations of the traditional CBIR systems that rely on the hand-crafted RS image descriptors. Then, we focus our attention on the advances in RS CBIR systems for which deep learning (DL) models are at the forefront. In particular, we present the theoretical properties of the most recent DL based CBIR systems for the characterization of the complex semantic content of RS images. After discussing their strengths and limitations, we present the deep hashing based CBIR systems that have high time-efficient search capability within huge data archives. Finally, the most promising research directions in RS CBIR are discussed. △ Less

Submitted 3 July, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

Comments: To appear as a book chapter in "Deep Learning for the Earth Sciences", John Wiley & Sons, 2020

arXiv:2001.06372

BigEarthNet Dataset with A New Class-Nomenclature for Remote Sensing Image Understanding

Authors: Gencer Sumbul, Jian Kang, Tristan Kreuziger, Filipe Marcelino, Hugo Costa, Pedro Benevides, Mario Caetano, Begüm Demir

Abstract: This paper presents BigEarthNet that is a large-scale Sentinel-2 multispectral image dataset with a new class nomenclature to advance deep learning (DL) studies in remote sensing (RS). BigEarthNet is made up of 590,326 image patches annotated with multi-labels provided by the CORINE Land Cover (CLC) map of 2018 based on its most thematic detailed Level-3 class nomenclature. Initial research demons… ▽ More This paper presents BigEarthNet that is a large-scale Sentinel-2 multispectral image dataset with a new class nomenclature to advance deep learning (DL) studies in remote sensing (RS). BigEarthNet is made up of 590,326 image patches annotated with multi-labels provided by the CORINE Land Cover (CLC) map of 2018 based on its most thematic detailed Level-3 class nomenclature. Initial research demonstrates that some CLC classes are challenging to be accurately described by considering only Sentinel-2 images. To increase the effectiveness of BigEarthNet, in this paper we introduce an alternative class-nomenclature to allow DL models for better learning and describing the complex spatial and spectral information content of the Sentinel-2 images. This is achieved by interpreting and arranging the CLC Level-3 nomenclature based on the properties of Sentinel-2 images in a new nomenclature of 19 classes. Then, the new class-nomenclature of BigEarthNet is used within state-of-the-art DL models in the context of multi-label classification. Results show that the models trained from scratch on BigEarthNet outperform those pre-trained on ImageNet, especially in relation to some complex classes including agriculture, other vegetated and natural environments. All DL models are made publicly available at http://bigearth.net/#downloads, offering an important resource to guide future progress on RS image analysis. △ Less

Submitted 13 June, 2021; v1 submitted 17 January, 2020; originally announced January 2020.

Comments: This paper has been withdrawn by the authors. This paper has been superseded by arXiv:2105.07921

arXiv:1912.06013 [pdf, other]

doi 10.1109/M2GARSS47143.2020.9105165

An Approach to Super-Resolution of Sentinel-2 Images Based on Generative Adversarial Networks

Authors: Kexin Zhang, Gencer Sumbul, Begüm Demir

Abstract: This paper presents a generative adversarial network based super-resolution (SR) approach (which is called as S2GAN) to enhance the spatial resolution of Sentinel-2 spectral bands. The proposed approach consists of two main steps. The first step aims to increase the spatial resolution of the bands with 20m and 60m spatial resolutions by the scaling factors of 2 and 6, respectively. To this end, we… ▽ More This paper presents a generative adversarial network based super-resolution (SR) approach (which is called as S2GAN) to enhance the spatial resolution of Sentinel-2 spectral bands. The proposed approach consists of two main steps. The first step aims to increase the spatial resolution of the bands with 20m and 60m spatial resolutions by the scaling factors of 2 and 6, respectively. To this end, we introduce a generator network that performs SR on the lower resolution bands with the guidance of the bands associated to 10m spatial resolution by utilizing the convolutional layers with residual connections and a long skip-connection between inputs and outputs. The second step aims to distinguish SR bands from their ground truth bands. This is achieved by the proposed discriminator network, which alternately characterizes the high level features of the two sets of bands and applying binary classification on the extracted features. Then, we formulate the adversarial learning of the generator and discriminator networks as a min-max game. In this learning procedure, the generator aims to produce realistic SR bands as much as possible so that the discriminator incorrectly classifies SR bands. Experimental results obtained on different Sentinel-2 images show the effectiveness of the proposed approach compared to both conventional and deep learning based SR approaches. △ Less

Submitted 7 February, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

Comments: Accepted at IEEE Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS) 2020

arXiv:1902.11274 [pdf, other]

doi 10.1109/IGARSS.2019.8898188

A Novel Multi-Attention Driven System For Multi-Label Remote Sensing Image Classification

Authors: Gencer Sumbul, Begüm Demir

Abstract: This paper presents a novel multi-attention driven system that jointly exploits Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) in the context of multi-label remote sensing (RS) image classification. The proposed system consists of four main modules. The first module aims to extract preliminary local descriptors of RS image bands that can be associated to different spatial re… ▽ More This paper presents a novel multi-attention driven system that jointly exploits Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) in the context of multi-label remote sensing (RS) image classification. The proposed system consists of four main modules. The first module aims to extract preliminary local descriptors of RS image bands that can be associated to different spatial resolutions. To this end, we introduce a K-Branch CNN, in which each branch extracts descriptors of image bands that have the same spatial resolution. The second module aims to model spatial relationship among local descriptors. This is achieved by a bidirectional RNN architecture, in which Long Short-Term Memory nodes enrich local descriptors by considering spatial relationships of local areas (image patches). The third module aims to define multiple attention scores for local descriptors. This is achieved by a novel patch-based multi-attention mechanism that takes into account the joint occurrence of multiple land-cover classes and provides the attention-based local descriptors. The last module exploits these descriptors for multi-label RS image classification. Experimental results obtained on the BigEarthNet that is a large-scale Sentinel-2 benchmark archive show the effectiveness of the proposed method compared to a state of the art method. △ Less

Submitted 29 May, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2019

arXiv:1902.06148 [pdf, other]

doi 10.1109/IGARSS.2019.8900532

BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding

Authors: Gencer Sumbul, Marcela Charfuelan, Begüm Demir, Volker Markl

Abstract: This paper presents the BigEarthNet that is a new large-scale multi-label Sentinel-2 benchmark archive. The BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120x120 pixels for 10m bands; ii) 60x60 pixels for 20m bands; and iii) 20x20 pixels for 60m bands. Unlike most of the existing archives, each image patch is annotated by multiple land-cover classes (i.… ▽ More This paper presents the BigEarthNet that is a new large-scale multi-label Sentinel-2 benchmark archive. The BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120x120 pixels for 10m bands; ii) 60x60 pixels for 20m bands; and iii) 20x20 pixels for 60m bands. Unlike most of the existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). The BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much more convenient to be used as a training source in the context of deep learning. This paper first addresses the limitations of the existing archives and then describes the properties of the BigEarthNet. Experimental results obtained in the framework of RS image scene classification problems show that a shallow Convolutional Neural Network (CNN) architecture trained on the BigEarthNet provides much higher accuracy compared to a state-of-the-art CNN model pre-trained on the ImageNet (which is a very popular large-scale benchmark archive in computer vision). The BigEarthNet opens up promising directions to advance operational RS applications and research in massive Sentinel-2 image archives. △ Less

Submitted 6 June, 2019; v1 submitted 16 February, 2019; originally announced February 2019.

Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2019

arXiv:1901.06403 [pdf, other]

doi 10.1109/TGRS.2019.2894425

Multisource Region Attention Network for Fine-Grained Object Recognition in Remote Sensing Imagery

Authors: Gencer Sumbul, Ramazan Gokberk Cinbis, Selim Aksoy

Abstract: Fine-grained object recognition concerns the identification of the type of an object among a large number of closely related sub-categories. Multisource data analysis, that aims to leverage the complementary spectral, spatial, and structural information embedded in different sources, is a promising direction towards solving the fine-grained recognition problem that involves low between-class varia… ▽ More Fine-grained object recognition concerns the identification of the type of an object among a large number of closely related sub-categories. Multisource data analysis, that aims to leverage the complementary spectral, spatial, and structural information embedded in different sources, is a promising direction towards solving the fine-grained recognition problem that involves low between-class variance, small training set sizes for rare classes, and class imbalance. However, the common assumption of co-registered sources may not hold at the pixel level for small objects of interest. We present a novel methodology that aims to simultaneously learn the alignment of multisource data and the classification model in a unified framework. The proposed method involves a multisource region attention network that computes per-source feature representations, assigns attention scores to candidate regions sampled around the expected object locations by using these representations, and classifies the objects by using an attention-driven multisource representation that combines the feature representations and the attention scores from all sources. All components of the model are realized using deep neural networks and are learned in an end-to-end fashion. Experiments using RGB, multispectral, and LiDAR elevation data for classification of street trees showed that our approach achieved 64.2% and 47.3% accuracies for the 18-class and 40-class settings, respectively, which correspond to 13% and 14.3% improvement relative to the commonly used feature concatenation approach from multiple sources. △ Less

Submitted 18 January, 2019; originally announced January 2019.

Comments: G. Sumbul, R. G. Cinbis, S. Aksoy, "Multisource Region Attention Network for Fine-Grained Object Recognition in Remote Sensing Imagery", IEEE Transactions on Geoscience and Remote Sensing (TGRS), in press, 2019

arXiv:1712.03323 [pdf, other]

doi 10.1109/TGRS.2017.2754648

Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery

Authors: Gencer Sumbul, Ramazan Gokberk Cinbis, Selim Aksoy

Abstract: Fine-grained object recognition that aims to identify the type of an object among a large number of subcategories is an emerging application with the increasing resolution that exposes new details in image data. Traditional fully supervised algorithms fail to handle this problem where there is low between-class variance and high within-class variance for the classes of interest with small sample s… ▽ More Fine-grained object recognition that aims to identify the type of an object among a large number of subcategories is an emerging application with the increasing resolution that exposes new details in image data. Traditional fully supervised algorithms fail to handle this problem where there is low between-class variance and high within-class variance for the classes of interest with small sample sizes. We study an even more extreme scenario named zero-shot learning (ZSL) in which no training example exists for some of the classes. ZSL aims to build a recognition model for new unseen categories by relating them to seen classes that were previously learned. We establish this relation by learning a compatibility function between image features extracted via a convolutional neural network and auxiliary information that describes the semantics of the classes of interest by using training samples from the seen classes. Then, we show how knowledge transfer can be performed for the unseen classes by maximizing this function during inference. We introduce a new data set that contains 40 different types of street trees in 1-ft spatial resolution aerial data, and evaluate the performance of this model with manually annotated attributes, a natural language model, and a scientific taxonomy as auxiliary information. The experiments show that the proposed model achieves 14.3% recognition accuracy for the classes with no training examples, which is significantly better than a random guess accuracy of 6.3% for 16 test classes, and three other ZSL algorithms. △ Less

Submitted 8 December, 2017; originally announced December 2017.

Comments: G. Sumbul, R. G. Cinbis, S. Aksoy, "Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery", IEEE Transactions on Geoscience and Remote Sensing (TGRS), in press, 2017

Showing 1–24 of 24 results for author: Sumbul, G