UrbanSARFloods: Sentinel-1 SLC-Based Benchmark Dataset for Urban and Open-Area Flood Mapping

Authors: Jie Zhao, Zhitong Xiong, Xiao Xiang Zhu

Abstract: Due to its cloud-penetrating capability and independence from solar illumination, satellite Synthetic Aperture Radar (SAR) is the preferred data source for large-scale flood mapping, providing global coverage and including various land cover classes. However, most studies on large-scale SAR-derived flood mapping using deep learning algorithms have primarily focused on flooded open areas, utilizing available open-access datasets (e.g., Sen1Floods11) and with limited attention to urban floods. To address this gap, we introduce UrbanSARFloods, a floodwater dataset featuring pre-processed Sentinel-1 intensity data and interferometric coherence imagery acquired before and during flood events. It contains 8,879 $512\times 512$ chips covering 807,500 $km^2$ across 20 land cover classes and 5 continents, spanning 18 flood events. We used UrbanSARFloods to benchmark existing state-of-the-art convolutional neural networks (CNNs) for segmenting open and urban flood areas. Our findings indicate that prevalent approaches, including the Weighted Cross-Entropy (WCE) loss and the application of transfer learning with pretrained models, fall short in overcoming the obstacles posed by imbalanced data and the constraints of a small training dataset. Urban flood detection remains challenging. Future research should explore strategies for addressing imbalanced data challenges and investigate transfer learning's potential for SAR-based large-scale flood mapping. Besides, expanding this dataset to include additional flood events holds promise for enhancing its utility and contributing to advancements in flood mapping techniques.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted by CVPR 2024 EarthVision Workshop
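
The abstract above names the Weighted Cross-Entropy (WCE) loss as one of the prevalent approaches to class imbalance. As a rough illustration only, and not the authors' implementation, a per-pixel weighted cross-entropy can be sketched in plain Python. The function name, the list-based inputs, and the normalization by the summed weights are all assumptions made for this sketch.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted cross-entropy averaged over pixels (illustrative sketch).

    probs:         list of per-pixel class-probability lists
    labels:        list of integer class indices, one per pixel
    class_weights: list with one weight per class (hypothetical values)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], eps))
        weight_sum += w
    # Normalizing by the summed weights is one common convention (assumed here).
    return total / weight_sum


# A rare class (index 1) predicted poorly on one of two pixels.
probs = [[0.9, 0.1], [0.9, 0.1]]
labels = [0, 1]
unweighted = weighted_cross_entropy(probs, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(probs, labels, [1.0, 9.0])
```

Raising the weight of the rare class makes its errors dominate the loss, which is the intended effect. The abstract reports that this alone did not overcome the imbalance in this benchmark.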

arXiv:2405.04285 [pdf, other]

On the Foundations of Earth and Climate Foundation Models

Authors: Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

Abstract: Foundation models have enormous potential in advancing Earth and climate sciences, however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an environmental- and human-centric manner. We further shed light on the way forward to achieve the ideal model and to evaluate Earth foundation models. What comes after foundation models? Energy efficient adaptation, adversarial defenses, and interpretability are among the emerging directions.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2309.16499 [pdf, other]

Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks

Authors: Danfeng Hong, Bing Zhang, Hao Li, Yuxuan Li, Jing Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu

Abstract: Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions). Yet these AI models tend to meet the performance bottleneck in the case studies across cities or regions, due to the lack of diverse RS information and cutting-edge solutions with high generalization ability. To this end, we build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, SAR) for the study purpose of the cross-city semantic segmentation task (called C2Seg dataset), which consists of two cross-city scenes, i.e., Berlin-Augsburg (in Germany) and Beijing-Wuhan (in China). Beyond the single city, we propose a high-resolution domain adaptation network, HighDAN for short, to promote the AI model's generalization ability from the multi-city environments. HighDAN is capable of retaining the spatially topological structure of the studied urban scene well in a parallel high-to-low resolution fusion fashion but also closing the gap derived from enormous differences of RS image representations between different cities by means of adversarial learning. In addition, the Dice loss is considered in HighDAN to alleviate the class imbalance issue caused by factors across cities. Extensive experiments conducted on the C2Seg dataset show the superiority of our HighDAN in terms of segmentation performance and generalization ability, compared to state-of-the-art competitors. The C2Seg dataset and the semantic segmentation toolbox (involving the proposed HighDAN) will be available publicly at https://github.com/danfenghong.

Submitted 3 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.
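
The Cross-City Matters abstract above mentions using the Dice loss to alleviate class imbalance. A minimal sketch of a soft Dice loss for one binary class follows. It is a generic textbook form, not the HighDAN code, and the function name, the flat-list inputs, and the `smooth` term are assumptions of this example.

```python
def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for one class (illustrative sketch).

    pred:   flat list of predicted probabilities in [0, 1]
    target: flat list of ground-truth values (0 or 1)
    smooth: hypothetical smoothing constant that avoids division by zero
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


perfect = dice_loss([1, 0, 1, 0], [1, 0, 1, 0])   # full overlap
disjoint = dice_loss([0, 1, 0, 1], [1, 0, 1, 0])  # no overlap
```

Because the loss is a ratio of overlap to total foreground mass, it does not shrink simply because a class occupies few pixels, which is why it is often paired with imbalanced segmentation data.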

arXiv:2309.16468 [pdf, other]

HyperLISTA-ABT: An Ultra-light Unfolded Network for Accurate Multi-component Differential Tomographic SAR Inversion

Authors: Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu

Abstract: Deep neural networks based on unrolled iterative algorithms have achieved remarkable success in sparse reconstruction applications, such as synthetic aperture radar (SAR) tomographic inversion (TomoSAR). However, the currently available deep learning-based TomoSAR algorithms are limited to three-dimensional (3D) reconstruction. The extension of deep learning-based algorithms to four-dimensional (4D) imaging, i.e., differential TomoSAR (D-TomoSAR) applications, is impeded mainly due to the high-dimensional weight matrices required by the network designed for D-TomoSAR inversion, which typically contain millions of freely trainable parameters. Learning such a huge number of weights requires an enormous number of training samples, resulting in a large memory burden and excessive time consumption. To tackle this issue, we propose an efficient and accurate algorithm called HyperLISTA-ABT. The weights in HyperLISTA-ABT are determined in an analytical way according to a minimum coherence criterion, trimming the model down to an ultra-light one with only three hyperparameters. Additionally, HyperLISTA-ABT improves the global thresholding by utilizing an adaptive blockwise thresholding scheme, which applies block-coordinate techniques and conducts thresholding in local blocks, so that weak expressions and local features can be retained in the shrinkage step layer by layer. Simulations were performed and demonstrated the effectiveness of our approach, showing that HyperLISTA-ABT achieves superior computational efficiency with no significant performance degradation compared to state-of-the-art methods. Real data experiments showed that a high-quality 4D point cloud could be reconstructed over a large area by the proposed HyperLISTA-ABT with affordable computational resources and in a short time.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.11109 [pdf, other]

Self-supervised Domain-agnostic Domain Adaptation for Satellite Images

Authors: Fahong Zhang, Yilei Shi, Xiao Xiang Zhu

Abstract: Domain shift caused by, e.g., different geographical regions or acquisition conditions is a common issue in machine learning for global scale satellite image processing. A promising method to address this problem is domain adaptation, where the training and the testing datasets are split into two or multiple domains according to their distributions, and an adaptation method is applied to improve the generalizability of the model on the testing dataset. However, defining the domain to which each satellite image belongs is not trivial, especially under large-scale multi-temporal and multi-sensory scenarios, where a single image mosaic could be generated from multiple data sources. In this paper, we propose a self-supervised domain-agnostic domain adaptation (SS(DA)2) method to perform domain adaptation without such a domain definition. To achieve this, we first design a contrastive generative adversarial loss to train a generative network to perform image-to-image translation between any two satellite image patches. Then, we improve the generalizability of the downstream models by augmenting the training data with different testing spectral characteristics. The experimental results on public benchmarks verify the effectiveness of SS(DA)2.

Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2308.01146 [pdf, other]

UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation

Authors: Qingsong Xu, Yilei Shi, Jianhua Guo, Chaojun Ouyang, Xiao Xiang Zhu

Abstract: Change detection (CD) by comparing two bi-temporal images is a crucial task in remote sensing. With the advantages of requiring no cumbersome labeled change information, unsupervised CD has attracted extensive attention in the community. However, existing unsupervised CD approaches rarely consider the seasonal and style differences incurred by the illumination and atmospheric conditions in multi-temporal images. To this end, we propose a change detection with domain shift setting for remote sensing images. Furthermore, we present a novel unsupervised CD method using a light-weight transformer, called UCDFormer. Specifically, a transformer-driven image translation composed of a light-weight transformer and a domain-specific affinity weight is first proposed to mitigate domain shift between two images with real-time efficiency. After image translation, we can generate the difference map between the translated before-event image and the original after-event image. Then, a novel reliable pixel extraction module is proposed to select significantly changed/unchanged pixel positions by fusing the pseudo change maps of fuzzy c-means clustering and adaptive threshold. Finally, a binary change map is obtained based on these selected pixel pairs and a binary classifier. Experimental results on different unsupervised CD tasks with seasonal and style changes demonstrate the effectiveness of the proposed UCDFormer. For example, compared with several other related methods, UCDFormer improves performance on the Kappa coefficient by more than 12%. In addition, UCDFormer achieves excellent performance for earthquake-induced landslide detection when considering large-scale applications. The code is available at https://github.com/zhu-xlab/UCDFormer

Submitted 2 August, 2023; originally announced August 2023.

Comments: 16 pages, 7 figures, IEEE Transactions on Geoscience and Remote Sensing

arXiv:2307.03461 [pdf, other]

A Deep Active Contour Model for Delineating Glacier Calving Fronts

Authors: Konrad Heidler, Lichao Mou, Erik Loebel, Mirko Scheinert, Sébastien Lefèvre, Xiao Xiang Zhu

Abstract: Choosing how to encode a real-world problem as a machine learning task is an important design decision in machine learning. The task of glacier calving front modeling has often been approached as a semantic segmentation task. Recent studies have shown that combining segmentation with edge detection can improve the accuracy of calving front detectors. Building on this observation, we completely rephrase the task as a contour tracing problem and propose a model for explicit contour detection that does not incorporate any dense predictions as intermediate steps. The proposed approach, called "Charting Outlines by Recurrent Adaptation" (COBRA), combines Convolutional Neural Networks (CNNs) for feature extraction and active contour models for the delineation. By training and evaluating on several large-scale datasets of Greenland's outlet glaciers, we show that this approach indeed outperforms the aforementioned methods based on segmentation and edge-detection. Finally, we demonstrate that explicit contour detection has benefits over pixel-wise methods when quantifying the models' prediction uncertainties. The project page containing the code and animated model predictions can be found at https://khdlr.github.io/COBRA/.

Submitted 7 July, 2023; originally announced July 2023.

Comments: This work has been accepted by IEEE TGRS for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2305.14209 [pdf, other]

Basis Pursuit Denoising via Recurrent Neural Network Applied to Super-resolving SAR Tomography

Authors: Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu

Abstract: Finding sparse solutions of underdetermined linear systems commonly requires solving an L1 regularized least squares minimization problem, which is also known as basis pursuit denoising (BPDN). Such problems are computationally expensive since they cannot be solved analytically. An emerging technique known as deep unrolling provides a good combination of the descriptive ability of neural networks, explainability, and computational efficiency for BPDN. Many unrolled neural networks for BPDN, e.g. learned iterative shrinkage thresholding algorithm and its variants, employ shrinkage functions to prune elements with small magnitude. Through experiments on synthetic aperture radar tomography (TomoSAR), we discover the shrinkage step leads to unavoidable information loss in the dynamics of networks and degrades the performance of the model. We propose a recurrent neural network (RNN) with novel sparse minimal gated units (SMGUs) to solve the information loss issue. The proposed RNN architecture with SMGUs benefits from incorporating historical information into optimization, and thus effectively preserves full information in the final output. Taking TomoSAR inversion as an example, extensive simulations demonstrated that the proposed RNN outperforms the state-of-the-art deep learning-based algorithm in terms of super-resolution power as well as generalization ability. It achieved a 10% to 20% higher double scatterers detection rate and is less sensitive to phase and amplitude ratio differences between scatterers. Tests on real TerraSAR-X spotlight images also show a high-quality 3-D reconstruction of the test site.

Submitted 23 May, 2023; originally announced May 2023.
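
The BPDN abstract above refers to iterative shrinkage thresholding and to shrinkage functions that prune small-magnitude elements. For orientation, the classical (non-learned) ISTA iteration for minimizing 0.5*||y - Ax||^2 + lam*||x||_1 can be sketched as below. This is the textbook baseline that unrolled networks start from, not the proposed RNN with SMGUs. It is real-valued for simplicity, whereas TomoSAR data are complex-valued, and the function names and parameters are assumptions of this sketch.

```python
import math


def soft_threshold(v, tau):
    """Shrink each entry toward zero by tau; entries with |x| <= tau become 0."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def ista(A, y, lam, step, n_iter=200):
    """Classical ISTA for 0.5*||y - A x||^2 + lam*||x||_1 (illustrative sketch).

    A:    matrix as a list of rows
    y:    measurement vector
    step: step size, assumed to be at most 1 / (largest eigenvalue of A^T A)
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual y - A x
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step: x + step * A^T (y - A x)
        z = [x[j] + step * sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Shrinkage step: this is where small entries are pruned
        x = soft_threshold(z, step * lam)
    return x


# With an identity matrix the minimizer is the soft-thresholded measurement.
x_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0)
```

In the toy example the second entry (0.5) falls below the threshold and is set to zero, a small-scale instance of the pruning that the abstract identifies as a source of information loss.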

arXiv:2305.08413 [pdf, other]

Artificial intelligence to advance Earth observation: a perspective

Authors: Devis Tuia, Konrad Schindler, Begüm Demir, Gustau Camps-Valls, Xiao Xiang Zhu, Mrinalini Kochupillai, Sašo Džeroski, Jan N. van Rijn, Holger H. Hoos, Fabio Del Frate, Mihai Datcu, Jorge-Arnulfo Quiané-Ruiz, Volker Markl, Bertrand Le Saux, Rochelle Schneider

Abstract: Earth observation (EO) is a prime instrument for monitoring land and ocean processes, studying the dynamics at work, and taking the pulse of our planet. This article gives a bird's eye view of the essential scientific tools and approaches informing and supporting the transition from raw EO data to usable EO-based information. The promises, as well as the current challenges of these developments, are highlighted under dedicated sections. Specifically, we cover the impact of (i) Computer vision; (ii) Machine learning; (iii) Advanced processing and computing; (iv) Knowledge-based AI; (v) Explainable AI and causal inference; (vi) Physics-aware models; (vii) User-centric approaches; and (viii) the much-needed discussion of ethical and societal issues related to the massive use of ML technologies in EO.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.03529 [pdf, other]

doi 10.1016/j.ophoto.2023.100044

Deep Unsupervised Learning for 3D ALS Point Cloud Change Detection

Authors: Iris de Gélis, Sudipan Saha, Muhammad Shahzad, Thomas Corpetti, Sébastien Lefèvre, Xiao Xiang Zhu

Abstract: Change detection from traditional 2D optical images has limited capability to model the changes in the height or shape of objects. Change detection using 3D point clouds from photogrammetry or LiDAR surveying can fill this gap by providing critical depth information. While most existing machine learning based 3D point cloud change detection methods are supervised, they severely depend on the availability of annotated training data, which is in practice a critical point. To circumnavigate this dependence, we propose an unsupervised 3D point cloud change detection method mainly based on self-supervised learning using deep clustering and contrastive learning. The proposed method also relies on an adaptation of deep change vector analysis to 3D point clouds via nearest point comparison. Experiments conducted on an aerial LiDAR survey dataset show that the proposed method obtains higher performance in comparison to the traditional unsupervised methods, with a gain of about 9% in mean accuracy (to reach more than 85%). Thus, it appears to be a relevant choice in scenarios where prior knowledge (labels) is not ensured.

Submitted 15 December, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: This work has been accepted to Elsevier for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Journal ref: ISPRS Open Journal of Photogrammetry and Remote Sensing Volume 9, August 2023, 100044

arXiv:2304.05464 [pdf, other]

UnCRtainTS: Uncertainty Quantification for Cloud Removal in Optical Satellite Time Series

Authors: Patrick Ebel, Vivien Sainte Fare Garnot, Michael Schmitt, Jan Dirk Wegner, Xiao Xiang Zhu

Abstract: Clouds and haze often occlude optical satellite images, hindering continuous, dense monitoring of the Earth's surface. Although modern deep learning methods can implicitly learn to ignore such occlusions, explicit cloud removal as pre-processing enables manual interpretation and allows training models when only few annotations are available. Cloud removal is challenging due to the wide range of occlusion scenarios -- from scenes partially visible through haze, to completely opaque cloud coverage. Furthermore, integrating reconstructed images in downstream applications would greatly benefit from trustworthy quality assessment. In this paper, we introduce UnCRtainTS, a method for multi-temporal cloud removal combining a novel attention-based architecture, and a formulation for multivariate uncertainty prediction. These two components combined set a new state-of-the-art performance in terms of image reconstruction on two public cloud removal datasets. Additionally, we show how the well-calibrated predicted uncertainties enable a precise control of the reconstruction quality.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2206.02850 [pdf, other]

GLF-CR: SAR-Enhanced Cloud Removal with Global-Local Fusion

Authors: Fang Xu, Yilei Shi, Patrick Ebel, Lei Yu, Gui-Song Xia, Wen Yang, Xiao Xiang Zhu

Abstract: The challenge of the cloud removal task can be alleviated with the aid of Synthetic Aperture Radar (SAR) images that can penetrate cloud cover. However, the large domain gap between optical and SAR images as well as the severe speckle noise of SAR images may cause significant interference in SAR-based cloud removal, resulting in performance degeneration. In this paper, we propose a novel global-local fusion based cloud removal (GLF-CR) algorithm to leverage the complementary information embedded in SAR images. Exploiting the power of SAR information to promote cloud removal entails two aspects. The first, global fusion, guides the relationship among all local optical windows to maintain the structure of the recovered region consistent with the remaining cloud-free regions. The second, local fusion, transfers complementary information embedded in the SAR image that corresponds to cloudy areas to generate reliable texture details of the missing regions, and uses dynamic filtering to alleviate the performance degradation caused by speckle noise. Extensive evaluation demonstrates that the proposed algorithm can yield high quality cloud-free images and outperform state-of-the-art cloud removal algorithms with a gain of about 1.7 dB in terms of PSNR on the SEN12MS-CR dataset.

Submitted 9 August, 2022; v1 submitted 6 June, 2022; originally announced June 2022.
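
The GLF-CR abstract above reports its improvement as a PSNR gain in decibels. As a reminder of what that metric measures, a minimal PSNR computation is sketched below, using the standard definition 10*log10(MAX^2 / MSE). The function name, the flat-list inputs, and the default peak value of 1.0 are assumptions of this example and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    max_value: assumed peak signal value (1.0 for images scaled to [0, 1])
    """
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
value = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```

Since PSNR is logarithmic in the mean squared error, a gain of 1.7 dB at a fixed peak value corresponds to the MSE shrinking by a factor of 10^0.17, roughly 1.48.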

arXiv:2112.04211 [pdf, other]

doi 10.1109/TGRS.2022.3164193

$\boldsymbol{\gamma}$-Net: Superresolving SAR Tomographic Inversion via Deep Learning

Authors: Kun Qian, Yuanyuan Wang, Yilei Shi, Xiao Xiang Zhu

Abstract: Synthetic aperture radar tomography (TomoSAR) has been extensively employed in 3-D reconstruction in dense urban areas using high-resolution SAR acquisitions. Compressive sensing (CS)-based algorithms are generally considered as the state of the art in super-resolving TomoSAR, in particular in the single look case. This superior performance comes at the cost of extra computational burdens, because the sparse reconstruction cannot be solved analytically and computationally expensive iterative solvers must be employed. In this paper, we propose a novel deep learning-based super-resolving TomoSAR inversion approach, $\boldsymbol{\gamma}$-Net, to tackle this challenge. $\boldsymbol{\gamma}$-Net adopts an advanced complex-valued learned iterative shrinkage thresholding algorithm (CV-LISTA) to mimic the iterative optimization step in sparse reconstruction. Simulations show the height estimate from a well-trained $\boldsymbol{\gamma}$-Net approaches the Cramér-Rao lower bound while improving the computational efficiency by 1 to 2 orders of magnitude compared to the first-order CS-based methods. It also shows no degradation in the super-resolution power compared to the state-of-the-art second-order TomoSAR solvers, which are much more computationally expensive than the first-order methods. Specifically, $\boldsymbol{\gamma}$-Net reaches more than 90% detection rate in moderate super-resolving cases at 25 measurements at 6 dB SNR. Moreover, simulation at limited baselines demonstrates that the proposed algorithm outperforms the second-order CS-based method by a fair margin. Tests on real TerraSAR-X data with just 6 interferograms also show high-quality 3-D reconstruction with high-density detected double scatterers.

Submitted 8 December, 2021; originally announced December 2021.

arXiv:2111.09460 [pdf, other]

Large-scale Building Height Retrieval from Single SAR Imagery based on Bounding Box Regression Networks

Authors: Yao Sun, Lichao Mou, Yuanyuan Wang, Sina Montazeri, Xiao Xiang Zhu

Abstract: Building height retrieval from synthetic aperture radar (SAR) imagery is of great importance for urban applications, yet highly challenging owing to the complexity of SAR data. This paper addresses the issue of building height retrieval in large-scale urban areas from a single TerraSAR-X spotlight or stripmap image. Based on the radar viewing geometry, we propose that this problem can be formulated as a bounding box regression problem and therefore allows for integrating height data from multiple data sources in generating ground truth on a larger scale. We introduce building footprints from geographic information system (GIS) data as complementary information and propose a bounding box regression network that exploits the location relationship between a building's footprint and its bounding box, allowing for fast computation. This is important for large-scale applications. The method is validated on four urban data sets using TerraSAR-X images in both high-resolution spotlight and stripmap modes. Experimental results show that the proposed network can reduce the computation cost significantly while keeping the height accuracy of individual buildings compared to a Faster R-CNN based method. Moreover, we investigate the impact of inaccurate GIS data on our proposed network, and this study shows that the bounding box regression network is robust against positioning errors in GIS data. The proposed method has great potential to be applied to regional or even global scales.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2105.00967 [pdf]

doi 10.1109/TGRS.2021.3069641

A lightweight deep learning based cloud detection method for Sentinel-2A imagery fusing multi-scale spectral and spatial features

Authors: Jun Li, Zhaocong Wu, Zhongwen Hu, Canliang Jian, Shaojie Luo, Lichao Mou, Xiao Xiang Zhu, Matthieu Molinier

Abstract: Clouds are a very important factor in the availability of optical remote sensing images. Recently, deep learning-based cloud detection methods have surpassed classical methods based on rules and physical models of clouds. However, most of these deep models are very large which limits their applicability and explainability, while other models do not make use of the full spectral information in multi-spectral images such as Sentinel-2. In this paper, we propose a lightweight network for cloud detection, fusing multi-scale spectral and spatial features (CD-FM3SF) and tailored for processing all spectral bands in Sentinel-2A images. The proposed method consists of an encoder and a decoder. In the encoder, three input branches are designed to handle spectral bands at their native resolution and extract multiscale spectral features. Three novel components are designed: a mixed depth-wise separable convolution (MDSC) and a shared and dilated residual block (SDRB) to extract multi-scale spatial features, and a concatenation and sum (CS) operation to fuse multi-scale spectral and spatial features with little calculation and no additional parameters. The decoder of CD-FM3SF outputs three cloud masks at the same resolution as input bands to enhance the supervision information of small, middle and large clouds. To validate the performance of the proposed method, we manually labeled 36 Sentinel-2A scenes evenly distributed over mainland China. The experiment results demonstrate that CD-FM3SF outperforms traditional cloud detection methods and state-of-the-art deep learning-based methods in both accuracy and speed.

Submitted 29 April, 2021; originally announced May 2021.

arXiv:2104.05107 [pdf, other]

doi 10.1109/MGRS.2020.3043504

Towards a Collective Agenda on AI for Earth Science Data Analysis

Authors: Devis Tuia, Ribana Roscher, Jan Dirk Wegner, Nathan Jacobs, Xiao Xiang Zhu, Gustau Camps-Valls

Abstract: In recent years, we have witnessed the fields of geosciences and remote sensing and artificial intelligence growing closer. Thanks to the massive availability of observational data, improved simulations, and algorithmic advances, these disciplines have found common objectives and challenges to advance the modeling and understanding of the Earth system. Despite such great opportunities, we also observed a worrying tendency to remain in disciplinary comfort zones applying recent advances from artificial intelligence on well resolved remote sensing problems. Here we take a position on research directions where we think the interface between these fields will have the most impact and become potential game changers. In our declared agenda for AI on Earth sciences, we aim to inspire researchers, especially the younger generations, to tackle these challenges for a real advance of remote sensing and the geosciences.

Submitted 11 April, 2021; originally announced April 2021.

Comments: In press at IEEE Geoscience and Remote Sensing Magazine

arXiv:2103.08741 [pdf, other]

doi 10.1109/TGRS.2021.3067096

Deep Reinforcement Learning for Band Selection in Hyperspectral Image Classification

Authors: Lichao Mou, Sudipan Saha, Yuansheng Hua, Francesca Bovolo, Lorenzo Bruzzone, Xiao Xiang Zhu

Abstract: Band selection refers to the process of choosing the most relevant bands in a hyperspectral image. By selecting a limited number of optimal bands, we aim at speeding up model training, improving accuracy, or both. It reduces redundancy among spectral bands while trying to preserve the original information of the image. By now many efforts have been made to develop unsupervised band selection approaches, of which the majority are heuristic algorithms devised by trial and error. In this paper, we are interested in training an intelligent agent that, given a hyperspectral image, is capable of automatically learning a policy to select an optimal band subset without any hand-engineered reasoning. To this end, we frame the problem of unsupervised band selection as a Markov decision process, propose an effective method to parameterize it, and finally solve the problem by deep reinforcement learning. Once the agent is trained, it learns a band-selection policy that guides the agent to sequentially select bands by fully exploiting the hyperspectral image and previously picked bands. Furthermore, we propose two different reward schemes for the environment simulation of deep reinforcement learning and compare them in experiments. This, to the best of our knowledge, is the first study that explores a deep reinforcement learning model for hyperspectral image analysis, thus opening a new door for future research and showcasing the great potential of deep reinforcement learning in remote sensing applications. Extensive experiments are carried out on four hyperspectral data sets, and experimental results demonstrate the effectiveness of the proposed method.

Submitted 15 March, 2021; originally announced March 2021.

arXiv:2103.05102 [pdf, other]

doi 10.1109/TGRS.2021.3109957

Self-Supervised Multisensor Change Detection

Authors: Sudipan Saha, Patrick Ebel, Xiao Xiang Zhu

Abstract: Most change detection methods assume that pre-change and post-change images are acquired by the same sensor. However, in many real-life scenarios, e.g., natural disasters, it is more practical to use the latest available images before and after the incident, which may be acquired using different sensors. In particular, we are interested in the combination of the images acquired by optical and Synthetic Aperture Radar (SAR) sensors. SAR images appear vastly different from the optical images even when capturing the same scene. Adding to this, change detection methods are often constrained to use only the target image pair, no labeled data, and no additional unlabeled data. Such constraints limit the scope of traditional supervised machine learning and unsupervised generative approaches for multi-sensor change detection. Recent rapid development of self-supervised learning methods has shown that some of them can even work with only a few images. Motivated by this, in this work we propose a method for multi-sensor change detection using only the unlabeled target bi-temporal images, which are used for training a network in a self-supervised fashion by using deep clustering and contrastive learning. The proposed method is evaluated on four multi-modal bi-temporal scenes showing change, and the benefits of our self-supervised approach are demonstrated.

Submitted 23 January, 2022; v1 submitted 12 February, 2021; originally announced March 2021.

arXiv:2103.01849 [pdf, other]

doi 10.1109/TGRS.2021.3064606

HED-UNet: Combined Segmentation and Edge Detection for Monitoring the Antarctic Coastline

Authors: Konrad Heidler, Lichao Mou, Celia Baumhoer, Andreas Dietz, Xiao Xiang Zhu

Abstract: Deep learning-based coastline detection algorithms have begun to outshine traditional statistical methods in recent years. However, they are usually trained only as single-purpose models to either segment land and water or delineate the coastline. In contrast to this, a human annotator will usually keep a mental map of both segmentation and delineation when performing manual coastline detection. To take into account this task duality, we therefore devise a new model to unite these two approaches in a deep learning model. By taking inspiration from the main building blocks of a semantic segmentation framework (UNet) and an edge detection framework (HED), both tasks are combined in a natural way. Training is made efficient by employing deep supervision on side predictions at multiple resolutions. Finally, a hierarchical attention mechanism is introduced to adaptively merge these multiscale predictions into the final model output. The advantages of this approach over other traditional and deep learning-based methods for coastline detection are demonstrated on a dataset of Sentinel-1 imagery covering parts of the Antarctic coast, where coastline detection is notoriously difficult. An implementation of our method is available at https://github.com/khdlr/HED-UNet.

Submitted 2 March, 2021; originally announced March 2021.

Comments: This work has been accepted by IEEE TGRS for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2103.01449 [pdf, other]

Interpretable Hyperspectral AI: When Non-Convex Modeling meets Hyperspectral Remote Sensing

Authors: Danfeng Hong, Wei He, Naoto Yokoya, Jing Yao, Lianru Gao, Liangpei Zhang, Jocelyn Chanussot, Xiao Xiang Zhu

Abstract: Hyperspectral imaging, also known as image spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these hyperspectral (HS) products, mainly by means of seasoned experts. However, with the ever-growing volume of data, the bulk of costs in manpower and material resources poses new challenges on reducing the burden of manual labor and improving efficiency. It is therefore urgent to develop more intelligent and automatic approaches for various HS RS applications. Machine learning (ML) tools with convex optimization have successfully undertaken the tasks of numerous artificial intelligence (AI)-related applications. However, their ability in handling complex practical problems remains limited, particularly for HS data, due to the effects of various spectral variabilities in the process of HS imaging and the complexity and redundancy of higher dimensional HS signals. Compared to the convex models, non-convex modeling, which is capable of characterizing more complex real scenes and providing the model interpretability technically and theoretically, has been proven to be a feasible solution to reduce the gap between challenging HS vision tasks and currently advanced intelligent data processing models.

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2011.11452 [pdf, other]

Multi-task Learning for Human Settlement Extent Regression and Local Climate Zone Classification

Authors: Chunping Qiu, Lukas Liebel, Lloyd H. Hughes, Michael Schmitt, Marco Körner, Xiao Xiang Zhu

Abstract: Human Settlement Extent (HSE) and Local Climate Zone (LCZ) maps are both essential sources, e.g., for sustainable urban development and Urban Heat Island (UHI) studies. Remote sensing (RS)- and deep learning (DL)-based classification approaches play a significant role by providing the potential for global mapping. However, most of the efforts only focus on one of the two schemes, usually on a specific scale. This leads to unnecessary redundancies, since the learned features could be leveraged for both of these related tasks. In this letter, the concept of multi-task learning (MTL) is introduced to HSE regression and LCZ classification for the first time. We propose an MTL framework and develop an end-to-end Convolutional Neural Network (CNN), which consists of a backbone network for shared feature learning, attention modules for task-specific feature learning, and a weighting strategy for balancing the two tasks. We additionally propose to exploit HSE predictions as a prior for LCZ classification to enhance the accuracy. The MTL approach was extensively tested with Sentinel-2 data of 13 cities across the world. The results demonstrate that the framework is able to provide a competitive solution for both tasks.

Submitted 23 November, 2020; originally announced November 2020.

Comments: This work has been accepted by IEEE GRSL for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2011.08362 [pdf, other]

doi 10.1109/TGRS.2020.3043089

CG-Net: Conditional GIS-aware Network for Individual Building Segmentation in VHR SAR Images

Authors: Yao Sun, Yuansheng Hua, Lichao Mou, Xiao Xiang Zhu

Abstract: Object retrieval and reconstruction from very high resolution (VHR) synthetic aperture radar (SAR) images are of great importance for urban SAR applications, yet highly challenging owing to the complexity of SAR data. This paper addresses the issue of individual building segmentation from a single VHR SAR image in large-scale urban areas. To achieve this, we introduce building footprints from GIS data as complementary information and propose a novel conditional GIS-aware network (CG-Net). The proposed model learns multi-level visual features and employs building footprints to normalize the features for predicting building masks in the SAR image. We validate our method using a high resolution spotlight TerraSAR-X image collected over Berlin. Experimental results show that the proposed CG-Net effectively brings improvements with variant backbones. We further compare two representations of building footprints, namely complete building footprints and sensor-visible footprint segments, for our task, and conclude that the use of the former leads to better segmentation results. Moreover, we investigate the impact of inaccurate GIS data on our CG-Net, and this study shows that CG-Net is robust against positioning errors in GIS data. In addition, we propose an approach of ground truth generation of buildings from an accurate digital elevation model (DEM), which can be used to generate large-scale SAR image datasets. The segmentation results can be applied to reconstruct 3D building models at level-of-detail (LoD) 1, which is demonstrated in our experiments.

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2009.06992 [pdf, other]

doi 10.1016/j.rse.2020.112096

Mapping horizontal and vertical urban densification in Denmark with Landsat time-series from 1985 to 2018: a semantic segmentation solution

Authors: Tzu-Hsin Karen Chen, Chunping Qiu, Michael Schmitt, Xiao Xiang Zhu, Clive E. Sabel, Alexander V. Prishchepov

Abstract: Landsat imagery is an unparalleled freely available data source that allows reconstructing horizontal and vertical urban form. This paper addresses the challenge of using Landsat data, particularly its 30m spatial resolution, for monitoring three-dimensional urban densification. We compare temporal and spatial transferability of an adapted DeepLab model with a simple fully convolutional network (FCN) and a texture-based random forest (RF) model to map urban density in the two morphological dimensions: horizontal (compact, open, sparse) and vertical (high rise, low rise). We test whether a model trained on the 2014 data can be applied to 2006 and 1995 for Denmark, and examine whether we could use the model trained on the Danish data to accurately map other European cities. Our results show that an implementation of deep networks and the inclusion of multi-scale contextual information greatly improve the classification and the model's ability to generalize across space and time. DeepLab provides more accurate horizontal and vertical classifications than FCN when sufficient training data is available. By using DeepLab, the F1 score can be increased by 4 and 10 percentage points for detecting vertical urban growth compared to FCN and RF for Denmark. For mapping the other European cities with training data from Denmark, DeepLab also shows an advantage of 6 percentage points over RF for both the dimensions. The resulting maps across the years 1985 to 2018 reveal different patterns of urban growth between Copenhagen and Aarhus, the two largest cities in Denmark, illustrating that those cities have used various planning policies in addressing population growth and housing supply challenges. In summary, we propose a transferable deep learning approach for automated, long-term mapping of urban form from Landsat images.

Submitted 21 September, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: Accepted manuscript including appendix (supplementary file)

ACM Class: I.4.6; I.4.9; J.2; J.4

Journal ref: Remote Sensing of Environment, 2020, 251

arXiv:2009.01009 [pdf, other]

doi 10.1109/TGRS.2020.3022209

SAR Tomography via Nonlinear Blind Scatterer Separation

Authors: Yuanyuan Wang, Xiao Xiang Zhu

Abstract: Layover separation has been fundamental to many synthetic aperture radar applications, such as building reconstruction and biomass estimation. Retrieving the scattering profile along the mixed dimension (elevation) is typically solved by inversion of the SAR imaging model, a process known as SAR tomography. This paper proposes a nonlinear blind scatterer separation method to retrieve the phase centers of the layovered scatterers, avoiding the computationally expensive tomographic inversion. We demonstrate that conventional linear separation methods, e.g., principal component analysis (PCA), can only partially separate the scatterers under good conditions. These methods produce systematic phase bias in the retrieved scatterers due to the nonorthogonality of the scatterers' steering vectors, especially when the intensities of the sources are similar or the number of images is low. The proposed method artificially increases the dimensionality of the data using kernel PCA, hence mitigating the aforementioned limitations. In the processing, the proposed method sequentially deflates the covariance matrix using the estimate of the brightest scatterer from kernel PCA. Simulations demonstrate the superior performance of the proposed method over conventional PCA-based methods in various respects. Experiments using TerraSAR-X data show an improvement in height reconstruction accuracy by a factor of one to three, depending on the used number of looks.

Submitted 2 September, 2020; originally announced September 2020.

Comments: This work has been accepted by IEEE TGRS for publication

arXiv:2008.01184 [pdf, other]

Generative Adversarial Networks for Synthesizing InSAR Patches

Authors: Philipp Sibler, Yuanyuan Wang, Stefan Auer, Mohsin Ali, Xiao Xiang Zhu

Abstract: Generative Adversarial Networks (GANs) have been employed with certain success for image translation tasks between optical and real-valued SAR intensity imagery. Applications include aiding interpretability of SAR scenes with their optical counterparts by artificial patch generation and automatic SAR-optical scene matching. The synthesis of artificial complex-valued InSAR image stacks asks for, besides good perceptual quality, more stringent quality metrics like phase noise and phase coherence. This paper provides a signal processing model of generative CNN structures, describes effects influencing those quality metrics and presents a mapping scheme of complex-valued data to given CNN structures based on popular Deep Learning frameworks.

Submitted 3 August, 2020; originally announced August 2020.

Comments: accepted in preliminary version for EUSAR2020 conference

arXiv:2006.16013 [pdf, other]

Single-Look Multi-Master SAR Tomography: An Introduction

Authors: Nan Ge, Richard Bamler, Danfeng Hong, Xiao Xiang Zhu

Abstract: This paper addresses the general problem of single-look multi-master SAR tomography. For this purpose, we establish the single-look multi-master data model, analyze its implications for single and double scatterers, and propose a generic inversion framework. The core of this framework is nonconvex sparse recovery, for which we develop two algorithms: one extends the conventional nonlinear least squares (NLS) to the single-look multi-master data model, and the other is based on bi-convex relaxation and alternating minimization (BiCRAM). We provide two theorems for the objective function of the NLS subproblem, which lead to its analytic solution up to a constant phase angle in the one-dimensional case. We also report our findings from the experiments on different acceleration techniques for BiCRAM. The proposed algorithms are applied to a real TerraSAR-X data set, and validated with height ground truth made available via a SAR imaging geodesy and simulation framework. This shows empirically that the single-master approach, if applied to a single-look multi-master stack, can be insufficient for layover separation, and the multi-master approach can indeed perform slightly better (despite being computationally more expensive) even in the case of single scatterers. Besides, this paper also sheds light on the special case of single-look bistatic SAR tomography, which is relevant for current and future SAR missions such as TanDEM-X and Tandem-L.

Submitted 29 June, 2020; originally announced June 2020.

arXiv:2006.10027 [pdf, other]

Deep Learning Meets SAR

Authors: Xiao Xiang Zhu, Sina Montazeri, Mohsin Ali, Yuansheng Hua, Yuanyuan Wang, Lichao Mou, Yilei Shi, Feng Xu, Richard Bamler

Abstract: Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data. Although deep learning has been introduced in Synthetic Aperture Radar (SAR) data processing, despite successful first attempts, its huge potential remains locked. In this paper, we provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by analyzing special characteristics of SAR data, review the state-of-the-art of deep learning applied to SAR in depth, summarize available benchmarks, and recommend some important future research directions. With this effort, we hope to stimulate more research in this interesting yet under-exploited research field and to pave the way for use of deep learning in big SAR data processing workflows.

Submitted 5 January, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: article accepted by IEEE Geoscience and Remote Sensing Magazine. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2005.07983 [pdf, other]

Multi-level Feature Fusion-based CNN for Local Climate Zone Classification from Sentinel-2 Images: Benchmark Results on the So2Sat LCZ42 Dataset

Authors: Chunping Qiu, Xiaochong Tong, Michael Schmitt, Benjamin Bechtel, Xiao Xiang Zhu

Abstract: As a unique classification scheme for urban forms and functions, the local climate zone (LCZ) system provides essential general information for any studies related to urban environments, especially on a large scale. Remote sensing data-based classification approaches are the key to large-scale mapping and monitoring of LCZs. The potential of deep learning-based approaches is not yet fully explored, even though advanced convolutional neural networks (CNNs) continue to push the frontiers for various computer vision tasks. One reason is that published studies are based on different datasets, usually at a regional scale, which makes it impossible to fairly and consistently compare the potential of different CNNs for real-world scenarios. This study is based on the big So2Sat LCZ42 benchmark dataset dedicated to LCZ classification. Using this dataset, we studied a range of CNNs of varying sizes. In addition, we proposed a CNN to classify LCZs from Sentinel-2 images, Sen2LCZ-Net. Using this base network, we propose fusing multi-level features using the extended Sen2LCZ-Net-MF. With this proposed simple network architecture and the highly competitive benchmark dataset, we obtain results that are better than those obtained by the state-of-the-art CNNs, while requiring less computation with fewer layers and parameters. Large-scale LCZ classification examples of completely unseen areas are presented, demonstrating the potential of our proposed Sen2LCZ-Net-MF as well as the So2Sat LCZ42 dataset. We also intensively investigated the influence of network depth and width and the effectiveness of the design choices made for Sen2LCZ-Net-MF. Our work will provide important baselines for future CNN-based algorithm developments for both LCZ classification and other urban land cover land use classification.

Submitted 16 May, 2020; originally announced May 2020.

arXiv:2003.07803 [pdf, other]

doi 10.1109/TGRS.2020.2986052

SAR Tomography at the Limit: Building Height Reconstruction Using Only 3-5 TanDEM-X Bistatic Interferograms

Authors: Yilei Shi, Richard Bamler, Yuanyuan Wang, Xiao Xiang Zhu

Abstract: Multi-baseline interferometric synthetic aperture radar (InSAR) techniques are effective approaches for retrieving the 3-D information of urban areas. In order to obtain a plausible reconstruction, it is necessary to use more than twenty interferograms. Hence, these methods are commonly not appropriate for large-scale 3-D urban mapping using TanDEM-X data, where only a few acquisitions are available on average for each city. This work proposes a new SAR tomographic processing framework to work with those extremely small stacks, which integrates the non-local filtering into SAR tomography inversion. The applicability of the algorithm is demonstrated using a TanDEM-X multi-baseline stack with 5 bistatic interferograms over the whole city of Munich, Germany. Systematic comparison of our result with TanDEM-X raw digital elevation models (DEM) and airborne LiDAR data shows that the relative height accuracy of two thirds of the buildings is within two meters, which outperforms the TanDEM-X raw DEM. The promising performance of the proposed algorithm paved the first step towards high quality large-scale 3-D urban mapping.

Submitted 17 March, 2020; originally announced March 2020.

arXiv:2001.02935 [pdf, other]

doi 10.1109/TGRS.2020.2964617

Multipass SAR Interferometry Based on Total Variation Regularized Robust Low Rank Tensor Decomposition

Authors: Jian Kang, Yuanyuan Wang, Xiao Xiang Zhu

Abstract: Multipass SAR interferometry (InSAR) techniques based on meter-resolution spaceborne SAR satellites, such as TerraSAR-X or COSMO-Skymed, provide 3D reconstruction and the measurement of ground displacement over large urban areas. Conventional methods such as Persistent Scatterer Interferometry (PSI) usually require a fairly large SAR image stack (usually in the order of tens), in order to achieve reliable estimates of these parameters. Recently, the low rank property in multipass InSAR data stacks was explored and investigated in our previous work. By exploiting this low rank prior, more accurate estimation of the geophysical parameters can be achieved, which in turn can effectively reduce the number of interferograms required for a reliable estimation. Based on that, this paper proposes a novel tensor decomposition method in the complex domain, which jointly exploits low rank and variational prior of the interferometric phase in InSAR data stacks. Specifically, a total variation (TV) regularized robust low rank tensor decomposition method is exploited for recovering outlier-free InSAR stacks. We demonstrate that the filtered InSAR data stacks can greatly improve the accuracy of geophysical parameters estimated from real data. Moreover, this paper demonstrates for the first time in the community that tensor-decomposition-based methods can be beneficial for large-scale urban mapping problems using multipass InSAR. Two TerraSAR-X data stacks with large spatial areas demonstrate the promising performance of the proposed method.

Submitted 9 January, 2020; originally announced January 2020.

arXiv:1912.12171 [pdf, other]

So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones Classification

Authors: Xiao Xiang Zhu, Jingliang Hu, Chunping Qiu, Yilei Shi, Jian Kang, Lichao Mou, Hossein Bagheri, Matthias Häberle, Yuansheng Hua, Rong Huang, Lloyd Hughes, Hao Li, Yao Sun, Guichen Zhang, Shiyao Han, Michael Schmitt, Yuanyuan Wang

Abstract: Access to labeled reference data is one of the grand challenges in supervised machine learning endeavors. This is especially true for an automated analysis of remote sensing images on a global scale, which enables us to address global challenges such as urbanization and climate change using state-of-the-art machine learning techniques. To meet these pressing needs, especially in urban research, we provide open access to a valuable benchmark dataset named "So2Sat LCZ42," which consists of local climate zone (LCZ) labels of about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. This dataset was labeled by 15 domain experts following a carefully designed labeling workflow and evaluation process over a period of six months. As is rarely done for other labeled remote sensing datasets, we conducted a rigorous quality assessment by domain experts. The dataset achieved an overall confidence of 85%. We believe this LCZ dataset is a first step towards an unbiased, globally distributed dataset for urban growth monitoring using machine learning methods, because LCZs provide a more objective measure than many other semantic land use and land cover classifications. It provides measures of the morphology, compactness, and height of urban areas, which are less dependent on human and cultural factors. This dataset can be accessed from http://doi.org/10.14459/2018mp1483140.

Submitted 19 December, 2019; originally announced December 2019.

Comments: Article submitted to IEEE Geoscience and Remote Sensing Magazine

arXiv:1908.08854 [pdf, other]

doi 10.1109/MGRS.2019.2937630

Linking Points With Labels in 3D: A Review of Point Cloud Semantic Segmentation

Authors: Yuxing Xie, Jiaojiao Tian, Xiao Xiang Zhu

Abstract: 3D Point Cloud Semantic Segmentation (PCSS) is attracting increasing interest, due to its applicability in remote sensing, computer vision and robotics, and due to the new possibilities offered by deep learning techniques. In order to provide a needed up-to-date review of recent developments in PCSS, this article summarizes existing studies on this topic. Firstly, we outline the acquisition and evolution of the 3D point cloud from the perspective of remote sensing and computer vision, as well as the published benchmarks for PCSS studies. Then, traditional and advanced techniques used for Point Cloud Segmentation (PCS) and PCSS are reviewed and compared. Finally, important issues and open questions in PCSS studies are discussed.

Submitted 26 June, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

Comments: The title of published version was modified to "Linking Points With Labels in 3D: A Review of Point Cloud Semantic Segmentation". To read its final version please go to IEEE Geoscience and Remote Sensing Magazine on IEEE XPlore: https://ieeexplore.ieee.org/document/9028090

arXiv:1901.01548 [pdf, other]

doi 10.1016/j.isprsjprs.2018.12.007

Potential of nonlocally filtered pursuit monostatic TanDEM-X data for coastline detection

Authors: Michael Schmitt, Gerald Baier, Xiao Xiang Zhu

Abstract: This article investigates the potential of nonlocally filtered pursuit monostatic TanDEM-X data for coastline detection in comparison to conventional TanDEM-X data, i.e. image pairs acquired in repeat-pass or bistatic mode. For this task, an unsupervised coastline detection procedure based on scale-space representations and K-medians clustering as well as morphological image post-processing is proposed. Since this procedure exploits a clear discriminability of "dark" and "bright" appearances of water and land surfaces, respectively, in both SAR amplitude and coherence imagery, TanDEM-X InSAR data acquired in pursuit monostatic mode is expected to provide a promising benefit. In addition, we investigate the benefit introduced by a utilization of a non-local InSAR filter for amplitude denoising and coherence estimation instead of a conventional box-car filter. Experiments carried out on real TanDEM-X pursuit monostatic data confirm our expectations and illustrate the advantage of the employed data configuration over conventional TanDEM-X products for automatic coastline detection.

Submitted 6 January, 2019; originally announced January 2019.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing 148: 130-141
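
The abstract above relies on K-medians clustering to separate "dark" water from "bright" land. As a rough illustration only, the sketch below clusters a flat list of amplitude values into two groups. It is not the authors' procedure: the scale-space representation and morphological post-processing are omitted, and the function name `k_medians_1d`, the initialisation rule, and the sample values are all assumptions made for this example.

```python
from statistics import median


def k_medians_1d(values, k=2, iterations=50):
    """Cluster scalar values with K-medians (L1 distance, median update).

    Hypothetical helper for illustration; returns (labels, centres).
    """
    ordered = sorted(values)
    # Assumed initialisation: evenly spaced order statistics of the data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centre in absolute distance.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: each centre becomes the median of its members.
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(median(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Made-up amplitudes: three dark (water-like) and three bright (land-like) samples.
labels, centres = k_medians_1d([0.1, 0.2, 0.15, 0.9, 1.0, 0.95])
```

The median update makes the centres less sensitive to isolated bright scatterers than a K-means update would be, which is one plausible reason to prefer it for SAR amplitudes.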

arXiv:1810.11415 [pdf, other]

doi 10.1016/j.isprsjprs.2018.07.007

Fusion of TanDEM-X and Cartosat-1 Elevation Data Supported by Neural Network-Predicted Weight Maps

Authors: Hossein Bagheri, Michael Schmitt, Xiao Xiang Zhu

Abstract: Recently, the bistatic SAR interferometry mission TanDEM-X provided a global terrain map with unprecedented accuracy. However, visual inspection and empirical assessment of TanDEM-X elevation data against high-resolution ground truth illustrates that the quality of the DEM decreases in urban areas because of SAR-inherent imaging properties. One possible solution for an enhancement of the TanDEM-X DEM quality is to fuse it with other elevation data derived from high-resolution optical stereoscopic imagery, such as that provided by the Cartosat-1 mission. This is usually done by Weighted Averaging (WA) of previously aligned DEM cells. The main contribution of this paper is to develop a method to efficiently predict weight maps in order to achieve optimized fusion results. The prediction is modeled using a fully connected Artificial Neural Network (ANN). The idea of this ANN is to extract suitable features from DEMs that relate to height residuals in training areas and then to automatically learn the pattern of the relationship between height errors and features. The results show the DEM fusion based on the ANN-predicted weights improves the qualities of the study DEMs. Apart from increasing the absolute accuracy of Cartosat-1 DEM by DEM fusion, the relative accuracy (respective to reference LiDAR data) of DEMs is improved by up to 50% in urban areas and 22% in non-urban areas while the improvement by them-based method does not exceed 20% and 10% in urban and non-urban areas respectively.

Submitted 26 October, 2018; originally announced October 2018.

Comments: This is the pre-acceptance version, to read the final version, please go to ISPRS Journal of Photogrammetry and Remote Sensing on ScienceDirect

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 144, October 2018, Pages 285-297

arXiv:1810.11413 [pdf, other]

A Framework for SAR-Optical Stereogrammetry over Urban Areas

Authors: Hossein Bagheri, Michael Schmitt, Pablo d'Angelo, Xiao Xiang Zhu

Abstract: Currently, numerous remote sensing satellites provide a huge volume of diverse earth observation data. As these data show different features regarding resolution, accuracy, coverage, and spectral imaging ability, fusion techniques are required to integrate the different properties of each sensor and produce useful information. For example, synthetic aperture radar (SAR) data can be fused with optical imagery to produce 3D information using stereogrammetric methods. The main focus of this study is to investigate the possibility of applying a stereogrammetry pipeline to very-high-resolution (VHR) SAR-optical image pairs. For this purpose, the applicability of semi-global matching is investigated in this unconventional multi-sensor setting. To support the image matching by reducing the search space and accelerating the identification of correct, reliable matches, the possibility of establishing an epipolarity constraint for VHR SAR-optical image pairs is investigated as well. In addition, it is shown that the absolute geolocation accuracy of VHR optical imagery with respect to VHR SAR imagery such as provided by TerraSAR-X can be improved by a multi-sensor block adjustment formulation based on rational polynomial coefficients. Finally, the feasibility of generating point clouds with a median accuracy of about 2m is demonstrated and confirms the potential of 3D reconstruction from SAR-optical image pairs over urban areas.

Submitted 26 October, 2018; originally announced October 2018.

Comments: This is the pre-acceptance version, to read the final version, please go to ISPRS Journal of Photogrammetry and Remote Sensing on ScienceDirect

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, 2018

arXiv:1810.11314 [pdf, other]

Fusion of Urban TanDEM-X raw DEMs using variational models

Authors: Hossein Bagheri, Michael Schmitt, Xiao Xiang Zhu

Abstract: Recently, a new global Digital Elevation Model (DEM) with pixel spacing of 0.4 arcseconds and relative height accuracy finer than 2m for flat areas (slopes < 20%) and better than 4m for rugged terrain (slopes > 20%) was created through the TanDEM-X mission. One important step of the chain of global DEM generation is to mosaic and fuse multiple raw DEM tiles to reach the target height accuracy. Currently, Weighted Averaging (WA) is applied as a fast and simple method for TanDEM-X raw DEM fusion in which the weights are computed from height error maps delivered from the Interferometric TanDEM-X Processor (ITP). However, evaluations show that WA is not the perfect DEM fusion method for urban areas especially in confrontation with edges such as building outlines. The main focus of this paper is to investigate more advanced variational approaches such as TV-L1 and Huber models. Furthermore, we also assess the performance of variational models for fusing raw DEMs produced from data takes with different baseline configurations and height of ambiguities. The results illustrate the high efficiency of variational models for TanDEM-X raw DEM fusion in comparison to WA. Using variational models could improve the DEM quality by up to 2m particularly in inner-city subsets.

Submitted 26 October, 2018; originally announced October 2018.

Comments: This is the pre-acceptance version, to read the final version, please go to IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing on IEEE Xplore

Journal ref: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018
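
The Weighted Averaging (WA) baseline named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the ITP implementation: the abstract only states that the weights are computed from height error maps, so the inverse-variance rule (weight = 1 / error squared) used here is an assumption, as are the function name `fuse_weighted_average` and the toy input values.

```python
def fuse_weighted_average(dems, height_errors):
    """Fuse co-registered DEM tiles cell by cell.

    dems, height_errors: lists of equally sized 2-D lists (rows of floats).
    Assumption: each cell is weighted by 1 / error**2 (inverse variance).
    """
    rows, cols = len(dems[0]), len(dems[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weights = [1.0 / (hem[r][c] ** 2) for hem in height_errors]
            total = sum(weights)
            fused[r][c] = sum(w * dem[r][c] for w, dem in zip(weights, dems)) / total
    return fused


# Two hypothetical 1x2 tiles: equal errors in the first cell, unequal in the second.
tiles = [[[10.0, 10.0]], [[14.0, 15.0]]]
errors = [[[1.0, 1.0]], [[1.0, 2.0]]]
fused = fuse_weighted_average(tiles, errors)
```

Because every cell is averaged independently of its neighbours, a scheme like this cannot preserve sharp height discontinuities, which is consistent with the abstract's remark that WA struggles at building outlines.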

arXiv:1808.06155 [pdf, other]

doi 10.1109/TGRS.2018.2864716

Buildings Detection in VHR SAR Images Using Fully Convolution Neural Networks

Authors: Muhammad Shahzad, Michael Maurer, Friedrich Fraundorfer, Yuanyuan Wang, Xiao Xiang Zhu

Abstract: This paper addresses the highly challenging problem of automatically detecting man-made structures especially buildings in very high resolution (VHR) synthetic aperture radar (SAR) images. In this context, the paper has two major contributions: Firstly, it presents a novel and generic workflow that initially classifies the spaceborne TomoSAR point clouds, generated by processing VHR SAR image stacks using advanced interferometric techniques known as SAR tomography (TomoSAR), into buildings and non-buildings with the aid of auxiliary information (i.e., either using openly available 2-D building footprints or adopting an optical image classification scheme) and later back project the extracted building points onto the SAR imaging coordinates to produce automatic large-scale benchmark labelled (buildings/non-buildings) SAR datasets. Secondly, these labelled datasets (i.e., building masks) have been utilized to construct and train the state-of-the-art deep Fully Convolution Neural Networks with an additional Conditional Random Field represented as a Recurrent Neural Network to detect building regions in a single VHR SAR image. Such a cascaded formation has been successfully employed in computer vision and remote sensing fields for optical image classification but, to our knowledge, has not been applied to SAR images. The results of the building detection are illustrated and validated over a TerraSAR-X VHR spotlight SAR image covering approximately 39 km$^2$ (almost the whole city of Berlin) with mean pixel accuracies of around 93.84%.

Submitted 14 August, 2018; originally announced August 2018.

Comments: Accepted publication in IEEE TGRS

arXiv:1807.06826 [pdf, other]

Spaceborne Staring Spotlight SAR Tomography - A First Demonstration with TerraSAR-X

Authors: Nan Ge, Fernando Rodriguez Gonzalez, Yuanyuan Wang, Yilei Shi, Xiao Xiang Zhu

Abstract: With the objective of exploiting hardware capabilities and preparing the ground for the next-generation X-band synthetic aperture radar (SAR) missions, TerraSAR-X and TanDEM-X are now able to operate in staring spotlight mode, which is characterized by an increased azimuth resolution of approximately 0.24 m compared to 1.1 m of the conventional sliding spotlight mode. In this paper, we demonstrate for the first time its potential for SAR tomography. To this end, we tailored our interferometric and tomographic processors for the distinctive features of the staring spotlight mode, which will be analyzed accordingly. By means of its higher spatial resolution, the staring spotlight mode will not only lead to a denser point cloud, but also to more accurate height estimates due to the higher signal-to-clutter ratio. As a result of a first comparison between sliding and staring spotlight TomoSAR, the following were observed: 1) the density of the staring spotlight point cloud is approximately 5.1 to 5.5 times as high; 2) the relative height accuracy of the staring spotlight point cloud is approximately 1.7 times as high.

Submitted 18 July, 2018; originally announced July 2018.

Comments: Accepted publication in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

arXiv:1805.10545 [pdf, other]

doi 10.1109/TGRS.2018.2839027

A Nonlocal InSAR Filter for High-Resolution DEM Generation from TanDEM-X Interferograms

Authors: Gerald Baier, Cristian Rossi, Marie Lachaise, Xiao Xiang Zhu, Richard Bamler

Abstract: This paper presents a nonlocal InSAR filter with the goal of generating digital elevation models of higher resolution and accuracy from bistatic TanDEM-X strip map interferograms than with the processing chain used in production. The currently employed boxcar multilooking filter naturally decreases the resolution and has inherent limitations on what level of noise reduction can be achieved. The proposed filter is specifically designed to account for the inherent diversity of natural terrain by setting several filtering parameters adaptively. In particular, it considers the local fringe frequency and scene heterogeneity, ensuring proper denoising of interferograms with considerable underlying topography as well as urban areas. A comparison using synthetic and TanDEM-X bistatic strip map datasets with existing InSAR filters shows the effectiveness of the proposed techniques, most of which could readily be integrated into existing nonlocal filters. The resulting digital elevation models outclass the ones produced with the existing global TanDEM-X DEM processing chain by effectively increasing the resolution from 12m to 6m and lowering the noise level by roughly a factor of two.

Submitted 26 May, 2018; originally announced May 2018.

Comments: Paper has been accepted to be published in IEEE Transaction on Geoscience and Remote Sensing
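
For context on the boxcar multilooking that the abstract above takes as its baseline, the sketch below estimates coherence magnitude with a plain moving window, using the standard sample estimator |sum(s1 * conj(s2))| / sqrt(sum|s1|^2 * sum|s2|^2). It is a simplified 1-D illustration, not the TanDEM-X production chain and not the proposed nonlocal filter; the function name `boxcar_coherence`, the window size, and the sample signals are assumptions.

```python
import math


def boxcar_coherence(master, slave, window=5):
    """Estimate coherence magnitude along a line of complex SAR samples.

    Every sample inside the window gets equal weight (boxcar), which is
    what limits resolution compared with adaptive or nonlocal weighting.
    The window is truncated at the ends of the line.
    """
    half = window // 2
    n = len(master)
    coherence = []
    for centre in range(n):
        lo, hi = max(0, centre - half), min(n, centre + half + 1)
        cross = sum(master[i] * slave[i].conjugate() for i in range(lo, hi))
        power_m = sum(abs(master[i]) ** 2 for i in range(lo, hi))
        power_s = sum(abs(slave[i]) ** 2 for i in range(lo, hi))
        denom = math.sqrt(power_m * power_s)
        coherence.append(abs(cross) / denom if denom > 0 else 0.0)
    return coherence


# Hypothetical pair that differs only by a constant phase offset,
# so the estimated coherence should be 1 at every sample.
master = [1 + 0j, 1j, -1 + 0j, -1j, 1 + 0j]
slave = [s * 1j for s in master]
gamma = boxcar_coherence(master, slave, window=3)
```

A nonlocal filter would replace the equal weights inside the window with weights based on patch similarity, so that samples from dissimilar terrain contribute less to the estimate.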

arXiv:1805.01759 [pdf, other]

A fast and accurate basis pursuit denoising algorithm with application to super-resolving tomographic SAR

Authors: Yilei Shi, Xiao Xiang Zhu, Wotao Yin, Richard Bamler

Abstract: $L_1$ regularization is used for finding sparse solutions to an underdetermined linear system. As sparse signals are widely expected in remote sensing, this type of regularization scheme and its extensions have been widely employed in many remote sensing problems, such as image fusion, target detection, image super-resolution, and others and have led to promising results. However, solving such sparse reconstruction problems is computationally expensive and has limitations in its practical use. In this paper, we proposed a novel efficient algorithm for solving the complex-valued $L_1$ regularized least squares problem. Taking the high-dimensional tomographic synthetic aperture radar (TomoSAR) as a practical example, we carried out extensive experiments, both with simulation data and real data, to demonstrate that the proposed approach can retain the accuracy of second order methods while dramatically speeding up the processing by one or two orders. Although we have chosen TomoSAR as the example, the proposed method can be generally applied to any spectral estimation problems.

Submitted 4 May, 2018; originally announced May 2018.

Comments: 11 pages, IEEE Transactions on Geoscience and Remote Sensing
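
To make the problem in the abstract above concrete, the complex-valued $L_1$ regularized least squares problem, minimize 0.5 * ||A x - y||_2^2 + lam * ||x||_1, can be solved by a basic proximal gradient iteration (ISTA) with complex soft-thresholding. The sketch below shows that textbook baseline only; it is not the fast algorithm proposed in the paper. The names `soft_threshold` and `ista_complex_l1`, the step-size choice (inverse squared Frobenius norm, a conservative bound on the Lipschitz constant), and the toy system are assumptions.

```python
def soft_threshold(z, tau):
    """Complex soft-thresholding: shrink the magnitude by tau, keep the phase."""
    mag = abs(z)
    return z * (1.0 - tau / mag) if mag > tau else 0j


def ista_complex_l1(A, y, lam, iterations=500):
    """Plain ISTA for 0.5 * ||A x - y||^2 + lam * ||x||_1 with complex x.

    A: list of rows of complex numbers, y: list of complex numbers.
    """
    m, n = len(A), len(A[0])
    # ||A||_F^2 >= ||A||_2^2, so this step size is safe but not the fastest.
    step = 1.0 / sum(abs(a) ** 2 for row in A for a in row)
    x = [0j] * n
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the data term: A^H (A x - y).
        gradient = [
            sum(A[i][j].conjugate() * residual[i] for i in range(m)) for j in range(n)
        ]
        x = [soft_threshold(x[j] - step * gradient[j], step * lam) for j in range(n)]
    return x


# Toy underdetermined system (2 measurements, 3 unknowns), values made up.
A = [[1 + 0j, 0.5j, 0j], [0j, 1 + 0j, 0.5 - 0.5j]]
y = [2 + 1j, 0.3j]
x_hat = ista_complex_l1(A, y, lam=0.1)
```

First-order schemes like this are cheap per iteration but can need many iterations, which is the kind of accuracy versus speed trade-off the abstract contrasts with second order methods.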

arXiv:1802.09036 [pdf, other]

doi 10.1016/j.isprsjprs.2017.12.006

Towards Automatic SAR-Optical Stereogrammetry over Urban Areas using Very High Resolution Imagery

Authors: Chunping Qiu, Michael Schmitt, Xiao Xiang Zhu

Abstract: In this paper we discuss the potential and challenges regarding SAR-optical stereogrammetry for urban areas, using very-high-resolution (VHR) remote sensing imagery. Since we do this mainly from a geometrical point of view, we first analyze the height reconstruction accuracy to be expected for different stereogrammetric configurations. Then, we propose a strategy for simultaneous tie point matching and 3D reconstruction, which exploits an epipolar-like search window constraint. To drive the matching and ensure some robustness, we combine different established handcrafted similarity measures. For the experiments, we use real test data acquired by the Worldview-2, TerraSAR-X and MEMPHIS sensors. Our results show that SAR-optical stereogrammetry using VHR imagery is generally feasible with 3D positioning accuracies in the meter-domain, although the matching of these strongly heterogeneous multi-sensor data remains very challenging. Keywords: Synthetic Aperture Radar (SAR), optical images, remote sensing, data fusion, stereogrammetry

Submitted 25 February, 2018; originally announced February 2018.

arXiv:1802.09026 [pdf, other]

doi 10.1016/j.isprsjprs.2018.02.006

Building Instance Classification Using Street View Images

Authors: Jian Kang, Marco Körner, Yuanyuan Wang, Hannes Taubenböck, Xiao Xiang Zhu

Abstract: Land-use classification based on spaceborne or aerial remote sensing images has been extensively studied over the past decades. Such classification is usually a patch-wise or pixel-wise labeling over the whole image. But for many applications, such as urban population density mapping or urban utility planning, a classification map based on individual buildings is much more informative. However, such semantic classification still poses some fundamental challenges, for example, how to retrieve fine boundaries of individual buildings. In this paper, we proposed a general framework for classifying the functionality of individual buildings. The proposed method is based on Convolutional Neural Networks (CNNs) which classify facade structures from street view images, such as Google StreetView, in addition to remote sensing images which usually only show roof structures. Geographic information was utilized to mask out individual buildings, and to associate the corresponding street view images. We created a benchmark dataset which was used for training and evaluating CNNs. In addition, the method was applied to generate building classification maps on both region and city scales of several cities in Canada and the US. Keywords: CNN, Building instance classification, Street view images, OpenStreetMap

Submitted 25 February, 2018; originally announced February 2018.

arXiv:1801.10240 [pdf, other]

doi 10.1109/TGRS.2018.2790262

Non-local tensor completion for multitemporal remotely sensed images inpainting

Authors: Teng-Yu Ji, Naoto Yokoya, Xiao Xiang Zhu, Ting-Zhu Huang

Abstract: Remotely sensed images may contain some missing areas because of poor weather conditions and sensor failure. Information of those areas may play an important role in the interpretation of multitemporal remotely sensed data. The paper aims at reconstructing the missing information by a non-local low-rank tensor completion method (NL-LRTC). First, nonlocal correlations in the spatial domain are taken into account by searching and grouping similar image patches in a large search window. Then low-rankness of the identified 4-order tensor groups is promoted to consider their correlations in spatial, spectral, and temporal domains, while reconstructing the underlying patterns. Experimental results on simulated and real data demonstrate that the proposed method is effective both qualitatively and quantitatively. In addition, the proposed method is computationally efficient compared to other patch based methods such as the recent proposed PM-MTGSR method.

Submitted 30 January, 2018; originally announced January 2018.

arXiv:1801.08467 [pdf, other]

doi 10.1109/LGRS.2018.2799232

Identifying Corresponding Patches in SAR and Optical Images with a Pseudo-Siamese CNN

Authors: Lloyd H. Hughes, Michael Schmitt, Lichao Mou, Yuanyuan Wang, Xiao Xiang Zhu

Abstract: In this letter, we propose a pseudo-siamese convolutional neural network (CNN) architecture that makes it possible to identify corresponding patches in very-high-resolution (VHR) optical and synthetic aperture radar (SAR) remote sensing imagery. Using eight convolutional layers each in two parallel network streams, a fully connected layer for the fusion of the features learned in each stream, and a loss function based on binary cross-entropy, we achieve a one-hot indication if two patches correspond or not. The network is trained and tested on an automatically generated dataset that is based on a deterministic alignment of SAR and optical imagery via previously reconstructed and subsequently co-registered 3D point clouds. The satellite images, from which the patches comprising our dataset are extracted, show a complex urban scene containing many elevated objects (i.e. buildings), thus providing one of the most difficult experimental environments. The achieved results show that the network is able to predict corresponding patches with high accuracy, thus indicating great potential for further development towards a generalized multi-sensor key-point matching procedure. Index Terms: synthetic aperture radar (SAR), optical imagery, data fusion, deep learning, convolutional neural networks (CNN), image matching, deep matching

Submitted 25 January, 2018; originally announced January 2018.

arXiv:1801.07536 [pdf, other]

doi 10.1109/TGRS.2017.2769078

Automatic Detection and Positioning of Ground Control Points Using TerraSAR-X Multi-Aspect Acquisitions

Authors: Sina Montazeri, Christoph Gisinger, Michael Eineder, Xiao Xiang Zhu

Abstract: Geodetic stereo Synthetic Aperture Radar (SAR) is capable of absolute three-dimensional localization of natural Persistent Scatterers (PSs), which allows for Ground Control Point (GCP) generation using only SAR data. The prerequisite for the method to achieve high precision results is the correct detection of common scatterers in SAR images acquired from different viewing geometries. In this contribution, we describe three strategies for automatic detection of identical targets in SAR images of urban areas taken from different orbit tracks. Moreover, a complete workflow for automatic generation of a large number of GCPs using SAR data is presented and its applicability is shown by exploiting TerraSAR-X (TS-X) high resolution spotlight images over the city of Oulu, Finland and a test site in Berlin, Germany.

Submitted 23 January, 2018; originally announced January 2018.

arXiv:1801.07532 [pdf]

The SARptical Dataset for Joint Analysis of SAR and Optical Image in Dense Urban Area

Authors: Yuanyuan Wang, Xiao Xiang Zhu

Abstract: The joint interpretation of very high resolution SAR and optical images in dense urban areas is not trivial due to the distinct imaging geometry of the two types of images. In particular, the inevitable layover caused by the side-looking SAR imaging geometry renders this task even more challenging. Only recently, the "SARptical" framework [1], [2] proposed a promising solution to tackle this. SARptical can trace individual SAR scatterers in corresponding high-resolution optical images, via rigorous 3-D reconstruction and matching. This paper introduces the SARptical dataset, which is a dataset of over 10,000 pairs of corresponding SAR and optical image patches extracted from TerraSAR-X high-resolution spotlight images and aerial UltraCAM optical images. This dataset opens new opportunities for multisensory data analysis. One can analyze the geometry, material, and other properties of the imaged object in both the SAR and optical image domains. More advanced applications such as SAR and optical image matching via deep learning [3] are now also possible.

Submitted 23 January, 2018; originally announced January 2018.

Comments: This manuscript was submitted to IGARSS 2018

arXiv:1801.07499 [pdf, other]

doi 10.1109/TGRS.2018.2790480

Object-based Multipass InSAR via Robust Low Rank Tensor Decomposition

Authors: Jian Kang, Yuanyuan Wang, Michael Schmitt, Xiao Xiang Zhu

Abstract: The most unique advantage of multipass SAR interferometry (InSAR) is the retrieval of long term geophysical parameters, e.g. linear deformation rates, over large areas. Recently, an object-based multipass InSAR framework has been proposed in [1], as an alternative to the typical single-pixel methods, e.g. Persistent Scatterer Interferometry (PSI), or pixel-cluster-based methods, e.g. SqueeSAR. This enables the exploitation of inherent properties of InSAR phase stacks on an object level. As a follow-on, this paper investigates the inherent low rank property of such phase tensors, and proposes a Robust Multipass InSAR technique via Object-based low rank tensor decomposition (RoMIO). We demonstrate that the filtered InSAR phase stacks can improve the accuracy of geophysical parameters estimated via conventional multipass InSAR techniques, e.g. PSI, by a factor of ten to thirty in typical settings. The proposed method is particularly effective against outliers, such as pixels with unmodeled phases. These merits in turn can effectively reduce the number of images required for a reliable estimation. The promising performance of the proposed method is demonstrated using high-resolution TerraSAR-X image stacks.

Submitted 23 January, 2018; originally announced January 2018.

arXiv:1710.03959 [pdf, other]

doi 10.1109/MGRS.2017.2762307

Deep learning in remote sensing: a review

Authors: Xiao Xiang Zhu, Devis Tuia, Lichao Mou, Gui-Song Xia, Liangpei Zhang, Feng Xu, Friedrich Fraundorfer

Abstract: Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven as an extremely powerful tool in many fields. Shall we embrace deep learning as the key to all? Or, should we resist a 'black-box' solution? There are controversial opinions in the remote sensing community. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with. More importantly, we advocate remote sensing scientists to bring their expertise into deep learning, and use it as an implicit general model to tackle unprecedented large-scale influential challenges, such as climate change and urbanization.

Submitted 11 October, 2017; originally announced October 2017.

Comments: Accepted for publication IEEE Geoscience and Remote Sensing Magazine

Showing 1–48 of 48 results for author: Zhu, X X