Search | arXiv e-print repository

Linearly-evolved Transformer for Pan-sharpening

Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of… ▽ More Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of model parameters and FLOPs, thus preventing its application over low-resource satellites.To address this challenge between favorable performance and expensive computation, we tailor an efficient linearly-evolved transformer variant and employ it to construct a lightweight pan-sharpening framework. In detail, we deepen into the popular cascaded transformer modeling with cutting-edge methods and develop the alternative 1-order linearly-evolved transformer variant with the 1-dimensional linear convolution chain to achieve the same function. In this way, our proposed method is capable of benefiting the cascaded modeling rule while achieving favorable performance in the efficient manner. Extensive experiments over multiple satellite datasets suggest that our proposed method achieves competitive performance against other state-of-the-art with fewer computational resources. Further, the consistently favorable performance has been verified over the hyper-spectral image fusion task. Our main focus is to provide an alternative global modeling framework with an efficient structure. The code will be publicly available. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2402.15335 [pdf, other]

Low-Rank Representations Meets Deep Unfolding: A Generalized and Interpretable Network for Hyperspectral Anomaly Detection

Authors: Chenyu Li, Bing Zhang, Danfeng Hong, **g Yao, Jocelyn Chanussot

Abstract: Current hyperspectral anomaly detection (HAD) benchmark datasets suffer from low resolution, simple background, and small size of the detection data. These factors also limit the performance of the well-known low-rank representation (LRR) models in terms of robustness on the separation of background and target features and the reliance on manual parameter selection. To this end, we build a new set… ▽ More Current hyperspectral anomaly detection (HAD) benchmark datasets suffer from low resolution, simple background, and small size of the detection data. These factors also limit the performance of the well-known low-rank representation (LRR) models in terms of robustness on the separation of background and target features and the reliance on manual parameter selection. To this end, we build a new set of HAD benchmark datasets for improving the robustness of the HAD algorithm in complex scenarios, AIR-HAD for short. Accordingly, we propose a generalized and interpretable HAD network by deeply unfolding a dictionary-learnable LLR model, named LRR-Net$^+$, which is capable of spectrally decoupling the background structure and object properties in a more generalized fashion and eliminating the bias introduced by vital interference targets concurrently. In addition, LRR-Net$^+$ integrates the solution process of the Alternating Direction Method of Multipliers (ADMM) optimizer with the deep network, guiding its search process and imparting a level of interpretability to parameter optimization. Additionally, the integration of physical models with DL techniques eliminates the need for manual parameter tuning. The manually tuned parameters are seamlessly transformed into trainable parameters for deep neural networks, facilitating a more efficient and automated optimization process. Extensive experiments conducted on the AIR-HAD dataset show the superiority of our LRR-Net$^+$ in terms of detection performance and generalization ability, compared to top-performing rivals. Furthermore, the compilable codes and our AIR-HAD benchmark datasets in this paper will be made available freely and openly at \url{https://sites.google.com/view/danfeng-hong}. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2401.16719 [pdf, other]

OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering

Authors: Alexander Schperberg, Yusuke Tanaka, Saviz Mowlavi, Feng Xu, Bharathan Balaji, Dennis Hong

Abstract: State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman… ▽ More State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also considers semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState △ Less

Submitted 28 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA), May 13-17, in Yokohama, Japan. 7 pages, 5 figures, 1 table

arXiv:2311.06968 [pdf, other]

Physics-Informed Data Denoising for Real-Life Sensing Systems

Authors: Xiyuan Zhang, Xiaohan Fu, Diyan Teng, Chengyu Dong, Keerthivasan Vijayakumar, Jiayun Zhang, Ranak Roy Chowdhury, Junsheng Han, Dezhi Hong, Rashmi Kulkarni, **gbo Shang, Rajesh Gupta

Abstract: Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically r… ▽ More Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically rely on using ground truth clean data to train a denoising model, which is often challenging or prohibitive to obtain for many real-world applications. We observe that in many scenarios, the relationships between different sensor measurements (e.g., location and acceleration) are analytically described by laws of physics (e.g., second-order differential equation). By incorporating such physics constraints, we can guide the denoising process to improve even in the absence of ground truth data. In light of this, we design a physics-informed denoising model that leverages the inherent algebraic relationships between different measurements governed by the underlying physics. By obviating the need for ground truth clean data, our method offers a practical denoising solution for real-world applications. We conducted experiments in various domains, including inertial navigation, CO2 monitoring, and HVAC control, and achieved state-of-the-art performance compared with existing denoising methods. Our method can denoise data in real time (4ms for a sequence of 1s) for low-cost noisy sensors and produces results that closely align with those from high-precision, high-cost alternatives, leading to an efficient, cost-effective approach for more accurate sensor-based systems. △ Less

Submitted 12 November, 2023; originally announced November 2023.

Comments: SenSys 2023

arXiv:2310.06277 [pdf, other]

Streaming Probabilistic PCA for Missing Data with Heteroscedastic Noise

Authors: Kyle Gilman, David Hong, Jeffrey A. Fessler, Laura Balzano

Abstract: Streaming principal component analysis (PCA) is an integral tool in large-scale machine learning for rapidly estimating low-dimensional subspaces of very high dimensional and high arrival-rate data with missing entries and corrupting noise. However, modern trends increasingly combine data from a variety of sources, meaning they may exhibit heterogeneous quality across samples. Since standard strea… ▽ More Streaming principal component analysis (PCA) is an integral tool in large-scale machine learning for rapidly estimating low-dimensional subspaces of very high dimensional and high arrival-rate data with missing entries and corrupting noise. However, modern trends increasingly combine data from a variety of sources, meaning they may exhibit heterogeneous quality across samples. Since standard streaming PCA algorithms do not account for non-uniform noise, their subspace estimates can quickly degrade. On the other hand, the recently proposed Heteroscedastic Probabilistic PCA Technique (HePPCAT) addresses this heterogeneity, but it was not designed to handle missing entries and streaming data, nor does it adapt to non-stationary behavior in time series data. This paper proposes the Streaming HeteroscedASTic Algorithm for PCA (SHASTA-PCA) to bridge this divide. SHASTA-PCA employs a stochastic alternating expectation maximization approach that jointly learns the low-rank latent factors and the unknown noise variances from streaming data that may have missing entries and heteroscedastic noise, all while maintaining a low memory and computational footprint. Numerical experiments validate the superior subspace estimation of our method compared to state-of-the-art streaming PCA algorithms in the heteroscedastic setting. Finally, we illustrate SHASTA-PCA applied to highly-heterogeneous real data from astronomy. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: 19 pages, 6 figures

arXiv:2309.16499 [pdf, other]

Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks

Authors: Danfeng Hong, Bing Zhang, Hao Li, Yuxuan Li, **g Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu

Abstract: Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions). Yet these AI models tend to meet the performance bottleneck in the case studies across cities or regions, due to the lack of diverse RS information and cutting-ed… ▽ More Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions). Yet these AI models tend to meet the performance bottleneck in the case studies across cities or regions, due to the lack of diverse RS information and cutting-edge solutions with high generalization ability. To this end, we build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, SAR) for the study purpose of the cross-city semantic segmentation task (called C2Seg dataset), which consists of two cross-city scenes, i.e., Berlin-Augsburg (in Germany) and Bei**g-Wuhan (in China). Beyond the single city, we propose a high-resolution domain adaptation network, HighDAN for short, to promote the AI model's generalization ability from the multi-city environments. HighDAN is capable of retaining the spatially topological structure of the studied urban scene well in a parallel high-to-low resolution fusion fashion but also closing the gap derived from enormous differences of RS image representations between different cities by means of adversarial learning. In addition, the Dice loss is considered in HighDAN to alleviate the class imbalance issue caused by factors across cities. Extensive experiments conducted on the C2Seg dataset show the superiority of our HighDAN in terms of segmentation performance and generalization ability, compared to state-of-the-art competitors. The C2Seg dataset and the semantic segmentation toolbox (involving the proposed HighDAN) will be available publicly at https://github.com/danfenghong. △ Less

Submitted 3 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2301.03462 [pdf, other]

Unleashing the Power of Shared Label Structures for Human Activity Recognition

Authors: Xiyuan Zhang, Ranak Roy Chowdhury, Jiayun Zhang, Dezhi Hong, Rajesh K. Gupta, **gbo Shang

Abstract: Current human activity recognition (HAR) techniques regard activity labels as integer class IDs without explicitly modeling the semantics of class labels. We observe that different activity names often have shared structures. For example, "open door" and "open fridge" both have "open" as the action; "kicking soccer ball" and "playing tennis ball" both have "ball" as the object. Such shared structu… ▽ More Current human activity recognition (HAR) techniques regard activity labels as integer class IDs without explicitly modeling the semantics of class labels. We observe that different activity names often have shared structures. For example, "open door" and "open fridge" both have "open" as the action; "kicking soccer ball" and "playing tennis ball" both have "ball" as the object. Such shared structures in label names can be translated to the similarity in sensory data and modeling common structures would help uncover knowledge across different activities, especially for activities with limited samples. In this paper, we propose SHARE, a HAR framework that takes into account shared structures of label names for different activities. To exploit the shared structures, SHARE comprises an encoder for extracting features from input sensory time series and a decoder for generating label names as a token sequence. We also propose three label augmentation techniques to help the model more effectively capture semantic structures across activities, including a basic token-level augmentation, and two enhanced embedding-level and sequence-level augmentations utilizing the capabilities of pre-trained models. SHARE outperforms state-of-the-art HAR models in extensive experiments on seven HAR benchmark datasets. We also evaluate in few-shot learning and label imbalance settings and observe even more significant performance gap. △ Less

Submitted 19 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

Comments: CIKM 2023

arXiv:2210.17151 [pdf, other]

Tech Report: One-stage Lightweight Object Detectors

Authors: Deokki Hong

Abstract: This work is for designing one-stage lightweight detectors which perform well in terms of mAP and latency. With baseline models each of which targets on GPU and CPU respectively, various operations are applied instead of the main operations in backbone networks of baseline models. In addition to experiments about backbone networks and operations, several feature pyramid network (FPN) architectures… ▽ More This work is for designing one-stage lightweight detectors which perform well in terms of mAP and latency. With baseline models each of which targets on GPU and CPU respectively, various operations are applied instead of the main operations in backbone networks of baseline models. In addition to experiments about backbone networks and operations, several feature pyramid network (FPN) architectures are investigated. Benchmarks and proposed detectors are analyzed in terms of the number of parameters, Gflops, GPU latency, CPU latency and mAP, on MS COCO dataset which is a benchmark dataset in object detection. This work propose similar or better network architectures considering the trade-off between accuracy and latency. For example, our proposed GPU-target backbone network outperforms that of YOLOX-tiny which is selected as the benchmark by 1.43x in speed and 0.5 mAP in accuracy on NVIDIA GeForce RTX 2080 Ti GPU. △ Less

Submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.08181 [pdf, other]

Panchromatic and Multispectral Image Fusion via Alternating Reverse Filtering Network

Authors: Keyu Yan, Man Zhou, Jie Huang, Feng Zhao, Chengjun Xie, Chongyi Li, Danfeng Hong

Abstract: Panchromatic (PAN) and multi-spectral (MS) image fusion, named Pan-sharpening, refers to super-resolve the low-resolution (LR) multi-spectral (MS) images in the spatial domain to generate the expected high-resolution (HR) MS images, conditioning on the corresponding high-resolution PAN images. In this paper, we present a simple yet effective \textit{alternating reverse filtering network} for pan-s… ▽ More Panchromatic (PAN) and multi-spectral (MS) image fusion, named Pan-sharpening, refers to super-resolve the low-resolution (LR) multi-spectral (MS) images in the spatial domain to generate the expected high-resolution (HR) MS images, conditioning on the corresponding high-resolution PAN images. In this paper, we present a simple yet effective \textit{alternating reverse filtering network} for pan-sharpening. Inspired by the classical reverse filtering that reverses images to the status before filtering, we formulate pan-sharpening as an alternately iterative reverse filtering process, which fuses LR MS and HR MS in an interpretable manner. Different from existing model-driven methods that require well-designed priors and degradation assumptions, the reverse filtering process avoids the dependency on pre-defined exact priors. To guarantee the stability and convergence of the iterative process via contraction map** on a metric space, we develop the learnable multi-scale Gaussian kernel module, instead of using specific filters. We demonstrate the theoretical feasibility of such formulations. Extensive experiments on diverse scenes to thoroughly verify the performance of our method, significantly outperforming the state of the arts. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Journal ref: NeurIPS2022

arXiv:2209.03210 [pdf, other]

Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse Data using a Learning-based Unscented Kalman Filter

Authors: Alexander Schperberg, Yusuke Tanaka, Feng Xu, Marcel Menner, Dennis Hong

Abstract: Achieving highly accurate dynamic or simulator models that are close to the real robot can facilitate model-based controls (e.g., model predictive control or linear-quadradic regulators), model-based trajectory planning (e.g., trajectory optimization), and decrease the amount of learning time necessary for reinforcement learning methods. Thus, the objective of this work is to learn the residual er… ▽ More Achieving highly accurate dynamic or simulator models that are close to the real robot can facilitate model-based controls (e.g., model predictive control or linear-quadradic regulators), model-based trajectory planning (e.g., trajectory optimization), and decrease the amount of learning time necessary for reinforcement learning methods. Thus, the objective of this work is to learn the residual errors between a dynamic and/or simulator model and the real robot. This is achieved using a neural network, where the parameters of a neural network are updated through an Unscented Kalman Filter (UKF) formulation. Using this method, we model these residual errors with only small amounts of data -- a necessity as we improve the simulator/dynamic model by learning directly from real-world operation. We demonstrate our method on robotic hardware (e.g., manipulator arm, and a wheeled robot), and show that with the learned residual errors, we can further close the reality gap between dynamic models, simulations, and actual hardware. △ Less

Submitted 7 May, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: Accepted to Ubiquitous Robotics 2023, Honolulu, Hawaii

arXiv:2208.00323 [pdf]

A Multi-View Learning Approach to Enhance Automatic 12-Lead ECG Diagnosis Performance

Authors: Jae-Won Choi, Dae-Yong Hong, Chan Jung, Eugene Hwang, Sung-Hyuk Park, Seung-Young Roh

Abstract: The performances of commonly used electrocardiogram (ECG) diagnosis models have recently improved with the introduction of deep learning (DL). However, the impact of various combinations of multiple DL components and/or the role of data augmentation techniques on the diagnosis have not been sufficiently investigated. This study proposes an ensemble-based multi-view learning approach with an ECG au… ▽ More The performances of commonly used electrocardiogram (ECG) diagnosis models have recently improved with the introduction of deep learning (DL). However, the impact of various combinations of multiple DL components and/or the role of data augmentation techniques on the diagnosis have not been sufficiently investigated. This study proposes an ensemble-based multi-view learning approach with an ECG augmentation technique to achieve a higher performance than traditional automatic 12-lead ECG diagnosis methods. The data analysis results show that the proposed model reports an F1 score of 0.840, which outperforms existing state-ofthe-art methods in the literature. △ Less

Submitted 30 July, 2022; originally announced August 2022.

Comments: 9 pages, 3 figures, and 5 tables

arXiv:2207.01418 [pdf, other]

doi 10.1109/IROS47612.2022.9981579

Simultaneous Contact-Rich Gras** and Locomotion via Distributed Optimization Enabling Free-Climbing for Multi-Limbed Robots

Authors: Yuki Shirai, Xuan Lin, Alexander Schperberg, Yusuke Tanaka, Hayato Kato, Varit Vichathorn, Dennis Hong

Abstract: While motion planning of locomotion for legged robots has shown great success, motion planning for legged robots with dexterous multi-finger gras** is not mature yet. We present an efficient motion planning framework for simultaneously solving locomotion (e.g., centroidal dynamics), gras** (e.g., patch contact), and contact (e.g., gait) problems. To accelerate the planning process, we propose… ▽ More While motion planning of locomotion for legged robots has shown great success, motion planning for legged robots with dexterous multi-finger gras** is not mature yet. We present an efficient motion planning framework for simultaneously solving locomotion (e.g., centroidal dynamics), gras** (e.g., patch contact), and contact (e.g., gait) problems. To accelerate the planning process, we propose distributed optimization frameworks based on Alternating Direction Methods of Multipliers (ADMM) to solve the original large-scale Mixed-Integer NonLinear Programming (MINLP). The resulting frameworks use Mixed-Integer Quadratic Programming (MIQP) to solve contact and NonLinear Programming (NLP) to solve nonlinear dynamics, which are more computationally tractable and less sensitive to parameters. Also, we explicitly enforce patch contact constraints from limit surfaces with micro-spine grippers. We demonstrate our proposed framework in the hardware experiments, showing that the multi-limbed robot is able to realize various motions including free-climbing at a slope angle 45° with a much shorter planning time. △ Less

Submitted 5 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted for the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022). Hardware implementation videos: https://youtu.be/QLH1shghqQ0

arXiv:2207.01033 [pdf, other]

Adaptive Force Controller for Contact-Rich Robotic Systems using an Unscented Kalman Filter

Authors: Alexander Schperberg, Yuki Shirai, Xuan Lin, Yusuke Tanaka, Dennis Hong

Abstract: In multi-point contact systems, precise force control is crucial for achieving stable and safe interactions between robots and their environment. Thus, we demonstrate an admittance controller with auto-tuning that can be applied for these systems. The controller's objective is to track the target wrench profiles of each contact point while considering the additional torque due to rotational fricti… ▽ More In multi-point contact systems, precise force control is crucial for achieving stable and safe interactions between robots and their environment. Thus, we demonstrate an admittance controller with auto-tuning that can be applied for these systems. The controller's objective is to track the target wrench profiles of each contact point while considering the additional torque due to rotational friction. Our admittance controller is adaptive during online operation by using an auto-tuning method that tunes the gains of the controller while following user-specified training objectives. These objectives include facilitating controller stability, such as tracking the wrench profiles as closely as possible, ensuring control outputs are within force limits that minimize slippage, and avoiding configurations that induce kinematic singularity. We demonstrate the robustness of our controller on hardware for both manipulation and locomotion tasks using a multi-limbed climbing robot. △ Less

Submitted 1 October, 2023; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: Accepted to IEEE RAS International Conference on Humanoid Robots 2023, December 12-14 in Austin, Texas, USA

arXiv:2205.06407 [pdf, other]

Tensor Decompositions for Hyperspectral Data Processing in Remote Sensing: A Comprehensive Review

Authors: Minghua Wang, Danfeng Hong, Zhu Han, Jiaxin Li, **g Yao, Lianru Gao, Bing Zhang, Jocelyn Chanussot

Abstract: Owing to the rapid development of sensor technology, hyperspectral (HS) remote sensing (RS) imaging has provided a significant amount of spatial and spectral information for the observation and analysis of the Earth's surface at a distance of data acquisition devices, such as aircraft, spacecraft, and satellite. The recent advancement and even revolution of the HS RS technique offer opportunities… ▽ More Owing to the rapid development of sensor technology, hyperspectral (HS) remote sensing (RS) imaging has provided a significant amount of spatial and spectral information for the observation and analysis of the Earth's surface at a distance of data acquisition devices, such as aircraft, spacecraft, and satellite. The recent advancement and even revolution of the HS RS technique offer opportunities to realize the full potential of various applications, while confronting new challenges for efficiently processing and analyzing the enormous HS acquisition data. Due to the maintenance of the 3-D HS inherent structure, tensor decomposition has aroused widespread concern and research in HS data processing tasks over the past decades. In this article, we aim at presenting a comprehensive overview of tensor decomposition, specifically contextualizing the five broad topics in HS data processing, and they are HS restoration, compressed sensing, anomaly detection, super-resolution, and spectral unmixing. For each topic, we elaborate on the remarkable achievements of tensor decomposition models for HS RS with a pivotal description of the existing methodologies and a representative exhibition on the experimental results. As a result, the remaining challenges of the follow-up research directions are outlined and discussed from the perspective of the real HS RS practices and tensor decomposition merged with advanced priors and even with deep neural networks. This article summarizes different tensor decomposition-based HS data processing methods and categorizes them into different classes from simple adoptions to complex combinations with other priors for the algorithm beginners. We also expect this survey can provide new investigations and development trends for the experienced researchers who understand tensor decomposition and HS RS to some extent. △ Less

Submitted 12 May, 2022; originally announced May 2022.

arXiv:2205.03742 [pdf, other]

Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution with Subpixel Fusion

Authors: Danfeng Hong, **g Yao, Deyu Meng, Naoto Yokoya, Jocelyn Chanussot

Abstract: Enormous efforts have been recently made to super-resolve hyperspectral (HS) images with the aid of high spatial resolution multispectral (MS) images. Most prior works usually perform the fusion task by means of multifarious pixel-level priors. Yet the intrinsic effects of a large distribution gap between HS-MS data due to differences in the spatial and spectral resolution are less investigated. T… ▽ More Enormous efforts have been recently made to super-resolve hyperspectral (HS) images with the aid of high spatial resolution multispectral (MS) images. Most prior works usually perform the fusion task by means of multifarious pixel-level priors. Yet the intrinsic effects of a large distribution gap between HS-MS data due to differences in the spatial and spectral resolution are less investigated. The gap might be caused by unknown sensor-specific properties or highly-mixed spectral information within one pixel (due to low spatial resolution). To this end, we propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DC-Net, to progressively fuse HS-MS information from the pixel- to subpixel-level, from the image- to feature-level. As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components to eliminate the gap between HS-MS images before further fusion, and then fully blends them by a model-guided coupled spectral unmixing (CSU) net. More significantly, we append a self-supervised learning module behind the CSU net by guaranteeing the material consistency to enhance the detailed appearances of the restored HS product. Extensive experimental results show the superiority of our method both visually and quantitatively and achieve a significant improvement in comparison with the state-of-the-arts. Furthermore, the codes and datasets will be available at https://sites.google.com/view/danfeng-hong for the sake of reproducibility. △ Less

Submitted 7 May, 2022; originally announced May 2022.

arXiv:2205.01380 [pdf, other]

Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review

Authors: Jiaxin Li, Danfeng Hong, Lianru Gao, **g Yao, Ke Zheng, Bing Zhang, Jocelyn Chanussot

Abstract: With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which renders researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendou… ▽ More With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which renders researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendous progress in recent years, yet these developed traditional algorithms inevitably meet the performance bottleneck due to the lack of the ability to comprehensively analyse and interpret these strongly heterogeneous data. Hence, this non-negligible limitation further arouses an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has witnessed remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview in DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of this field. Some prevalent sub-fields in the multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, We collect and summarize some valuable resources for the sake of the development in multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted. △ Less

Submitted 3 May, 2022; originally announced May 2022.

arXiv:2203.16952 [pdf, other]

doi 10.1109/TGRS.2023.3286826

Multimodal Fusion Transformer for Remote Sensing Image Classification

Authors: Swalpa Kumar Roy, Ankur Deria, Danfeng Hong, Behnood Rasti, Antonio Plaza, Jocelyn Chanussot

Abstract: Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar tra… ▽ More Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar transformers use an external classification (CLS) token which is randomly initialized and often fails to generalize well, whereas other sources of multimodal datasets, such as light detection and ranging (LiDAR) offer the potential to improve these models by means of a CLS. In this paper, we introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification. Our mCrossPA utilizes other sources of complementary information in addition to the HSI in the transformer encoder to achieve better generalization. The concept of tokenization is used to generate CLS and HSI patch tokens, hel** to learn a {distinctive representation} in a reduced and hierarchical feature space. Extensive experiments are carried out on {widely used benchmark} datasets {i.e.,} the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. We compare the results of the proposed MFT model with other state-of-the-art transformers, classical CNNs, and conventional classifiers models. The superior performance achieved by the proposed model is due to the use of multihead cross patch attention. The source code will be made available publicly at \url{https://github.com/AnkurDeria/MFT}.} △ Less

Submitted 20 June, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Published in IEEE Transactions on Geoscience and Remote Sensing

arXiv:2203.06889 [pdf, other]

Lead-agnostic Self-supervised Learning for Local and Global Representations of Electrocardiogram

Authors: Jungwoo Oh, Hyunseung Chung, Joon-myoung Kwon, Dong-gyun Hong, Edward Choi

Abstract: In recent years, self-supervised learning methods have shown significant improvement for pre-training with unlabeled data and have proven helpful for electrocardiogram signals. However, most previous pre-training methods for electrocardiogram focused on capturing only global contextual representations. This inhibits the models from learning fruitful representation of electrocardiogram, which resul… ▽ More In recent years, self-supervised learning methods have shown significant improvement for pre-training with unlabeled data and have proven helpful for electrocardiogram signals. However, most previous pre-training methods for electrocardiogram focused on capturing only global contextual representations. This inhibits the models from learning fruitful representation of electrocardiogram, which results in poor performance on downstream tasks. Additionally, they cannot fine-tune the model with an arbitrary set of electrocardiogram leads unless the models were pre-trained on the same set of leads. In this work, we propose an ECG pre-training method that learns both local and global contextual representations for better generalizability and performance on downstream tasks. In addition, we propose random lead masking as an ECG-specific augmentation method to make our proposed model robust to an arbitrary set of leads. Experimental results on two downstream tasks, cardiac arrhythmia classification and patient identification, show that our proposed approach outperforms other state-of-the-art methods. △ Less

Submitted 18 March, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Accepted at CHIL 2022 (16 pages, 3 figures, 4 tables)

arXiv:2108.01262 [pdf, other]

doi 10.1109/LRA.2021.3103054

SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots

Authors: Alexander Schperberg, Stephanie Tsuei, Stefano Soatto, Dennis Hong

Abstract: We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal while avoiding obstacles in uncertain environments. First, we use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints. Second… ▽ More We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal while avoiding obstacles in uncertain environments. First, we use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints. Second, recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution, which are trained on uncertainty outputs of various simultaneous localization and map** algorithms. When two or more robots are in communication range, these uncertainties are then updated using a distributed Kalman filtering approach. Lastly, a Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal. Our complete methods are demonstrated on a ground and aerial robot simultaneously (code available at: https://github.com/AlexS28/SABER). △ Less

Submitted 2 August, 2021; originally announced August 2021.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L) 2021. Pre-print version. The video link of the paper is: https://www.youtube.com/watch?v=EKCCQtN5Z6A

arXiv:2105.10194 [pdf, other]

doi 10.1109/TNNLS.2021

Endmember-Guided Unmixing Network (EGU-Net): A General Deep Learning Framework for Self-Supervised Hyperspectral Unmixing

Authors: Danfeng Hong, Lianru Gao, **g Yao, Naoto Yokoya, Jocelyn Chanussot, Uta Heiden, Bing Zhang

Abstract: Over the past decades, enormous efforts have been made to improve the performance of linear or nonlinear mixing models for hyperspectral unmixing, yet their ability to simultaneously generalize various spectral variabilities and extract physically meaningful endmembers still remains limited due to the poor ability in data fitting and reconstruction and the sensitivity to various spectral variabili… ▽ More Over the past decades, enormous efforts have been made to improve the performance of linear or nonlinear mixing models for hyperspectral unmixing, yet their ability to simultaneously generalize various spectral variabilities and extract physically meaningful endmembers still remains limited due to the poor ability in data fitting and reconstruction and the sensitivity to various spectral variabilities. Inspired by the powerful learning ability of deep learning, we attempt to develop a general deep learning approach for hyperspectral unmixing, by fully considering the properties of endmembers extracted from the hyperspectral imagery, called endmember-guided unmixing network (EGU-Net). Beyond the alone autoencoder-like architecture, EGU-Net is a two-stream Siamese deep network, which learns an additional network from the pure or nearly-pure endmembers to correct the weights of another unmixing network by sharing network parameters and adding spectrally meaningful constraints (e.g., non-negativity and sum-to-one) towards a more accurate and interpretable unmixing solution. Furthermore, the resulting general framework is not only limited to pixel-wise spectral unmixing but also applicable to spatial information modeling with convolutional operators for spatial-spectral unmixing. Experimental results conducted on three different datasets with the ground-truth of abundance maps corresponding to each material demonstrate the effectiveness and superiority of the EGU-Net over state-of-the-art unmixing algorithms. The codes will be available from the website: https://github.com/danfenghong/IEEE_TNNLS_EGU-Net. △ Less

Submitted 21 May, 2021; originally announced May 2021.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2021

arXiv:2103.01449 [pdf, other]

Interpretable Hyperspectral AI: When Non-Convex Modeling meets Hyperspectral Remote Sensing

Authors: Danfeng Hong, Wei He, Naoto Yokoya, **g Yao, Lianru Gao, Liangpei Zhang, Jocelyn Chanussot, Xiao Xiang Zhu

Abstract: Hyperspectral imaging, also known as image spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these hyperspectral (HS) products mainly by means of seasoned experts. However, with the ever-growing volume of data, the bulk of costs in manpower and material resources poses new challenges on reducing t… ▽ More Hyperspectral imaging, also known as image spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these hyperspectral (HS) products mainly by means of seasoned experts. However, with the ever-growing volume of data, the bulk of costs in manpower and material resources poses new challenges on reducing the burden of manual labor and improving efficiency. For this reason, it is, therefore, urgent to develop more intelligent and automatic approaches for various HS RS applications. Machine learning (ML) tools with convex optimization have successfully undertaken the tasks of numerous artificial intelligence (AI)-related applications. However, their ability in handling complex practical problems remains limited, particularly for HS data, due to the effects of various spectral variabilities in the process of HS imaging and the complexity and redundancy of higher dimensional HS signals. Compared to the convex models, non-convex modeling, which is capable of characterizing more complex real scenes and providing the model interpretability technically and theoretically, has been proven to be a feasible solution to reduce the gap between challenging HS vision tasks and currently advanced intelligent data processing models. △ Less

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2103.01333 [pdf, other]

doi 10.1109/ICRA48506.2021.9561502

LTO: Lazy Trajectory Optimization with Graph-Search Planning for High DOF Robots in Cluttered Environments

Authors: Yuki Shirai, Xuan Lin, Ankur Mehta, Dennis Hong

Abstract: Although Trajectory Optimization (TO) is one of the most powerful motion planning tools, it suffers from expensive computational complexity as a time horizon increases in cluttered environments. It can also fail to converge to a globally optimal solution. In this paper, we present Lazy Trajectory Optimization (LTO) that unifies local short-horizon TO and global Graph-Search Planning (GSP) to gener… ▽ More Although Trajectory Optimization (TO) is one of the most powerful motion planning tools, it suffers from expensive computational complexity as a time horizon increases in cluttered environments. It can also fail to converge to a globally optimal solution. In this paper, we present Lazy Trajectory Optimization (LTO) that unifies local short-horizon TO and global Graph-Search Planning (GSP) to generate a long-horizon global optimal trajectory. LTO solves TO with the same constraints as the original long-horizon TO with improved time complexity. We also propose a TO-aware cost function that can balance both solution cost and planning time. Since LTO solves many nearly identical TO in a roadmap, it can provide an informed warm-start for TO to accelerate the planning process. We also present proofs of the computational complexity and optimality of LTO. Finally, we demonstrate LTO's performance on motion planning problems for a 2 DOF free-flying robot and a 21 DOF legged robot, showing that LTO outperforms existing algorithms in terms of its runtime and reliability. △ Less

Submitted 23 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: Accepted for 2021 IEEE International Conference on Robotics and Automation (2021 ICRA). You can find a summary video here: https://youtu.be/o5zDEKc2HPU

arXiv:2101.06116 [pdf, other]

doi 10.1109/JSTARS.2021.3133021

Hyperspectral Image Classification-Traditional to Deep Models: A Survey for Future Prospects

Authors: Muhammad Ahmad, Sidrah Shabbir, Swalpa Kumar Roy, Danfeng Hong, Xin Wu, **g Yao, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Jocelyn Chanussot

Abstract: Hyperspectral Imaging (HSI) has been extensively utilized in many real-life applications because it benefits from the detailed spectral information contained in each pixel. Notably, the complex characteristics i.e., the nonlinear relation among the captured spectral information and the corresponding object of HSI data make accurate classification challenging for traditional methods. In the last fe… ▽ More Hyperspectral Imaging (HSI) has been extensively utilized in many real-life applications because it benefits from the detailed spectral information contained in each pixel. Notably, the complex characteristics i.e., the nonlinear relation among the captured spectral information and the corresponding object of HSI data make accurate classification challenging for traditional methods. In the last few years, Deep Learning (DL) has been substantiated as a powerful feature extractor that effectively addresses the nonlinear problems that appeared in a number of computer vision tasks. This prompts the deployment of DL for HSI classification (HSIC) which revealed good performance. This survey enlists a systematic overview of DL for HSIC and compared state-of-the-art strategies on the said topic. Primarily, we will encapsulate the main challenges of traditional machine learning for HSIC and then we will acquaint the superiority of DL to address these problems. This survey breakdown the state-of-the-art DL frameworks into spectral features, spatial features, and together spatial-spectral features to systematically analyze the achievements (future research directions as well) of these frameworks for HSIC. Moreover, we will consider the fact that DL requires a large number of labeled training examples whereas acquiring such a number for HSIC is challenging in terms of time and cost. Therefore, this survey discusses some strategies to improve the generalization performance of DL strategies which can provide some future guidelines. △ Less

Submitted 27 April, 2022; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: https://ieeexplore.ieee.org/abstract/document/9645266

arXiv:2101.03468 [pdf, other]

doi 10.1109/TSP.2021.3104979

HePPCAT: Probabilistic PCA for Data with Heteroscedastic Noise

Authors: David Hong, Kyle Gilman, Laura Balzano, Jeffrey A. Fessler

Abstract: Principal component analysis (PCA) is a classical and ubiquitous method for reducing data dimensionality, but it is suboptimal for heterogeneous data that are increasingly common in modern applications. PCA treats all samples uniformly so degrades when the noise is heteroscedastic across samples, as occurs, e.g., when samples come from sources of heterogeneous quality. This paper develops a probab… ▽ More Principal component analysis (PCA) is a classical and ubiquitous method for reducing data dimensionality, but it is suboptimal for heterogeneous data that are increasingly common in modern applications. PCA treats all samples uniformly so degrades when the noise is heteroscedastic across samples, as occurs, e.g., when samples come from sources of heterogeneous quality. This paper develops a probabilistic PCA variant that estimates and accounts for this heterogeneity by incorporating it in the statistical model. Unlike in the homoscedastic setting, the resulting nonconvex optimization problem is not seemingly solved by singular value decomposition. This paper develops a heteroscedastic probabilistic PCA technique (HePPCAT) that uses efficient alternating maximization algorithms to jointly estimate both the underlying factors and the unknown noise variances. Simulation experiments illustrate the comparative speed of the algorithms, the benefit of accounting for heteroscedasticity, and the seemingly favorable optimization landscape of this problem. Real data experiments on environmental air quality data show that HePPCAT can give a better PCA estimate than techniques that do not account for heteroscedasticity. △ Less

Submitted 1 December, 2021; v1 submitted 9 January, 2021; originally announced January 2021.

Comments: This article has been accepted for publication in the IEEE Transactions on Signal Processing. (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information. 26 pages, 14 figures

Journal ref: IEEE Transactions on Signal Processing, Vol. 69, pp. 4819-4834, 2021

arXiv:2008.05457 [pdf, other]

doi 10.1109/TGRS.2020.3016820

More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification

Authors: Danfeng Hong, Lianru Gao, Naoto Yokoya, **g Yao, Jocelyn Chanussot, Qian Du, Bing Zhang

Abstract: Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS) and have garnered a growing concern owing to the recent advancements of deep learning techniques. Although deep networks have been successfully applied in single-modality-dominated classification tasks, yet th… ▽ More Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS) and have garnered a growing concern owing to the recent advancements of deep learning techniques. Although deep networks have been successfully applied in single-modality-dominated classification tasks, yet their performance inevitably meets the bottleneck in complex scenes that need to be finely classified, due to the limitation of information diversity. In this work, we provide a baseline solution to the aforementioned difficulty by develo** a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML) -- cross-modality learning (CML) that exists widely in RS image classification applications. By focusing on "what", "where", and "how" to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, further being unified in our MDL framework. More significantly, our framework is not only limited to pixel-wise classification tasks but also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the settings of MML and CML are conducted on two different multimodal RS datasets. Furthermore, the codes and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community. △ Less

Submitted 12 August, 2020; originally announced August 2020.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2020

arXiv:2007.14035 [pdf, other]

doi 10.1109/IROS45743.2020.9341070

Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance

Authors: Alexander Schperberg, Kenny Chen, Stephanie Tsuei, Michael Jewett, Joshua Hooks, Stefano Soatto, Ankur Mehta, Dennis Hong

Abstract: In this paper, we propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties for safer navigation through cluttered environments. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates through each step of our MPC's finite time ho… ▽ More In this paper, we propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties for safer navigation through cluttered environments. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates through each step of our MPC's finite time horizon. The RNN model is trained on a dataset that comprises of robot and landmark poses generated from camera images and inertial measurement unit (IMU) readings via a state-of-the-art visual-inertial odometry framework. To detect and extract object locations for avoidance, we use a custom-trained convolutional neural network model in conjunction with a feature extractor to retrieve 3D centroid and radii boundaries of nearby obstacles. The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms, demonstrating autonomous behaviors that can plan fast and collision-free paths towards a goal point. △ Less

Submitted 28 July, 2020; originally announced July 2020.

Comments: Accepted to the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, USA. First two authors contributed equally. For supplementary video, see https://www.youtube.com/watch?v=td4K55Tj-U8

arXiv:2007.14007 [pdf, other]

doi 10.1109/TGRS.2020.3006534

Coupled Convolutional Neural Network with Adaptive Response Function Learning for Unsupervised Hyperspectral Super-Resolution

Authors: Ke Zheng, Lianru Gao, Wenzhi Liao, Danfeng Hong, Bing Zhang, Ximin Cui, Jocelyn Chanussot

Abstract: Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and MSI to generate an image with both high spatial and high spectral resolutions. Recently, several new methods have been proposed to solve this fusion problem, and most… ▽ More Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and MSI to generate an image with both high spatial and high spectral resolutions. Recently, several new methods have been proposed to solve this fusion problem, and most of these methods assume that the prior information of the Point Spread Function (PSF) and Spectral Response Function (SRF) are known. However, in practice, this information is often limited or unavailable. In this work, an unsupervised deep learning-based fusion method - HyCoNet - that can solve the problems in HSI-MSI fusion without the prior PSF and SRF information is proposed. HyCoNet consists of three coupled autoencoder nets in which the HSI and MSI are unmixed into endmembers and abundances based on the linear unmixing model. Two special convolutional layers are designed to act as a bridge that coordinates with the three autoencoder nets, and the PSF and SRF parameters are learned adaptively in the two convolution layers during the training process. Furthermore, driven by the joint loss function, the proposed method is straightforward and easily implemented in an end-to-end training manner. The experiments performed in the study demonstrate that the proposed method performs well and produces robust results for different datasets and arbitrary PSFs and SRFs. △ Less

Submitted 28 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing,2020

arXiv:2007.14006 [pdf, other]

doi 10.1109/TGRS.2020.3000684

Spectral Superresolution of Multispectral Imagery with Joint Sparse and Low-Rank Learning

Authors: Lianru Gao, Danfeng Hong, **g Yao, Bing Zhang, Paolo Gamba, Jocelyn Chanussot

Abstract: Extensive attention has been widely paid to enhance the spatial resolution of hyperspectral (HS) images with the aid of multispectral (MS) images in remote sensing. However, the ability in the fusion of HS and MS images remains to be improved, particularly in large-scale scenes, due to the limited acquisition of HS images. Alternatively, we super-resolve MS images in the spectral domain by the mea… ▽ More Extensive attention has been widely paid to enhance the spatial resolution of hyperspectral (HS) images with the aid of multispectral (MS) images in remote sensing. However, the ability in the fusion of HS and MS images remains to be improved, particularly in large-scale scenes, due to the limited acquisition of HS images. Alternatively, we super-resolve MS images in the spectral domain by the means of partially overlapped HS images, yielding a novel and promising topic: spectral superresolution (SSR) of MS imagery. This is challenging and less investigated task due to its high ill-posedness in inverse imaging. To this end, we develop a simple but effective method, called joint sparse and low-rank learning (J-SLoL), to spectrally enhance MS images by jointly learning low-rank HS-MS dictionary pairs from overlapped regions. J-SLoL infers and recovers the unknown hyperspectral signals over a larger coverage by sparse coding on the learned dictionary pair. Furthermore, we validate the SSR performance on three HS-MS datasets (two for classification and one for unmixing) in terms of reconstruction, classification, and unmixing by comparing with several existing state-of-the-art baselines, showing the effectiveness and superiority of the proposed J-SLoL algorithm. Furthermore, the codes and datasets will be available at: https://github.com/danfenghong/IEEE\_TGRS\_J-SLoL, contributing to the RS community. △ Less

Submitted 28 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2020

arXiv:2007.11766 [pdf, other]

Guided Deep Decoder: Unsupervised Image Pair Fusion

Authors: Tatsumi Uezato, Danfeng Hong, Naoto Yokoya, Wei He

Abstract: The fusion of input and guidance images that have a tradeoff in their information (e.g., hyperspectral and RGB image fusion or pansharpening) can be interpreted as one general problem. However, previous studies applied a task-specific handcrafted prior and did not address the problems with a unified approach. To address this limitation, in this study, we propose a guided deep decoder network as a… ▽ More The fusion of input and guidance images that have a tradeoff in their information (e.g., hyperspectral and RGB image fusion or pansharpening) can be interpreted as one general problem. However, previous studies applied a task-specific handcrafted prior and did not address the problems with a unified approach. To address this limitation, in this study, we propose a guided deep decoder network as a general prior. The proposed network is composed of an encoder-decoder network that exploits multi-scale features of a guidance image and a deep decoder network that generates an output image. The two networks are connected by feature refinement units to embed the multi-scale features of the guidance image into the deep decoder network. The proposed network allows the network parameters to be optimized in an unsupervised way without training data. Our results show that the proposed network can achieve state-of-the-art performance in various image fusion problems. △ Less

Submitted 22 July, 2020; originally announced July 2020.

Comments: ECCV 2020

arXiv:2007.05230 [pdf, other]

Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution

Authors: **g Yao, Danfeng Hong, Jocelyn Chanussot, Deyu Meng, Xiaoxiang Zhu, Zongben Xu

Abstract: The recent advancement of deep learning techniques has made great progress on hyperspectral image super-resolution (HSI-SR). Yet the development of unsupervised deep networks remains challenging for this task. To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of higher-spatial-resolution m… ▽ More The recent advancement of deep learning techniques has made great progress on hyperspectral image super-resolution (HSI-SR). Yet the development of unsupervised deep networks remains challenging for this task. To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of higher-spatial-resolution multispectral image (MSI). Inspired by coupled spectral unmixing, a two-stream convolutional autoencoder framework is taken as backbone to jointly decompose MS and HS data into a spectrally meaningful basis and corresponding coefficients. CUCaNet is capable of adaptively learning spectral and spatial response functions from HS-MS correspondences by enforcing reasonable consistency assumptions on the networks. Moreover, a cross-attention module is devised to yield more effective spatial-spectral information transfer in networks. Extensive experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models, demonstrating the superiority of the CUCaNet in the HSI-SR application. Furthermore, the codes and datasets will be available at: https://github.com/danfenghong/ECCV2020_CUCaNet. △ Less

Submitted 1 August, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

arXiv:2006.16013 [pdf, other]

Single-Look Multi-Master SAR Tomography: An Introduction

Authors: Nan Ge, Richard Bamler, Danfeng Hong, Xiao Xiang Zhu

Abstract: This paper addresses the general problem of single-look multi-master SAR tomography. For this purpose, we establish the single-look multi-master data model, analyze its implications for single and double scatterers, and propose a generic inversion framework. The core of this framework is nonconvex sparse recovery, for which we develop two algorithms: one extends the conventional nonlinear least sq… ▽ More This paper addresses the general problem of single-look multi-master SAR tomography. For this purpose, we establish the single-look multi-master data model, analyze its implications for single and double scatterers, and propose a generic inversion framework. The core of this framework is nonconvex sparse recovery, for which we develop two algorithms: one extends the conventional nonlinear least squares (NLS) to the single-look multi-master data model, and the other is based on bi-convex relaxation and alternating minimization (BiCRAM). We provide two theorems for the objective function of the NLS subproblem, which lead to its analytic solution up to a constant phase angle in the one-dimensional case. We also report our findings from the experiments on different acceleration techniques for BiCRAM. The proposed algorithms are applied to a real TerraSAR-X data set, and validated with height ground truth made available via a SAR imaging geodesy and simulation framework. This shows empirically that the \emph{single-master} approach, if applied to a single-look \emph{multi-master} stack, can be insufficient for layover separation, and the \emph{multi-master} approach can indeed perform slightly better (despite being computationally more expensive) even in the case of single scatterers. Besides, this paper also sheds light on the special case of single-look bistatic SAR tomography, which is relevant for current and future SAR missions such as TanDEM-X and Tandem-L. △ Less

Submitted 29 June, 2020; originally announced June 2020.

arXiv:2006.02656 [pdf, other]

doi 10.1109/LRA.2020.3001503

Risk-Aware Motion Planning for a Limbed Robot with Stochastic Grip** Forces Using Nonlinear Programming

Authors: Yuki Shirai, Xuan Lin, Yusuke Tanaka, Ankur Mehta, Dennis Hong

Abstract: We present a motion planning algorithm with probabilistic guarantees for limbed robots with stochastic grip** forces. Planners based on deterministic models with a worst-case uncertainty can be conservative and inflexible to consider the stochastic behavior of the contact, especially when a gripper is installed. Our proposed planner enables the robot to simultaneously plan its pose and contact f… ▽ More We present a motion planning algorithm with probabilistic guarantees for limbed robots with stochastic grip** forces. Planners based on deterministic models with a worst-case uncertainty can be conservative and inflexible to consider the stochastic behavior of the contact, especially when a gripper is installed. Our proposed planner enables the robot to simultaneously plan its pose and contact force trajectories while considering the risk associated with the grip** forces. Our planner is formulated as a nonlinear programming problem with chance constraints, which allows the robot to generate a variety of motions based on different risk bounds. To model the grip** forces as random variables, we employ Gaussian Process regression. We validate our proposed motion planning algorithm on an 11.5 kg six-limbed robot for two-wall climbing. Our results show that our proposed planner generates various trajectories (e.g., avoiding low friction terrain under the low risk bound, choosing an unstable but faster gait under the high risk bound) by changing the probability of risk based on various specifications. △ Less

Submitted 5 June, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: IEEE Robotics and Automation Letters (RA-L). Pre-print Version. The video of this paper is: https://youtu.be/ZDqvf1J4nS4

arXiv:2004.13557 [pdf, other]

Baseline Estimation of Commercial Building HVAC Fan Power Using Tensor Completion

Authors: Shunbo Lei, David Hong, Johanna L. Mathieu, Ian A. Hiskens

Abstract: Commercial building heating, ventilation, and air conditioning (HVAC) systems have been studied for providing ancillary services to power grids via demand response (DR). One critical issue is to estimate the counterfactual baseline power consumption that would have prevailed without DR. Baseline methods have been developed based on whole building electric load profiles. New methods are necessary t… ▽ More Commercial building heating, ventilation, and air conditioning (HVAC) systems have been studied for providing ancillary services to power grids via demand response (DR). One critical issue is to estimate the counterfactual baseline power consumption that would have prevailed without DR. Baseline methods have been developed based on whole building electric load profiles. New methods are necessary to estimate the baseline power consumption of HVAC sub-components (e.g., supply and return fans), which have different characteristics compared to that of the whole building. Tensor completion can estimate the unobserved entries of multi-dimensional tensors describing complex data sets. It exploits high-dimensional data to capture granular insights into the problem. This paper proposes to use it for baselining HVAC fan power, by utilizing its capability of capturing dominant fan power patterns. The tensor completion method is evaluated using HVAC fan power data from several buildings at the University of Michigan, and compared with several existing methods. The tensor completion method generally outperforms the benchmarks. △ Less

Submitted 24 April, 2020; originally announced April 2020.

arXiv:2003.03440 [pdf, other]

doi 10.1109/TNNLS.2020.2979546

Learning Convolutional Sparse Coding on Complex Domain for Interferometric Phase Restoration

Authors: Jian Kang, Danfeng Hong, Jialin Liu, Gerald Baier, Naoto Yokoya, Begüm Demir

Abstract: Interferometric phase restoration has been investigated for decades and most of the state-of-the-art methods have achieved promising performances for InSAR phase restoration. These methods generally follow the nonlocal filtering processing chain aiming at circumventing the staircase effect and preserving the details of phase variations. In this paper, we propose an alternative approach for InSAR p… ▽ More Interferometric phase restoration has been investigated for decades and most of the state-of-the-art methods have achieved promising performances for InSAR phase restoration. These methods generally follow the nonlocal filtering processing chain aiming at circumventing the staircase effect and preserving the details of phase variations. In this paper, we propose an alternative approach for InSAR phase restoration, i.e. Complex Convolutional Sparse Coding (ComCSC) and its gradient regularized version. To our best knowledge, this is the first time that we solve the InSAR phase restoration problem in a deconvolutional fashion. The proposed methods can not only suppress interferometric phase noise, but also avoid the staircase effect and preserve the details. Furthermore, they provide an insight of the elementary phase components for the interferometric phases. The experimental results on synthetic and realistic high- and medium-resolution datasets from TerraSAR-X StripMap and Sentinel-1 interferometric wide swath mode, respectively, show that our method outperforms those previous state-of-the-art methods based on nonlocal InSAR filters, particularly the state-of-the-art method: InSAR-BM3D. The source code of this paper will be made publicly available for reproducible research inside the community. △ Less

Submitted 6 March, 2020; originally announced March 2020.

arXiv:2003.02822 [pdf, other]

doi 10.1109/MGRS.2020.2979764

Feature Extraction for Hyperspectral Imagery: The Evolution from Shallow to Deep (Overview and Toolbox)

Authors: Behnood Rasti, Danfeng Hong, Renlong Hang, Pedram Ghamisi, Xudong Kang, Jocelyn Chanussot, Jon Atli Benediktsson

Abstract: Hyperspectral images provide detailed spectral information through hundreds of (narrow) spectral channels (also known as dimensionality or bands) with continuous spectral information that can accurately classify diverse materials of interest. The increased dimensionality of such data makes it possible to significantly improve data information content but provides a challenge to the conventional te… ▽ More Hyperspectral images provide detailed spectral information through hundreds of (narrow) spectral channels (also known as dimensionality or bands) with continuous spectral information that can accurately classify diverse materials of interest. The increased dimensionality of such data makes it possible to significantly improve data information content but provides a challenge to the conventional techniques (the so-called curse of dimensionality) for accurate analysis of hyperspectral images. Feature extraction, as a vibrant field of research in the hyperspectral community, evolved through decades of research to address this issue and extract informative features suitable for data representation and classification. The advances in feature extraction have been inspired by two fields of research, including the popularization of image and signal processing as well as machine (deep) learning, leading to two types of feature extraction approaches named shallow and deep techniques. This article outlines the advances in feature extraction approaches for hyperspectral imagery by providing a technical overview of the state-of-the-art techniques, providing useful entry points for researchers at different levels, including students, researchers, and senior researchers, willing to explore novel investigations on this challenging topic. In more detail, this paper provides a bird's eye view over shallow (both supervised and unsupervised) and deep feature extraction approaches specifically dedicated to the topic of hyperspectral feature extraction and its application on hyperspectral image classification. Additionally, this paper compares 15 advanced techniques with an emphasis on their methodological foundations in terms of classification accuracies. Furthermore, the codes and libraries are shared at https://github.com/BehnoodRasti/HyFTech-Hyperspectral-Shallow-Deep-Feature-Extraction-Toolbox. △ Less

Submitted 29 July, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

Journal ref: IEEE Geoscience and Remote Sensing Magazine, 2020

arXiv:2002.09820 [pdf, other]

Deep Reinforcement Learning with Linear Quadratic Regulator Regions

Authors: Gabriel I. Fernandez, Colin Togashi, Dennis W. Hong, Lin F. Yang

Abstract: Practitioners often rely on compute-intensive domain randomization to ensure reinforcement learning policies trained in simulation can robustly transfer to the real world. Due to unmodeled nonlinearities in the real system, however, even such simulated policies can still fail to perform stably enough to acquire experience in real environments. In this paper we propose a novel method that guarantee… ▽ More Practitioners often rely on compute-intensive domain randomization to ensure reinforcement learning policies trained in simulation can robustly transfer to the real world. Due to unmodeled nonlinearities in the real system, however, even such simulated policies can still fail to perform stably enough to acquire experience in real environments. In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems. Our core technique is to use "bias-shifted" neural networks for constructing the controller and training the network in the simulator. The modified neural networks not only capture the nonlinearities of the system but also provably preserve linearity in a certain region of the state space and thus can be tuned to resemble a linear quadratic regulator that is known to be stable for the real system. We have tested our new method by transferring simulated policies for a swing-up inverted pendulum to real systems and demonstrated its efficacy. △ Less

Submitted 25 February, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

arXiv:2002.01144 [pdf, other]

doi 10.1109/TGRS.2020.2969024

Classification of Hyperspectral and LiDAR Data Using Coupled CNNs

Authors: Renlong Hang, Zhu Li, Pedram Ghamisi, Danfeng Hong, Guiyu Xia, Qingshan Liu

Abstract: In this paper, we propose an efficient and effective framework to fuse hyperspectral and Light Detection And Ranging (LiDAR) data using two coupled convolutional neural networks (CNNs). One CNN is designed to learn spectral-spatial features from hyperspectral data, and the other one is used to capture the elevation information from LiDAR data. Both of them consist of three convolutional layers, an… ▽ More In this paper, we propose an efficient and effective framework to fuse hyperspectral and Light Detection And Ranging (LiDAR) data using two coupled convolutional neural networks (CNNs). One CNN is designed to learn spectral-spatial features from hyperspectral data, and the other one is used to capture the elevation information from LiDAR data. Both of them consist of three convolutional layers, and the last two convolutional layers are coupled together via a parameter sharing strategy. In the fusion phase, feature-level and decision-level fusion methods are simultaneously used to integrate these heterogeneous features sufficiently. For the feature-level fusion, three different fusion strategies are evaluated, including the concatenation strategy, the maximization strategy, and the summation strategy. For the decision-level fusion, a weighted summation strategy is adopted, where the weights are determined by the classification accuracy of each output. The proposed model is evaluated on an urban data set acquired over Houston, USA, and a rural one captured over Trento, Italy. On the Houston data, our model can achieve a new record overall accuracy of 96.03%. On the Trento data, it achieves an overall accuracy of 99.12%. These results sufficiently certify the effectiveness of our proposed model. △ Less

Submitted 4 February, 2020; originally announced February 2020.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2020

Showing 1–37 of 37 results for author: Hong, D