Search | arXiv e-print repository

Superpixel-based and Spatially-regularized Diffusion Learning for Unsupervised Hyperspectral Image Clustering

Authors: Kangning Cui, Ruoning Li, Sam L. Polk, Yinyi Lin, Hongsheng Zhang, James M. Murphy, Robert J. Plemmons, Raymond H. Chan

Abstract: Hyperspectral images (HSIs) provide exceptional spatial and spectral resolution of a scene, crucial for various remote sensing applications. However, the high dimensionality, presence of noise and outliers, and the need for precise labels of HSIs present significant challenges to HSIs analysis, motivating the development of performant HSI clustering algorithms. This paper introduces a novel unsupervised HSI clustering algorithm, Superpixel-based and Spatially-regularized Diffusion Learning (S2DL), which addresses these challenges by incorporating rich spatial information encoded in HSIs into diffusion geometry-based clustering. S2DL employs the Entropy Rate Superpixel (ERS) segmentation technique to partition an image into superpixels, then constructs a spatially-regularized diffusion graph using the most representative high-density pixels. This approach reduces computational burden while preserving accuracy. Cluster modes, serving as exemplars for underlying cluster structure, are identified as the highest-density pixels farthest in diffusion distance from other highest-density pixels. These modes guide the labeling of the remaining representative pixels from ERS superpixels. Finally, majority voting is applied to the labels assigned within each superpixel to propagate labels to the rest of the image. This spatial-spectral approach simultaneously simplifies graph construction, reduces computational cost, and improves clustering performance.
S2DL's performance is illustrated with extensive experiments on three publicly available, real-world HSIs: Indian Pines, Salinas, and Salinas A. Additionally, we apply S2DL to landscape-scale, unsupervised mangrove species mapping in the Mai Po Nature Reserve, Hong Kong, using a Gaofen-5 HSI. The success of S2DL in these diverse numerical experiments indicates its efficacy on a wide range of important unsupervised remote sensing analysis tasks.

Submitted 24 December, 2023; originally announced December 2023.

Comments: 27 pages, 9 figures, and 2 tables
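
Illustrative note: the final step described in the abstract above, majority voting within each superpixel, can be sketched roughly as follows. This is a minimal sketch and not the authors' code. The function name `propagate_by_majority_vote`, the use of `-1` to mark unlabeled pixels, and the toy arrays are all assumptions made for illustration; the superpixel map is taken as given rather than computed with ERS.

```python
import numpy as np

def propagate_by_majority_vote(superpixels, labels):
    """Assign every pixel the most common known label in its superpixel.

    superpixels: integer array giving a superpixel index per pixel.
    labels: integer array of the same shape; -1 marks a pixel with no label yet
            (an assumed convention, not taken from the paper).
    """
    superpixels = np.asarray(superpixels)
    labels = np.asarray(labels)
    out = np.full(labels.shape, -1, dtype=int)
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        known = labels[mask]
        known = known[known >= 0]  # keep only pixels that already carry a label
        if known.size:
            # bincount tallies each label; argmax picks the most frequent one
            # (ties go to the smallest label value)
            out[mask] = np.bincount(known).argmax()
    return out

# Toy 2x3 image with two superpixels and a few labeled representative pixels.
superpixels = np.array([[0, 0, 1],
                        [0, 1, 1]])
labels = np.array([[2, -1, 0],
                   [2, -1, 0]])
result = propagate_by_majority_vote(superpixels, labels)
```

Superpixels with no labeled pixel stay at `-1` in this sketch; how such cases are handled in S2DL is not stated in the abstract.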

arXiv:2206.09365 [pdf, other]

Semi-supervised Change Detection of Small Water Bodies Using RGB and Multispectral Images in Peruvian Rainforests

Authors: Kangning Cui, Seda Camalan, Ruoning Li, Victor P. Pauca, Sarra Alqahtani, Robert J. Plemmons, Miles Silman, Evan N. Dethier, David Lutz, Raymond H. Chan

Abstract: Artisanal and Small-scale Gold Mining (ASGM) is an important source of income for many households, but it can have large social and environmental effects, especially in rainforests of developing countries. The Sentinel-2 satellites collect multispectral images that can be used for the purpose of detecting changes in water extent and quality which indicates the locations of mining sites. This work focuses on the recognition of ASGM activities in Peruvian Amazon rainforests. We tested several semi-supervised classifiers based on Support Vector Machines (SVMs) to detect the changes of water bodies from 2019 to 2021 in the Madre de Dios region, which is one of the global hotspots of ASGM activities. Experiments show that SVM-based models can achieve reasonable performance for both RGB (using Cohen's $κ$ 0.49) and 6-channel images (using Cohen's $κ$ 0.71) with very limited annotations. The efficacy of incorporating Lab color space for change detection is analyzed as well.

Submitted 19 June, 2022; originally announced June 2022.

Comments: 8 pages, 5 figures. Accepted to Proceedings of IEEE WHISPERS 2022
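
Illustrative note: the abstract above reports performance as Cohen's κ, which compares observed agreement with the agreement expected by chance. The sketch below computes κ from a confusion matrix using the standard definition. The function name `cohens_kappa` and the example matrix are hypothetical and are not data from the paper.

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: truth, columns: prediction)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    # Observed agreement: fraction of samples on the diagonal.
    observed = np.trace(confusion) / total
    # Chance agreement: sum over classes of (row marginal * column marginal) / total^2.
    expected = (confusion.sum(axis=1) * confusion.sum(axis=0)).sum() / total**2
    return (observed - expected) / (1.0 - expected)

# Made-up two-class example (e.g. "changed" vs "unchanged" pixels).
example = [[40, 10],
           [5, 45]]
kappa = cohens_kappa(example)
```

For this made-up matrix the observed agreement is 0.85 and the chance agreement is 0.50, so κ is 0.70.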

arXiv:2204.13497 [pdf, ps, other]

Unsupervised Spatial-spectral Hyperspectral Image Reconstruction and Clustering with Diffusion Geometry

Authors: Kangning Cui, Ruoning Li, Sam L. Polk, James M. Murphy, Robert J. Plemmons, Raymond H. Chan

Abstract: Hyperspectral images, which store a hundred or more spectral bands of reflectance, have become an important data source in natural and social sciences. Hyperspectral images are often generated in large quantities at a relatively coarse spatial resolution. As such, unsupervised machine learning algorithms incorporating known structure in hyperspectral imagery are needed to analyze these images automatically. This work introduces the Spatial-Spectral Image Reconstruction and Clustering with Diffusion Geometry (DSIRC) algorithm for partitioning highly mixed hyperspectral images. DSIRC reduces measurement noise through a shape-adaptive reconstruction procedure. In particular, for each pixel, DSIRC locates spectrally correlated pixels within a data-adaptive spatial neighborhood and reconstructs that pixel's spectral signature using those of its neighbors. DSIRC then locates high-density, high-purity pixels far in diffusion distance (a data-dependent distance metric) from other high-density, high-purity pixels and treats these as cluster exemplars, giving each a unique label. Non-modal pixels are assigned the label of their diffusion distance-nearest neighbor of higher density and purity that is already labeled. Strong numerical results indicate that incorporating spatial information through image reconstruction substantially improves the performance of pixel-wise clustering.

Submitted 28 April, 2022; originally announced April 2022.

Comments: 7 pages, 1 figure
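
Illustrative note: the labeling rule in the abstract above (each non-modal pixel takes the label of its nearest already-labeled neighbor of higher density and purity) can be sketched as below. This is a rough sketch under stated assumptions, not the DSIRC implementation: `dist` stands in for a precomputed diffusion distance matrix (plain absolute differences are used in the toy example), `score` stands in for a combined density-and-purity value, the cluster modes are supplied by hand, and the fallback to the nearest mode is an added assumption.

```python
import numpy as np

def propagate_from_modes(dist, score, modes):
    """Label points in order of decreasing score from their nearest higher-score labeled point.

    dist: (n, n) pairwise distance matrix (assumed stand-in for diffusion distance).
    score: length-n array (assumed stand-in for density combined with purity).
    modes: indices of cluster exemplars; mode k receives label k.
    """
    dist = np.asarray(dist, dtype=float)
    score = np.asarray(score, dtype=float)
    modes = np.asarray(modes)
    labels = np.full(len(score), -1, dtype=int)
    labels[modes] = np.arange(len(modes))
    for i in np.argsort(-score):  # visit highest-score points first
        if labels[i] >= 0:
            continue
        candidates = np.flatnonzero((score > score[i]) & (labels >= 0))
        if candidates.size == 0:
            candidates = modes  # assumed fallback: use the nearest mode
        nearest = candidates[np.argmin(dist[i, candidates])]
        labels[i] = labels[nearest]
    return labels

# Toy data: two well-separated groups on a line, one hand-picked mode per group.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(points[:, None] - points[None, :])
score = np.array([3.0, 2.0, 1.0, 3.5, 2.5, 1.5])
labels = propagate_from_modes(dist, score, modes=[0, 3])
```

Visiting points in decreasing score order means that, by the time a point is processed, every higher-score point already has a label, so the lookup only needs to search among labeled points.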

arXiv:2204.09041 [pdf, other]

Unsupervised detection of ash dieback disease (Hymenoscyphus fraxineus) using diffusion-based hyperspectral image clustering

Authors: Sam L. Polk, Aland H. Y. Chan, Kangning Cui, Robert J. Plemmons, David A. Coomes, James M. Murphy

Abstract: Ash dieback (Hymenoscyphus fraxineus) is an introduced fungal disease that is causing the widespread death of ash trees across Europe. Remote sensing hyperspectral images encode rich structure that has been exploited for the detection of dieback disease in ash trees using supervised machine learning techniques. However, to understand the state of forest health at landscape-scale, accurate unsupervised approaches are needed. This article investigates the use of the unsupervised Diffusion and VCA-Assisted Image Segmentation (D-VIS) clustering algorithm for the detection of ash dieback disease in a forest site near Cambridge, United Kingdom. The unsupervised clustering presented in this work has high overlap with the supervised classification of previous work on this scene (overall accuracy = 71%). Thus, unsupervised learning may be used for the remote detection of ash dieback disease without the need for expert labeling.

Submitted 19 April, 2022; originally announced April 2022.

Comments: (6 pages, 2 figures). Accepted to Proceedings of IEEE IGARSS 2022

arXiv:2204.06298 [pdf, other]

Active Diffusion and VCA-Assisted Image Segmentation of Hyperspectral Images

Authors: Sam L. Polk, Kangning Cui, Robert J. Plemmons, James M. Murphy

Abstract: Hyperspectral images encode rich structure that can be exploited for material discrimination by machine learning algorithms. This article introduces the Active Diffusion and VCA-Assisted Image Segmentation (ADVIS) for active material discrimination. ADVIS selects high-purity, high-density pixels that are far in diffusion distance (a data-dependent metric) from other high-purity, high-density pixels in the hyperspectral image. The ground truth labels of these pixels are queried and propagated to the rest of the image. The ADVIS active learning algorithm is shown to strongly outperform its fully unsupervised clustering algorithm counterpart, suggesting that the incorporation of a very small number of carefully-selected ground truth labels can result in substantially superior material discrimination in hyperspectral images.

Submitted 13 April, 2022; originally announced April 2022.

Comments: (6 pages, 2 figures). Accepted to Proceedings of IEEE IGARSS 2022

arXiv:2203.15619 [pdf, other]

Classification of Hyperspectral Images Using SVM with Shape-adaptive Reconstruction and Smoothed Total Variation

Authors: Ruoning Li, Kangning Cui, Raymond H. Chan, Robert J. Plemmons

Abstract: In this work, a novel algorithm called SVM with Shape-adaptive Reconstruction and Smoothed Total Variation (SaR-SVM-STV) is introduced to classify hyperspectral images, which makes full use of spatial and spectral information. The Shape-adaptive Reconstruction (SaR) is introduced to preprocess each pixel based on the Pearson Correlation between pixels in its shape-adaptive (SA) region. Support Vector Machines (SVMs) are trained to estimate the pixel-wise probability maps of each class. Then the Smoothed Total Variation (STV) model is applied to denoise and generate the final classification map. Experiments show that SaR-SVM-STV outperforms the SVM-STV method with a few training labels, demonstrating the significance of reconstructing hyperspectral images before classification.

Submitted 14 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: 6 pages, 3 figures. Accepted to Proceedings of IEEE IGARSS 2022

arXiv:2203.09992 [pdf, other]

doi 10.3390/rs15041053

Unsupervised Diffusion and Volume Maximization-Based Clustering of Hyperspectral Images

Authors: Sam L. Polk, Kangning Cui, Aland H. Y. Chan, David A. Coomes, Robert J. Plemmons, James M. Murphy

Abstract: Hyperspectral images taken from aircraft or satellites contain information from hundreds of spectral bands, within which lie latent lower-dimensional structures that can be exploited for classifying vegetation and other materials. A disadvantage of working with hyperspectral images is that, due to an inherent trade-off between spectral and spatial resolution, they have a relatively coarse spatial scale, meaning that single pixels may correspond to spatial regions containing multiple materials. This article introduces the Diffusion and Volume maximization-based Image Clustering (D-VIC) algorithm for unsupervised material clustering to address this problem. By directly incorporating pixel purity into its labeling procedure, D-VIC gives greater weight to pixels that correspond to a spatial region containing just a single material. D-VIC is shown to outperform comparable state-of-the-art methods in extensive experiments on a range of hyperspectral images, including land-use maps and highly mixed forest health surveys (in the context of ash dieback disease), implying that it is well-equipped for unsupervised material clustering of spectrally-mixed hyperspectral datasets.

Submitted 19 February, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: 28 pages, 11 figures

Journal ref: Remote Sens. 2023, 15(4), 1053

arXiv:1907.09708 [pdf, other]

Node Attribute Generation on Graphs

Authors: Xu Chen, Siheng Chen, Huangjie Zheng, Jiangchao Yao, Kenan Cui, Ya Zhang, Ivor W. Tsang

Abstract: Graph structured data provide two-fold information: graph structures and node attributes. Numerous graph-based algorithms rely on both information to achieve success in supervised tasks, such as node classification and link prediction. However, node attributes could be missing or incomplete, which significantly deteriorates the performance. The task of node attribute generation aims to generate attributes for those nodes whose attributes are completely unobserved. This task benefits many real-world problems like profiling, node classification and graph data augmentation. To tackle this task, we propose a deep adversarial learning based method to generate node attributes; called node attribute neural generator (NANG). NANG learns a unifying latent representation which is shared by both node attributes and graph structures and can be translated to different modalities. We thus use this latent representation as a bridge to convert information from one modality to another. We further introduce practical applications to quantify the performance of node attribute generation. Extensive experiments are conducted on four real-world datasets and the empirical results show that node attributes generated by the proposed method are high-qualitative and beneficial to other applications. The datasets and codes are available online.

Submitted 23 July, 2019; originally announced July 2019.

arXiv:1809.08400 [pdf, other]

Variational Collaborative Learning for User Probabilistic Representation

Authors: Kenan Cui, Xu Chen, Jiangchao Yao, Ya Zhang

Abstract: Collaborative filtering (CF) has been successfully employed by many modern recommender systems. Conventional CF-based methods use the user-item interaction data as the sole information source to recommend items to users. However, CF-based methods are known for suffering from cold start problems and data sparsity problems. Hybrid models that utilize auxiliary information on top of interaction data have increasingly gained attention. A few "collaborative learning"-based models, which tightly bridges two heterogeneous learners through mutual regularization, are recently proposed for the hybrid recommendation. However, the "collaboration" in the existing methods are actually asynchronous due to the alternative optimization of the two learners. Leveraging the recent advances in variational autoencoder (VAE), we here propose a model consisting of two streams of mutual linked VAEs, named variational collaborative model (VCM). Unlike the mutual regularization used in previous works where two learners are optimized asynchronously, VCM enables a synchronous collaborative learning mechanism. Besides, the two stream VAEs setup allows VCM to fully leverages the Bayesian probabilistic representations in collaborative learning. Extensive experiments on three real-life datasets have shown that VCM outperforms several state-of-art methods.

Submitted 22 September, 2018; originally announced September 2018.

Comments: 8 pages, 4 figures

arXiv:1702.07304 [pdf, other]

Conflict diagnostics for evidence synthesis in a multiple testing framework

Authors: Anne M. Presanis, David Ohlssen, Kai Cui, Magdalena Rosinska, Daniela De Angelis

Abstract: Evidence synthesis models that combine multiple datasets of varying design, to estimate quantities that cannot be directly observed, require the formulation of complex probabilistic models that can be expressed as graphical models. An assessment of whether the different datasets synthesised contribute information that is consistent with each other, and in a Bayesian context, with the prior distribution, is a crucial component of the model criticism process. However, a systematic assessment of conflict suffers from the multiple testing problem, through testing for conflict at multiple locations in a model. We demonstrate the systematic use of conflict diagnostics, while accounting for the multiple hypothesis tests of no conflict at each location in the graphical model. The method is illustrated by a network meta-analysis to estimate treatment effects in smoking cessation programs and an evidence synthesis to estimate HIV prevalence in Poland.

Submitted 13 September, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

Showing 1–10 of 10 results for author: Cui, K