-
Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation
Authors:
Linshan Wu,
Zhun Zhong,
Jiayi Ma,
Yunchao Wei,
Hao Chen,
Leyuan Fang,
Shutao Li
Abstract:
Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models with weak labels and is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely neglecting the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each other in the feature space are more likely to share the same class, and those closer to the distribution centers tend to have higher confidence. Motivated by this, we propose to model the underlying label distributions and employ cross-label constraints to generate more accurate pseudo labels. In this paper, we develop a unified WSSS framework named Adaptive Gaussian Mixtures Model, which leverages a GMM to model the label distributions. Specifically, we calculate the feature distribution centers of pseudo-labeled pixels and build the GMM by measuring the distance between the centers and each pseudo-labeled pixel. Then, we introduce an Online Expectation-Maximization (OEM) algorithm and a novel maximization loss to optimize the GMM adaptively, aiming to learn more discriminative decision boundaries between different class-wise Gaussian mixtures. Based on the label distributions, we leverage the GMM to generate high-quality pseudo labels for more reliable supervision. Our framework is capable of handling different forms of weak labels: image-level labels, points, scribbles, blocks, and bounding-boxes. Extensive experiments on PASCAL, COCO, Cityscapes, and ADE20K datasets demonstrate that our framework can effectively provide more reliable supervision and outperform the state-of-the-art methods under all settings. Code will be available at https://github.com/Luffy03/AGMM-SASS.
Submitted 19 March, 2024;
originally announced March 2024.
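
Illustrative note on the entry above: the abstract describes computing feature centers of pseudo-labeled pixels and scoring each pixel by its distance to those centers. A minimal sketch of that idea follows. It is not the authors' implementation: the function names, the fixed isotropic variance `sigma`, and the toy 2-D features are all assumptions made for illustration, and the Online EM step and the maximization loss are not shown.

```python
import math


def class_centers(features, labels):
    """Mean feature vector of the pseudo-labeled pixels of each class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def soft_labels(feature, centers, sigma=1.0):
    """Normalized Gaussian scores of one feature against every class center.

    Assumption: one isotropic Gaussian per class with a shared, fixed sigma.
    """
    scores = {}
    for y, mu in centers.items():
        sq_dist = sum((a - b) ** 2 for a, b in zip(feature, mu))
        scores[y] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


# Toy data: four "pixels" with 2-D features and two pseudo-label classes.
features = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]]
labels = [0, 0, 1, 1]
centers = class_centers(features, labels)
probs = soft_labels([0.1, 0.1], centers)
```

In this sketch a pixel near the class-0 center receives a higher class-0 score, which mirrors the observation in the abstract that pixels closer to a distribution center tend to be more confident.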
-
Artwork Protection Against Neural Style Transfer Using Locally Adaptive Adversarial Color Attack
Authors:
Zhongliang Guo,
Junhao Dong,
Yifei Qian,
Kaixuan Wang,
Weiye Li,
Ziheng Guo,
Yuheng Wang,
Yanli Li,
Ognjen Arandjelović,
Lei Fang
Abstract:
Neural style transfer (NST) generates new images by combining the style of one image with the content of another. However, unauthorized NST can exploit artwork, raising concerns about artists' rights and motivating the development of proactive protection methods. We propose Locally Adaptive Adversarial Color Attack (LAACA), empowering artists to protect their artwork from unauthorized style transfer by processing it before public release. By delving into the intricacies of human visual perception and the role of different frequency components, our method strategically introduces frequency-adaptive perturbations in the image. These perturbations significantly degrade the generation quality of NST while maintaining an acceptable level of visual change in the original image, discouraging potential infringers from using the protected artworks because of their poor NST generation quality. Additionally, existing metrics often overlook the importance of color fidelity in evaluating color-mattered tasks, such as the quality of NST-generated images, which is crucial in the context of artistic works. To comprehensively assess color-mattered tasks, we propose the Adversarial Color Distance Metric (ACDM), designed to quantify the color difference of images pre- and post-manipulation. Experimental results confirm that attacking NST using LAACA results in visually inferior style transfer, and that the ACDM can efficiently measure color-mattered tasks. By providing artists with a tool to safeguard their intellectual property, our work relieves the socio-technical challenges posed by the misuse of NST in the art community.
Submitted 19 April, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Deep Covariance Alignment for Domain Adaptive Remote Sensing Image Segmentation
Authors:
Linshan Wu,
Ming Lu,
Leyuan Fang
Abstract:
Unsupervised domain adaptive (UDA) image segmentation has recently gained increasing attention, aiming to improve the generalization capability for transferring knowledge from the source domain to the target domain. However, in high spatial resolution remote sensing images (RSI), the same category from different domains (e.g., urban and rural) can appear to be totally different with extremely inconsistent distributions, which heavily limits the UDA accuracy. To address this problem, in this paper, we propose a novel Deep Covariance Alignment (DCA) model for UDA RSI segmentation. The DCA can explicitly align category features to learn shared domain-invariant discriminative feature representations, which enhances the ability of model generalization. Specifically, a Category Feature Pooling (CFP) module is first employed to extract category features by combining the coarse outputs and the deep features. Then, we leverage a novel Covariance Regularization (CR) to enforce the intra-category features to be closer and the inter-category features to be further apart. Compared with the existing category alignment methods, our CR aims to regularize the correlation between different dimensions of the features and thus performs more robustly when dealing with the divergent category features of imbalanced and inconsistent distributions. Finally, we propose a stagewise procedure to train the DCA in order to alleviate the error accumulation. Experiments on both Rural-to-Urban and Urban-to-Rural scenarios of the LoveDA dataset demonstrate the superiority of our proposed DCA over other state-of-the-art UDA segmentation methods. Code is available at https://github.com/Luffy03/DCA.
Submitted 9 January, 2024;
originally announced January 2024.
-
ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion
Authors:
Jinghui Qin,
Lihuang Fang,
Ruitao Lu,
Liang Lin,
Yukai Shi
Abstract:
Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs), has attracted lots of attention. However, neural networks require large amounts of training data, hindering their application in real-world scenarios. In this letter, we propose a novel adversarial automatic data augmentation framework ADASR that automatically optimizes and augments HSI-MSI sample pairs to enrich data diversity for HSI-MSI fusion. Our framework is sample-aware and optimizes an augmentor network and two downsampling networks jointly by adversarial learning so that we can learn more robust downsampling networks for training the upsampling network. Extensive experiments on two public classical hyperspectral datasets demonstrate the effectiveness of our ADASR compared to the state-of-the-art methods.
Submitted 11 October, 2023;
originally announced October 2023.
-
randomHAR: Improving Ensemble Deep Learners for Human Activity Recognition with Sensor Selection and Reinforcement Learning
Authors:
Yiran Huang,
Yexu Zhou,
Till Riedel,
Likun Fang,
Michael Beigl
Abstract:
Deep learning has proven to be an effective approach in the field of Human activity recognition (HAR), outperforming other architectures that require manual feature engineering. Despite recent advancements, challenges inherent to HAR data, such as noisy data, intra-class variability and inter-class similarity, remain. To address these challenges, we propose an ensemble method, called randomHAR. The general idea behind randomHAR is training a series of deep learning models with the same architecture on randomly selected sensor data from the given dataset. Besides, an agent is trained with the reinforcement learning algorithm to identify the optimal subset of the trained models that are utilized for runtime prediction. In contrast to existing work, this approach optimizes the ensemble process rather than the architecture of the constituent models. To assess the performance of the approach, we compare it against two HAR algorithms, including the current state of the art, on six HAR benchmark datasets. The result of the experiment demonstrates that the proposed approach outperforms the state-of-the-art method, ensembleLSTM.
Submitted 15 July, 2023;
originally announced July 2023.
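
Illustrative note on the entry above: the abstract describes training same-architecture models on randomly selected sensor data and then predicting with a chosen subset of those models. The sketch below shows only the two surrounding mechanics, random sensor-subset sampling and voting over a selected set of models. It is an assumption-laden illustration, not the randomHAR code: the sensor names, function names, and the use of a plain majority vote are hypothetical, and the reinforcement-learning agent that picks the subset is replaced here by a hand-written index list.

```python
import random
from collections import Counter


def sample_sensor_subsets(sensors, n_models, subset_size, seed=0):
    """Draw one random sensor subset per ensemble member (seeded for repeatability)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(sensors, subset_size)) for _ in range(n_models)]


def majority_vote(predictions, selected):
    """Combine per-model predictions by majority vote.

    predictions[m][i] is model m's label for sample i; only the models whose
    indices appear in `selected` take part in the vote.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(predictions[m][i] for m in selected)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical sensor channels and toy predictions from three trained models.
sensors = ["acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z"]
subsets = sample_sensor_subsets(sensors, n_models=3, subset_size=4)
predictions = [["walk", "run"], ["walk", "walk"], ["sit", "run"]]
labels = majority_vote(predictions, selected=[0, 1, 2])
```

Changing `selected` changes which models vote, which is the quantity the abstract says the agent optimizes rather than the architecture of the individual models.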
-
Using simulation to design an MPC policy for field navigation using GPS sensing
Authors:
Harry Zhang,
Stefan Caldararu,
Ishaan Mahajan,
Shouvik Chatterjee,
Thomas Hansen,
Abhiraj Dashora,
Sriram Ashokkumar,
Luning Fang,
Xiangru Xu,
Shen He,
Dan Negrut
Abstract:
Modeling a robust control system with a precise GPS-based state estimation capability in simulation can be useful in field navigation applications as it allows for testing and validation in a controlled environment. This testing process would enable navigation systems to be developed and optimized in simulation with direct transferability to real-world scenarios. The multi-physics simulation engine Chrono allows for the creation of scenarios that may be difficult or dangerous to replicate in the field, such as extreme weather or terrain conditions. Autonomy Research Testbed (ART), a specialized robotics algorithm testbed, is operated in conjunction with Chrono to develop an MPC control policy as well as an EKF state estimator. This platform enables users to easily integrate custom algorithms in the autonomy stack. This model is initially developed and used in simulation and then tested on a twin vehicle model in reality, to demonstrate the transferability between simulation and reality (also known as Sim2Real).
Submitted 18 April, 2023;
originally announced April 2023.
-
Geo-NI: Geometry-aware Neural Interpolation for Light Field Rendering
Authors:
Gaochang Wu,
Yuemei Zhou,
Yebin Liu,
Lu Fang,
Tianyou Chai
Abstract:
In this paper, we present a Geometry-aware Neural Interpolation (Geo-NI) framework for light field rendering. Previous learning-based approaches either rely on the capability of neural networks to perform direct interpolation, which we dubbed Neural Interpolation (NI), or explore scene geometry for novel view synthesis, also known as Depth Image-Based Rendering (DIBR). Instead, we incorporate the ideas behind these two kinds of approaches by launching the NI with a novel DIBR pipeline. Specifically, the proposed Geo-NI first performs NI using input light field sheared by a set of depth hypotheses. Then the DIBR is implemented by assigning the sheared light fields with a novel reconstruction cost volume according to the reconstruction quality under different depth hypotheses. The reconstruction cost is interpreted as a blending weight to render the final output light field by blending the reconstructed light fields along the dimension of depth hypothesis. By combining the superiorities of NI and DIBR, the proposed Geo-NI is able to render views with large disparity with the help of scene geometry while also reconstruct non-Lambertian effect when depth is prone to be ambiguous. Extensive experiments on various datasets demonstrate the superior performance of the proposed geometry-aware light field rendering framework.
Submitted 20 June, 2022;
originally announced June 2022.
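
Illustrative note on the entry above: the abstract states that the reconstruction cost is interpreted as a blending weight across depth hypotheses. One plausible way to turn costs into weights is a softmax over negated costs, sketched below for a single pixel. This is an assumption for illustration, not the Geo-NI formulation: the softmax form, the `temperature` parameter, and the function name are hypothetical.

```python
import math


def blend_pixel(values, costs, temperature=1.0):
    """Blend one pixel's reconstructions across depth hypotheses.

    values[d] is the pixel value reconstructed under hypothesis d and costs[d]
    its reconstruction cost. Lower cost gives a larger weight via
    softmax(-cost / temperature), which is an assumed weighting scheme.
    """
    lowest = min(costs)  # subtract the minimum for numerical stability
    raw = [math.exp(-(c - lowest) / temperature) for c in costs]
    total = sum(raw)
    weights = [r / total for r in raw]
    blended = sum(w * v for w, v in zip(weights, values))
    return blended, weights


# Equal costs average the hypotheses; a much lower cost dominates the blend.
even_value, even_weights = blend_pixel([0.0, 1.0], [0.0, 0.0])
sharp_value, sharp_weights = blend_pixel([0.0, 1.0], [0.0, 100.0])
```

Applying the same function independently at every pixel yields a blend along the depth-hypothesis dimension, in the spirit of the rendering step the abstract describes.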
-
LDP-Net: An Unsupervised Pansharpening Network Based on Learnable Degradation Processes
Authors:
Jiahui Ni,
Zhimin Shao,
Zhongzhou Zhang,
Mingzheng Hou,
Jiliu Zhou,
Leyuan Fang,
Yi Zhang
Abstract:
Pansharpening of remote sensing images aims at acquiring a high-resolution multispectral (HRMS) image directly by fusing a low-resolution multispectral (LRMS) image with a panchromatic (PAN) image. The main concern is how to effectively combine the rich spectral information of the LRMS image with the abundant spatial information of the PAN image. Recently, many methods based on deep learning have been proposed for the pansharpening task. However, these methods usually have two main drawbacks: 1) requiring HRMS for supervised learning; and 2) simply ignoring the latent relation between the MS and PAN images and fusing them directly. To solve these problems, we propose a novel unsupervised network based on learnable degradation processes, dubbed LDP-Net. A reblurring block and a graying block are designed to learn the corresponding degradation processes, respectively. In addition, a novel hybrid loss function is proposed to constrain both spatial and spectral consistency between the pansharpened image and the PAN and LRMS images at different resolutions. Experiments on Worldview2 and Worldview3 images demonstrate that our proposed LDP-Net can fuse PAN and LRMS images effectively without the help of HRMS samples, achieving promising performance in terms of both qualitative visual effects and quantitative metrics.
Submitted 24 November, 2021;
originally announced November 2021.
-
Revisiting Light Field Rendering with Deep Anti-Aliasing Neural Network
Authors:
Gaochang Wu,
Yebin Liu,
Lu Fang,
Tianyou Chai
Abstract:
The light field (LF) reconstruction is mainly confronted with two challenges, large disparity and the non-Lambertian effect. Typical approaches either address the large disparity challenge using depth estimation followed by view synthesis or eschew explicit depth information to enable non-Lambertian rendering, but rarely solve both challenges in a unified framework. In this paper, we revisit the classic LF rendering framework to address both challenges by incorporating it with advanced deep learning techniques. First, we analytically show that the essential issue behind the large disparity and non-Lambertian challenges is the aliasing problem. Classic LF rendering approaches typically mitigate the aliasing with a reconstruction filter in the Fourier domain, which is, however, intractable to implement within a deep learning pipeline. Instead, we introduce an alternative framework to perform anti-aliasing reconstruction in the image domain and analytically show comparable efficacy on the aliasing issue. To explore the full potential, we then embed the anti-aliasing framework into a deep neural network through the design of an integrated architecture and trainable parameters. The network is trained through end-to-end optimization using a peculiar training set, including regular LFs and unstructured LFs. The proposed deep learning pipeline shows a substantial superiority in solving both the large disparity and the non-Lambertian challenges compared with other state-of-the-art approaches. In addition to the view interpolation for an LF, we also show that the proposed pipeline also benefits light field view extrapolation.
Submitted 27 April, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Light Field Reconstruction Using Convolutional Network on EPI and Extended Applications
Authors:
Gaochang Wu,
Yebin Liu,
Lu Fang,
Qionghai Dai,
Tianyou Chai
Abstract:
In this paper, a novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views. We indicate that the reconstruction can be efficiently modeled as angular restoration on an epipolar plane image (EPI). The main problem in direct reconstruction on the EPI involves an information asymmetry between the spatial and angular dimensions, where the detailed portion in the angular dimensions is damaged by undersampling. Directly upsampling or super-resolving the light field in the angular dimensions causes ghosting effects. To suppress these ghosting effects, we contribute a novel "blur-restoration-deblur" framework. First, the "blur" step is applied to extract the low-frequency components of the light field in the spatial dimensions by convolving each EPI slice with a selected blur kernel. Then, the "restoration" step is implemented by a CNN, which is trained to restore the angular details of the EPI. Finally, we use a non-blind "deblur" operation to recover the spatial high frequencies suppressed by the EPI blur. We evaluate our approach on several datasets, including synthetic scenes, real-world scenes and challenging microscope light field data. We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms. We further show extended applications, including depth enhancement and interpolation for unstructured input. More importantly, a novel rendering approach is presented by combining the proposed framework and depth information to handle large disparities.
Submitted 24 March, 2021;
originally announced March 2021.
-
Subjective and Objective Quality Assessment of Mobile Gaming Video
Authors:
Shaoguo Wen,
Suiyi Ling,
Junle Wang,
Ximing Chen,
Lizhi Fang,
Yanqing Jing,
Patrick Le Callet
Abstract:
Nowadays, with the vigorous expansion and development of gaming video streaming techniques and services, the expectation of users, especially the mobile phone users, for higher quality of experience is also growing swiftly. As most of the existing research focuses on traditional video streaming, there is a clear lack of both subjective study and objective quality models that are tailored for quality assessment of mobile gaming content. To this end, in this study, we first present a brand new Tencent Gaming Video dataset containing 1293 mobile gaming sequences encoded with three different codecs. Second, we propose an objective quality framework, namely Efficient hard-RAnk Quality Estimator (ERAQUE), that is equipped with (1) a novel hard pairwise ranking loss, which forces the model to put more emphasis on differentiating similar pairs; (2) an adapted model distillation strategy, which could be utilized to compress the proposed model efficiently without causing significant performance drop. Extensive experiments demonstrate the efficiency and robustness of our model.
Submitted 27 January, 2021;
originally announced March 2021.
-
LEAD: LiDAR Extender for Autonomous Driving
Authors:
Jianing Zhang,
Wei Li,
Honggang Gou,
Lu Fang,
Ruigang Yang
Abstract:
3D perception using sensors that meet automotive industry standards is a hard requirement in autonomous driving. MEMS LiDAR is an emerging trend because of its lower cost, greater robustness, and compliance with mass-production standards. However, it suffers from a small field of view (FoV), which slows its adoption. In this paper, we propose LEAD, i.e., LiDAR Extender for Autonomous Driving, to extend the MEMS LiDAR with a coupled image w.r.t. both FoV and range. We propose a multi-stage propagation strategy based on depth distributions and an uncertainty map, which shows effective propagation ability. Moreover, our depth outpainting/propagation network follows a teacher-student training fashion, which transfers depth estimation ability to the depth completion network without passing on any scale error. To validate the LiDAR extension quality, we utilize a high-precision laser scanner to generate a ground-truth dataset. Quantitative and qualitative evaluations show that our scheme outperforms SOTAs by a large margin. We believe the proposed LEAD along with the dataset would benefit the community w.r.t. depth research.
Submitted 16 February, 2021;
originally announced February 2021.
-
A GCICA Grant-Free Random Access Scheme for M2M Communications in Crowded Massive MIMO Systems
Authors:
Huimei Han,
Lushun Fang,
Weidang Lu,
Wenchao Zhai,
Ying Li,
Jun Zhao
Abstract:
A grant-free random access scheme with a high success rate is proposed to support massive access for machine-to-machine communications in massive multiple-input multiple-output systems. This scheme allows active user equipments (UEs) to transmit their modulated uplink messages along with super pilots consisting of multiple sub-pilots to a base station (BS). Then, the BS performs channel state information (CSI) estimation and uplink message decoding by utilizing a proposed graph combined clustering independent component analysis (GCICA) decoding algorithm, and then employs the estimated CSIs to detect active UEs by utilizing the characteristic of asymptotic favorable propagation of the massive MIMO channel. We call this proposed scheme the GCICA based random access (GCICA-RA) scheme. We analyze the successful access probability, missed detection probability, and uplink throughput of the GCICA-RA scheme. Numerical results show that the GCICA-RA scheme significantly improves the successful access probability and uplink throughput, decreases missed detection probability, and provides low CSI estimation error at the same time.
Submitted 25 December, 2020;
originally announced December 2020.
-
Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit
Authors:
Tiankuang Zhou,
Xing Lin,
Jiamin Wu,
Yitong Chen,
Hao Xie,
Yipeng Li,
Jintao Fan,
Huaqiang Wu,
Lu Fang,
Qionghai Dai
Abstract:
Application-specific optical processors have been considered disruptive technologies for modern computing that can fundamentally accelerate the development of artificial intelligence (AI) by offering substantially improved computing performance. Recent advancements in optical neural network architectures for neural information processing have been applied to perform various machine learning tasks. However, the existing architectures have limited complexity and performance; and each of them requires its own dedicated design that cannot be reconfigured to switch between different neural network models for different applications after deployment. Here, we propose an optoelectronic reconfigurable computing paradigm by constructing a diffractive processing unit (DPU) that can efficiently support different neural networks and achieve a high model complexity with millions of neurons. It allocates almost all of its computational operations optically and achieves extremely high speed of data modulation and large-scale network parameter updating by dynamically programming optical modulators and photodetectors. We demonstrated the reconfiguration of the DPU to implement various diffractive feedforward and recurrent neural networks and developed a novel adaptive training approach to circumvent the system imperfections. We applied the trained networks for high-speed classifying of handwritten digit images and human action videos over benchmark datasets, and the experimental results revealed a comparable classification accuracy to the electronic computing approaches. Furthermore, our prototype system built with off-the-shelf optoelectronic components surpasses the performance of state-of-the-art graphics processing units (GPUs) by several times on computing speed and more than an order of magnitude on system energy efficiency.
Submitted 26 August, 2020;
originally announced August 2020.
-
Noise-Powered Disentangled Representation for Unsupervised Speckle Reduction of Optical Coherence Tomography Images
Authors:
Yongqiang Huang,
Wenjun Xia,
Zexin Lu,
Yan Liu,
Hu Chen,
Jiliu Zhou,
Leyuan Fang,
Yi Zhang
Abstract:
Due to its noninvasive character, optical coherence tomography (OCT) has become a popular diagnostic method in clinical settings. However, the low-coherence interferometric imaging procedure is inevitably contaminated by heavy speckle noise, which impairs both visual quality and diagnosis of various ocular diseases. Although deep learning has been applied for image denoising and achieved promising results, the lack of well-registered clean and noisy image pairs makes it impractical for supervised learning-based approaches to achieve satisfactory OCT image denoising results. In this paper, we propose an unsupervised OCT image speckle reduction algorithm that does not rely on well-registered image pairs. Specifically, by employing the ideas of disentangled representation and generative adversarial network, the proposed method first disentangles the noisy image into content and noise spaces by corresponding encoders. Then, the generator is used to predict the denoised OCT image with the extracted content features. In addition, the noise patches cropped from the noisy image are utilized to facilitate more accurate disentanglement. Extensive experiments have been conducted, and the results suggest that our proposed method is superior to the classic methods and demonstrates competitive performance to several recently proposed learning-based approaches in both quantitative and qualitative aspects.
Submitted 7 July, 2020;
originally announced July 2020.
-
Spatial-Angular Attention Network for Light Field Reconstruction
Authors:
Gaochang Wu,
Yingqian Wang,
Yebin Liu,
Lu Fang,
Tianyou Chai
Abstract:
Typical learning-based light field reconstruction methods require constructing a large receptive field by deepening the network to capture correspondences between input views. In this paper, we propose a spatial-angular attention network to perceive correspondences in the light field non-locally, and reconstruct high angular resolution light fields in an end-to-end manner. Motivated by the non-local attention mechanism, a spatial-angular attention module specifically for the high-dimensional light field data is introduced to compute the responses from all the positions in the epipolar plane for each pixel in the light field, and generate an attention map that captures correspondences along the angular dimension. We then propose a multi-scale reconstruction structure to efficiently implement the non-local attention in the low spatial scale, while also preserving the high frequency components in the high spatial scales. Extensive experiments demonstrate the superior performance of the proposed spatial-angular attention network for reconstructing sparsely-sampled light fields with non-Lambertian effects.
Submitted 13 October, 2021; v1 submitted 5 July, 2020;
originally announced July 2020.
-
SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis
Authors:
Mengqi Ji,
**zhi Zhang,
Qionghai Dai,
Lu Fang
Abstract:
Multi-view stereopsis (MVS) tries to recover the 3D model from 2D images. As the observations become sparser, the significant 3D information loss makes the MVS problem more challenging. Instead of only focusing on densely sampled conditions, we investigate sparse-MVS with large baseline angles since the sparser sensation is more practical and more cost-efficient. By investigating various observation sparsities, we show that the classical depth-fusion pipeline becomes powerless for the case with a larger baseline angle that worsens the photo-consistency check. As another line of the solution, we present SurfaceNet+, a volumetric method to handle the 'incompleteness' and the 'inaccuracy' problems induced by a very sparse MVS setup. Specifically, the former problem is handled by a novel volume-wise view selection approach. It owns superiority in selecting valid views while discarding invalid occluded views by considering the geometric prior. Furthermore, the latter problem is handled via a multi-scale strategy that consequently refines the recovered geometry around the region with the repeating pattern. The experiments demonstrate the tremendous performance gap between SurfaceNet+ and state-of-the-art methods in terms of precision and recall. Under the extreme sparse-MVS settings in two datasets, where existing methods can only return very few points, SurfaceNet+ still works as well as in the dense MVS setting. The benchmark and the implementation are publicly available at https://github.com/mjiUST/SurfaceNet-plus.
Submitted 26 May, 2020;
originally announced May 2020.
-
Multiscale Sparsifying Transform Learning for Image Denoising
Authors:
Ashkan Abbasi,
Amirhassan Monadjemi,
Leyuan Fang,
Hossein Rabbani,
Neda Noormohammadi,
Yi Zhang
Abstract:
The data-driven sparse methods such as synthesis dictionary learning (e.g., K-SVD) and sparsifying transform learning have been proven effective in image denoising. However, they are intrinsically single-scale which can lead to suboptimal results. We propose two methods developed based on wavelet subbands mixing to efficiently combine the merits of both single and multiscale methods. We show that an efficient multiscale method can be devised without the need for denoising detail subbands which substantially reduces the runtime. The proposed methods are initially derived within the framework of sparsifying transform learning denoising, and then, they are generalized to propose our multiscale extensions for the well-known K-SVD and SAIST image denoising methods. We analyze and assess the studied methods thoroughly and compare them with the well-known and state-of-the-art methods. The experiments show that our methods are able to offer good trade-offs between performance and complexity.
Submitted 25 July, 2021; v1 submitted 25 March, 2020;
originally announced March 2020.
-
Smart Cameras
Authors:
David J. Brady,
Minghao Hu,
Chengyu Wang,
Xuefei Yan,
Lu Fang,
Yiwnheng Zhu,
Yang Tan,
Ming Cheng,
Zhan Ma
Abstract:
We review camera architecture in the age of artificial intelligence. Modern cameras use physical components and software to capture, compress and display image data. Over the past 5 years, deep learning solutions have become superior to traditional algorithms for each of these functions. Deep learning enables 10-100x reduction in electrical sensor power per pixel, 10x improvement in depth of field and dynamic range and 10-100x improvement in image pixel count. Deep learning enables multiframe and multiaperture solutions that fundamentally shift the goals of physical camera design. Here we review the state of the art of deep learning in camera operations and consider the impact of AI on the physical design of cameras.
Submitted 11 February, 2020;
originally announced February 2020.
-
A Machine Learning-enhanced Robust P-Phase Picker for Real-time Seismic Monitoring
Authors:
Dazhong Shen,
Qi Zhang,
Tong Xu,
Hengshu Zhu,
Wenjia Zhao,
Zikai Yin,
Peilun Zhou,
Lihua Fang,
Enhong Chen,
Hui Xiong
Abstract:
Identifying the arrival times of seismic P-phases plays a significant role in real-time seismic monitoring, which provides critical guidance for emergency response activities. While considerable research has been conducted on this topic, efficiently capturing the arrival times of seismic P-phases hidden within intensively distributed and noisy seismic waves, such as those generated by the aftershocks of destructive earthquakes, remains a real challenge since most common existing methods in seismology rely on laborious expert supervision. To this end, in this paper, we present a machine learning-enhanced framework based on ensemble learning strategy, EL-Picker, for the automatic identification of seismic P-phase arrivals on continuous and massive waveforms. More specifically, EL-Picker consists of three modules, namely, Trigger, Classifier, and Refiner, and an ensemble learning strategy is exploited to integrate several machine learning classifiers. An evaluation of the aftershocks following the MS 8.0 Wenchuan earthquake demonstrates that EL-Picker can not only achieve the best identification performance but also identify 120% more seismic P-phase arrivals as complementary data. Meanwhile, experimental results also reveal both the applicability of different machine learning models for waveforms collected from different seismic stations and the regularities of seismic P-phase arrivals that might be neglected during manual inspection. These findings clearly validate the effectiveness, efficiency, flexibility and stability of EL-Picker.
Submitted 20 August, 2020; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Universal digital filtering for denoising volumetric retinal OCT and OCT angiography in 3D shearlet domain
Authors:
Jianlong Yang,
Yan Hu,
Liyang Fang,
Jun Cheng,
Jiang Liu
Abstract:
Retinal optical coherence tomography (OCT) and OCT angiography (OCTA) suffer from the degeneration of image quality due to speckle noise and bulk-motion noise, respectively. Because the cross-sectional retina has distinct features in OCT and OCTA B-scans, existing digital filters that can denoise OCT efficiently are unable to handle the bulk-motion noise in OCTA. In this Letter, we propose a universal digital filtering approach that is capable of minimizing both types of noise. Considering the retinal capillaries in OCTA are hard to differentiate in B-scans while having distinct curvilinear structures in 3D volumes, we decompose the volumetric OCT and OCTA data with 3D shearlets thus efficiently separate the retinal tissue and vessels from the noise in this transform domain. Compared with wavelets and curvelets, the shearlets provide better representation of the layer edges in OCT and the vasculature in OCTA. Qualitative and quantitative results show the proposed method outperforms the state-of-the-art OCT and OCTA denoising methods. Besides, the superiority of 3D denoising is demonstrated by comparing the 3D shearlet filtering with its 2D counterpart.
Submitted 8 January, 2020; v1 submitted 18 November, 2019;
originally announced November 2019.
-
Deep Learning for Hyperspectral Image Classification: An Overview
Authors:
Shutao Li,
Weiwei Song,
Leyuan Fang,
Yushi Chen,
Pedram Ghamisi,
Jón Atli Benediktsson
Abstract:
Hyperspectral image (HSI) classification has become a hot topic in the field of remote sensing. In general, the complex characteristics of hyperspectral data make the accurate classification of such data challenging for traditional machine learning methods. In addition, hyperspectral imaging often deals with an inherently nonlinear relation between the captured spectral information and the corresponding materials. In recent years, deep learning has been recognized as a powerful feature-extraction tool to effectively address nonlinear problems and widely used in a number of image processing tasks. Motivated by those successful applications, deep learning has also been introduced to classify HSIs and demonstrated good performance. This survey paper presents a systematic review of deep learning-based HSI classification literatures and compares several strategies for this topic. Specifically, we first summarize the main challenges of HSI classification which cannot be effectively overcome by traditional machine learning methods, and also introduce the advantages of deep learning to handle these problems. Then, we build a framework which divides the corresponding works into spectral-feature networks, spatial-feature networks, and spectral-spatial-feature networks to systematically review the recent achievements in deep learning-based HSI classification. In addition, considering the fact that available training samples in the remote sensing field are usually very limited and training deep networks require a large number of samples, we include some strategies to improve classification performance, which can provide some guidelines for future studies on this topic. Finally, several representative deep learning-based classification methods are conducted on real HSIs in our experiments.
Submitted 26 October, 2019;
originally announced October 2019.
-
Knowledge infused cascade convolutional neural network for segmenting retinal vessels in volumetric optical coherence tomography
Authors:
Liyang Fang,
Jianlong Yang,
Lei Mou,
Huihong Zhang,
Zhenjie Chai,
Zhi Chen,
Jiang Liu
Abstract:
We present a cascade deep neural network to segment retinal vessels in volumetric optical coherence tomography (OCT). Two types of knowledge are infused into the network for confining the searching regions. (1) Histology. The retinal vessels locate between the inner limiting membrane and the inner nuclear layer of human retina. (2) Imaging. The red blood cells inside the vessels scatter the OCT probe light forward and form projection shadows on the retinal pigment epithelium (RPE) layer, which is avascular thus perfect for localizing the retinal vessel in transverse plane. Qualitative and quantitative comparison results show that the proposed method outperforms the state-of-the-art deep learning and graph-based methods. This work demonstrates, instead of modifying the architectures of the deep networks, incorporating proper prior knowledge in the design of the image processing framework could be an efficient approach for handling such specific tasks.
Submitted 21 October, 2019;
originally announced October 2019.
-
High signal-to-noise ratio reconstruction of low bit-depth optical coherence tomography using deep learning
Authors:
Qiangjiang Hao,
Kang Zhou,
Jianlong Yang,
Liyang Fang,
Zhengjie Chai,
Yuhui Ma,
Yan Hu,
Shenghua Gao,
Jiang Liu
Abstract:
Reducing the bit-depth is an effective approach to lower the cost of optical coherence tomography (OCT) systems and increase the transmission efficiency in data acquisition and telemedicine. However, a low bit-depth will lead to the degeneration of the detection sensitivity and thus reduce the signal-to-noise ratio (SNR) of OCT images. In this paper, we propose to use deep learning for the reconstruction of the high SNR OCT images from the low bit-depth acquisition. Its feasibility was preliminarily evaluated by applying the proposed method to the quantized $3\sim8$-bit data from native 12-bit interference fringes. We employed a pixel-to-pixel generative adversarial network architecture in the low to high bit-depth OCT image transition. Retinal OCT data of a healthy subject from a homemade spectral-domain OCT system was used in the study. Extensive qualitative and quantitative results show this deep-learning-based approach could significantly improve the SNR of the low bit-depth OCT images especially at the choroidal region. Superior similarity and SNR between the reconstructed images and the original 12-bit OCT images could be derived when the bit-depth $\geq 5$. This work demonstrates the proper integration of OCT and deep learning could benefit the development of healthcare in low-resource settings.
Submitted 11 February, 2020; v1 submitted 12 October, 2019;
originally announced October 2019.
-
Digital resolution enhancement in low transverse sampling optical coherence tomography angiography using deep learning
Authors:
Ting Zhou,
Kang Zhou,
Jianlong Yang,
Liyang Fang,
Yan Hu,
Yitian Zhao,
Jun Cheng,
** Chen,
Shenghua Gao,
Jiang Liu
Abstract:
Optical coherence tomography angiography (OCTA) requires high transverse sampling density for visualizing retinal and choroidal capillaries. Low transverse sampling causes resolution degradation, such as the angiograms in wide-field OCTA. In this paper, we propose to address this problem using deep learning. We conducted extensive experiments on converting the centrally cropped 3 x 3 mm2 field of view (FOV) of the 8 x 8 mm2 foveal OCTA images (a sampling density of 22.9 $μ$m) to the native 3 x 3 mm2 en face OCTA images (a sampling density of 12.2 $μ$m). We employed a cycle-consistent adversarial network architecture in this conversion. The quantitative analysis using the perceptual similarity measures shows the generated OCTA images are closer to the native 3 x 3 mm2 scans. Besides, the results show the proposed method could also enhance signal-to-noise ratio. We further applied our method to enhance diseased cases and calculate vascular biomarkers, which demonstrates its generalization performance and clinical perspective.
Submitted 8 January, 2020; v1 submitted 3 October, 2019;
originally announced October 2019.
-
The Channel Attention based Context Encoder Network for Inner Limiting Membrane Detection
Authors:
Hao Qiu,
Zaiwang Gu,
Lei Mou,
Xiaoqian Mao,
Liyang Fang,
Yitian Zhao,
Jiang Liu,
Jun Cheng
Abstract:
The optic disc segmentation is an important step for retinal image-based disease diagnosis such as glaucoma. The inner limiting membrane (ILM) is the first boundary in the OCT, which can help to extract the retinal pigment epithelium (RPE) through gradient edge information to locate the boundary of the optic disc. Thus, the ILM layer segmentation is of great importance for optic disc localization. In this paper, we build a new optic disc centered dataset from 20 volunteers and manually annotated the ILM boundary in each OCT scan as ground-truth. We also propose a channel attention based context encoder network modified from the CE-Net to segment the optic disc. It mainly contains three phases: the encoder module, the channel attention based context encoder module, and the decoder module. Finally, we demonstrate that our proposed method achieves state-of-the-art disc segmentation performance on our dataset mentioned above.
Submitted 9 August, 2019;
originally announced August 2019.
-
Efficient Structurally-Strengthened Generative Adversarial Network for MRI Reconstruction
Authors:
Wenzhong Zhou,
Huiqian Du,
Wenbo Mei,
Li** Fang
Abstract:
Compressed sensing based magnetic resonance imaging (CS-MRI) provides an efficient way to reduce scanning time of MRI. Recently deep learning has been introduced into CS-MRI to further improve the image quality and shorten reconstruction time. In this paper, we propose an efficient structurally strengthened Generative Adversarial Network, termed ESSGAN, for reconstructing MR images from highly under-sampled k-space data. ESSGAN consists of a structurally strengthened generator (SG) and a discriminator. In SG, we introduce strengthened connections (SCs) to improve the utilization of the feature maps between the proposed strengthened convolutional autoencoders (SCAEs), where each SCAE is a variant of a typical convolutional autoencoder. In addition, we creatively introduce a residual in residual block (RIRB) to SG. RIRB increases the depth of SG, thus enhances feature expression ability of SG. Moreover, it can give the encoder blocks and the decoder blocks richer texture features. To further reduce artifacts and preserve more image details, we introduce an enhanced structural loss to SG. ESSGAN can provide higher image quality with less model parameters than the state-of-the-art deep learning-based methods at different undersampling rates of different subsampling masks, and reconstruct a 256*256 MR image in tens of milliseconds.
Submitted 11 August, 2019;
originally announced August 2019.
-
Deep Clustering With Intra-class Distance Constraint for Hyperspectral Images
Authors:
**guang Sun,
Wanli Wang,
Xian Wei,
Li Fang,
Xiaoliang Tang,
Yusheng Xu,
Hui Yu,
Wei Yao
Abstract:
The high dimensionality of hyperspectral images often results in the degradation of clustering performance. Due to the powerful ability of deep feature extraction and non-linear feature representation, the clustering algorithm based on deep learning has become a hot research topic in the field of hyperspectral remote sensing. However, most deep clustering algorithms for hyperspectral images utilize deep neural networks as feature extractor without considering prior knowledge constraints that are suitable for clustering. To solve this problem, we propose an intra-class distance constrained deep clustering algorithm for high-dimensional hyperspectral images. The proposed algorithm constrains the feature mapping procedure of the auto-encoder network by intra-class distance so that raw images are transformed from the original high-dimensional space to the low-dimensional feature space that is more conducive to clustering. Furthermore, the related learning process is treated as a joint optimization problem of deep feature extraction and clustering. Experimental results demonstrate the intense competitiveness of the proposed algorithm in comparison with state-of-the-art clustering methods of hyperspectral images.
Submitted 1 April, 2019;
originally announced April 2019.
-
Deep learning for seismic phase detection and picking in the aftershock zone of 2008 Mw7.9 Wenchuan earthquake
Authors:
Lijun Zhu,
Zhigang Peng,
James McClellan,
Chenyu Li,
Dongdong Yao,
Zefeng Li,
Lihua Fang
Abstract:
The increasing volume of seismic data from long-term continuous monitoring motivates the development of algorithms based on convolutional neural network (CNN) for faster and more reliable phase detection and picking. However, many less studied regions lack a significant amount of labeled events needed for traditional CNN approaches. In this paper, we present a CNN-based Phase-Identification Classifier (CPIC) designed for phase detection and picking on small to medium sized training datasets. When trained on 30,146 labeled phases and applied to one-month of continuous recordings during the aftershock sequences of the 2008 MW 7.9 Wenchuan Earthquake in Sichuan, China, CPIC detects 97.5% of the manually picked phases in the standard catalog and predicts their arrival times with a five-times improvement over the ObsPy AR picker. In addition, unlike other CNN-based approaches that require millions of training samples, when the off-line training set size of CPIC is reduced to only a few thousand training samples the accuracy stays above 95%. The online implementation of CPIC takes less than 12 hours to pick arrivals in 31-day recordings on 14 stations. In addition to the catalog phases manually picked by analysts, CPIC finds more phases for existing events and new events missed in the catalog. Among those additional detections, some are confirmed by a matched filter method while others require further investigation. Finally, when tested on a small dataset from a different region (Oklahoma, US), CPIC achieves 97% accuracy after fine tuning only the fully connected layer of the model. This result suggests that the CPIC developed in this study can be used to identify and pick P/S arrivals in other regions with no or minimum labeled phases.
Submitted 29 January, 2019; v1 submitted 18 January, 2019;
originally announced January 2019.