Search | arXiv e-print repository

NeurTV: Total Variation on the Neural Domain

Authors: Yisi Luo, Xile Zhao, Kai Ye, Deyu Meng

Abstract: Recently, we have witnessed the success of total variation (TV) for many imaging applications. However, traditional TV is defined on the original pixel domain, which limits its potential. In this work, we suggest a new TV regularization defined on the neural domain. Concretely, the discrete data is continuously and implicitly represented by a deep neural network (DNN), and we use the derivatives of DNN outputs w.r.t. input coordinates to capture local correlations of data. As compared with classical TV on the original domain, the proposed TV on the neural domain (termed NeurTV) enjoys two advantages. First, NeurTV is not limited to meshgrid but is suitable for both meshgrid and non-meshgrid data. Second, NeurTV can more exactly capture local correlations across data for any direction and any order of derivatives, owing to the implicit and continuous nature of the neural domain. We theoretically reinterpret NeurTV under the variational approximation framework, which allows us to build the connection between classical TV and NeurTV and inspires us to develop variants (e.g., NeurTV with arbitrary resolution and space-variant NeurTV). Extensive numerical experiments with meshgrid data (e.g., color and hyperspectral images) and non-meshgrid data (e.g., point clouds and spatial transcriptomics) showcase the effectiveness of the proposed methods.

Submitted 27 May, 2024; originally announced May 2024.

MSC Class: 94A08; 68U10; 68T45
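
The NeurTV abstract above describes penalizing derivatives of a network's output with respect to its input coordinates. The following is a minimal sketch of that idea in one dimension, not the authors' implementation: the toy network `f`, its fixed weights `W`, `B`, `A`, and the helper names `neural_tv` and `grid_tv` are all hypothetical, and the derivative is written out by hand rather than obtained by automatic differentiation.

```python
import math

# Hypothetical toy "network": f(x) = sum_j A[j] * tanh(W[j] * x + B[j]).
# The weights are arbitrary illustrative values, not trained parameters.
W = [1.5, -2.0, 0.7]
B = [0.1, 0.4, -0.3]
A = [0.8, 0.5, -1.2]

def f(x):
    """Continuous representation: maps a coordinate x to a signal value."""
    return sum(a * math.tanh(w * x + b) for a, w, b in zip(A, W, B))

def df_dx(x):
    """Exact derivative of f w.r.t. the input coordinate (chain rule)."""
    return sum(a * w * (1.0 - math.tanh(w * x + b) ** 2)
               for a, w, b in zip(A, W, B))

def neural_tv(coords):
    """Mean absolute coordinate derivative over arbitrary sample points."""
    return sum(abs(df_dx(x)) for x in coords) / len(coords)

def grid_tv(values):
    """Classical discrete TV: sum of absolute neighbouring differences."""
    return sum(abs(v1 - v0) for v0, v1 in zip(values, values[1:]))

# The derivative-based penalty accepts irregular (non-meshgrid) locations.
irregular_coords = [0.0, 0.13, 0.5, 0.55, 0.9]
tv_neural = neural_tv(irregular_coords)

# The classical penalty needs values sampled on a regular grid.
grid = [i / 10 for i in range(11)]
tv_grid = grid_tv([f(x) for x in grid])
```

In this sketch the derivative-based penalty can be evaluated at any coordinate, whereas the finite-difference penalty is tied to the grid spacing. That contrast is what the abstract draws between the neural domain and the pixel domain.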

arXiv:2403.09993 [pdf, other]

TRG-Net: An Interpretable and Controllable Rain Generator

Authors: Zhiqiang Pang, Hong Wang, Qi Xie, Deyu Meng, Zongben Xu

Abstract: Exploring and modeling the rain generation mechanism is critical for augmenting paired data to ease the training of rainy image processing models. For this task, this study proposes a novel deep learning based rain generator, which fully takes the physical generation mechanism underlying rains into consideration and explicitly encodes the learning of the fundamental rain factors (i.e., shape, orientation, length, width and sparsity) into the deep network. Its significance lies in that the generator not only elaborately designs the essential elements of rain to simulate expected rains, like conventional artificial strategies, but also finely adapts to complicated and diverse practical rainy images, like deep learning methods. By rationally adopting the filter parameterization technique, we achieve, for the first time, a deep network that is finely controllable with respect to rain factors and able to learn the distribution of these factors purely from data. Our unpaired generation experiments demonstrate that the rain generated by the proposed rain generator is not only of higher quality, but also more effective for deraining and downstream tasks compared to current state-of-the-art rain generation methods. Besides, the paired data augmentation experiments, including both in-distribution and out-of-distribution (OOD), further validate the diversity of samples generated by our model for in-distribution deraining and OOD generalization tasks.

Submitted 29 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.15865 [pdf, other]

HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models

Authors: Li Pang, Xiangyu Rui, Long Cui, Hongzhong Wang, Deyu Meng, Xiangyong Cao

Abstract: Hyperspectral image (HSI) restoration aims at recovering clean images from degraded observations and plays a vital role in downstream tasks. Existing model-based methods have limitations in accurately modeling the complex image characteristics with handcrafted priors, and deep learning-based methods suffer from poor generalization ability. To alleviate these issues, this paper proposes an unsupervised HSI restoration framework with a pre-trained diffusion model (HIR-Diff), which restores the clean HSIs from the product of two low-rank components, i.e., the reduced image and the coefficient matrix. Specifically, the reduced image, which has a low spectral dimension, lies in the image field and can be inferred from our improved diffusion model where a new guidance function with total variation (TV) prior is designed to ensure that the reduced image can be well sampled. The coefficient matrix can be effectively pre-estimated based on singular value decomposition (SVD) and rank-revealing QR (RRQR) factorization. Furthermore, a novel exponential noise schedule is proposed to accelerate the restoration process (about 5$\times$ acceleration for denoising) with little performance decrease. Extensive experimental results validate the superiority of our method in both performance and speed on a variety of HSI restoration tasks, including HSI denoising, noisy HSI super-resolution, and noisy HSI inpainting. The code is available at https://github.com/LiPang/HIRDiff.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2401.00708 [pdf, other]

Revisiting Nonlocal Self-Similarity from Continuous Representation

Authors: Yisi Luo, Xile Zhao, Deyu Meng

Abstract: Nonlocal self-similarity (NSS) is an important prior that has been successfully applied in multi-dimensional data processing tasks, e.g., image and video recovery. However, existing NSS-based methods are solely suitable for meshgrid data such as images and videos, but are not suitable for emerging off-meshgrid data, e.g., point cloud and climate data. In this work, we revisit the NSS from the continuous representation perspective and propose a novel Continuous Representation-based NonLocal method (termed as CRNL), which has two innovative features as compared with classical nonlocal methods. First, based on the continuous representation, our CRNL unifies the measure of self-similarity for on-meshgrid and off-meshgrid data and thus is naturally suitable for both of them. Second, the nonlocal continuous groups can be more compactly and efficiently represented by the coupled low-rank function factorization, which simultaneously exploits the similarity within each group and across different groups, while classical nonlocal methods neglect the similarity across groups. This elaborately designed coupled mechanism allows our method to enjoy favorable performance over conventional NSS methods in terms of both effectiveness and efficiency. Extensive multi-dimensional data processing experiments on-meshgrid (e.g., image inpainting and image denoising) and off-meshgrid (e.g., climate data prediction and point cloud recovery) validate the versatility, effectiveness, and efficiency of our CRNL as compared with state-of-the-art methods.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.15701 [pdf, other]

Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration

Authors: Jiahong Fu, Qi Xie, Deyu Meng, Zongben Xu

Abstract: The deep unfolding approach has attracted significant attention in computer vision tasks, as it connects conventional image processing modeling with more recent deep learning techniques. Specifically, by establishing a direct correspondence between algorithm operators at each implementation step and network modules within each layer, one can rationally construct an almost "white box" network architecture with high interpretability. In this architecture, only the predefined component of the proximal operator, known as a proximal network, needs manual configuration, enabling the network to automatically extract intrinsic image priors in a data-driven manner. In current deep unfolding methods, such a proximal network is generally designed as a CNN architecture, whose necessity has been proven by a recent theory. That is, the CNN structure substantially delivers the translation-invariant image prior, which is the most universally possessed structural prior across various types of images. However, standard CNN-based proximal networks have essential limitations in capturing the rotation symmetry prior, another universal structural prior underlying general images. This leaves large room for further performance improvement in deep unfolding approaches. To address this issue, this study proposes a high-accuracy rotation equivariant proximal network that effectively embeds rotation symmetry priors into the deep unfolding framework. In particular, we derive, for the first time, the theoretical equivariant error for such a designed proximal network with arbitrary layers under arbitrary rotation degrees. This analysis should be the most refined theoretical conclusion for such error evaluation to date and is also indispensable for supporting the rationale behind such networks with intrinsic interpretability requirements.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2310.05290 [pdf, other]

MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles

Authors: Rusheng Zhang, Depu Meng, Shengyin Shen, Zhengxia Zou, Houqiang Li, Henry X. Liu

Abstract: As vehicular communication and networking technologies continue to advance, infrastructure-based roadside perception emerges as a pivotal tool for connected automated vehicle (CAV) applications. Due to their elevated positioning, roadside sensors, including cameras and lidars, often enjoy unobstructed views with diminished object occlusion. This provides them a distinct advantage over onboard perception, enabling more robust and accurate detection of road objects. This paper presents MSight, a cutting-edge roadside perception system specifically designed for CAVs. MSight offers real-time vehicle detection, localization, tracking, and short-term trajectory prediction. Evaluations underscore the system's capability to uphold lane-level accuracy with minimal latency, revealing a range of potential applications to enhance CAV safety and efficiency. Presently, MSight operates 24/7 at a two-lane roundabout in the City of Ann Arbor, Michigan.

Submitted 8 October, 2023; originally announced October 2023.

Comments: Submitted to IEEE T-ITS

arXiv:2309.15638 [pdf, other]

FRS-Nets: Fourier Parameterized Rotation and Scale Equivariant Networks for Retinal Vessel Segmentation

Authors: Zihong Sun, Qi Xie, Deyu Meng

Abstract: With translation equivariance, convolutional neural networks (CNNs) have achieved great success in retinal vessel segmentation. However, some other symmetries of the vascular morphology are not characterized by CNNs, such as rotation and scale symmetries. To embed more equivariance into CNNs and achieve the accuracy requirement for retinal vessel segmentation, we construct a novel convolution operator (FRS-Conv), which is Fourier parameterized and equivariant to rotation and scaling. Specifically, we first adopt a new parameterization scheme, which enables convolutional filters to arbitrarily perform transformations with high accuracy. Secondly, we derive the formulations for the rotation and scale equivariant convolution mapping. Finally, we construct FRS-Conv following the proposed formulations and replace the traditional convolution filters in U-Net and Iter-Net with FRS-Conv (FRS-Nets). We faithfully reproduce all compared methods and conduct comprehensive experiments on three public datasets under both in-dataset and cross-dataset settings. With merely 13.9% of the parameters of the corresponding baselines, FRS-Nets have achieved state-of-the-art performance and significantly outperform all compared methods. This demonstrates the remarkable accuracy, generalization, and clinical application potential of FRS-Nets.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2308.16612 [pdf, other]

Neural Gradient Regularizer

Authors: Shuang Xu, Yifan Wang, Zixiang Zhao, Jiangjun Peng, Xiangyong Cao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc Van Gool

Abstract: Owing to its significant success, the prior imposed on gradient maps has consistently been a subject of great interest in the field of image processing. Total variation (TV), one of the most representative regularizers, is known for its ability to capture the intrinsic sparsity prior underlying gradient maps. Nonetheless, TV and its variants often underestimate the gradient maps, leading to the weakening of edges and details whose gradients should not be zero in the original image (i.e., image structures are not describable by sparse priors of gradient maps). Recently, total deep variation (TDV) has been introduced, assuming the sparsity of feature maps, which provides a flexible regularization learned from large-scale datasets for a specific task. However, TDV requires retraining the network with image/task variations, limiting its versatility. To alleviate this issue, in this paper, we propose a neural gradient regularizer (NGR) that expresses the gradient map as the output of a neural network. Unlike existing methods, NGR does not rely on any subjective sparsity or other prior assumptions on image gradient maps, thereby avoiding the underestimation of gradient maps. NGR is applicable to various image types and different image processing tasks, functioning in a zero-shot learning fashion, making it a versatile and plug-and-play regularizer. Extensive experimental results demonstrate the superior performance of NGR over state-of-the-art counterparts for a range of different tasks, further validating its effectiveness and versatility.

Submitted 13 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2306.17799 [pdf, other]

A Low-rank Matching Attention based Cross-modal Feature Fusion Method for Conversational Emotion Recognition

Authors: Yuntao Shou, Xiangyong Cao, Deyu Meng, Bo Dong, Qinghua Zheng

Abstract: Conversational emotion recognition (CER) is an important research topic in human-computer interactions. Although deep learning (DL) based CER approaches have achieved excellent performance, existing cross-modal feature fusion methods used in these DL-based approaches either ignore the intra-modal and inter-modal emotional interaction or have high computational complexity. To address these issues, this paper develops a novel cross-modal feature fusion method for the CER task, i.e., the low-rank matching attention method (LMAM). By setting a matching weight and calculating attention scores between modal features row by row, LMAM contains fewer parameters than the self-attention method. We further utilize the low-rank decomposition method on the weight to make the parameter number of LMAM less than one-third of the self-attention. Therefore, LMAM can potentially alleviate the over-fitting issue caused by a large number of parameters. Additionally, by computing and fusing the similarity of intra-modal and inter-modal features, LMAM can also fully exploit the intra-modal contextual information within each modality and the complementary semantic information across modalities (i.e., text, video and audio) simultaneously. Experimental results on some benchmark datasets show that LMAM can be embedded into any existing state-of-the-art DL-based CER methods and help boost their performance in a plug-and-play manner. Also, experimental results verify the superiority of LMAM compared with other popular cross-modal fusion methods. Moreover, LMAM is a general cross-modal fusion method and can thus be applied to other multi-modal recognition tasks, e.g., session recommendation and humour detection.

Submitted 16 June, 2023; originally announced June 2023.

Comments: 10 pages, 4 figures
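
The LMAM abstract above states that a low-rank decomposition of the matching weight brings the parameter count below one-third of self-attention. The arithmetic below shows one way such a bound could arise. It rests on assumptions not stated in the abstract: that the self-attention baseline uses three d x d projections (query, key, value, biases ignored), that the matching weight is a single d x d matrix, and that the sizes `d` and `r` are purely illustrative. The paper's actual layer may differ.

```python
def self_attention_params(d):
    """Three d x d projections (query, key, value); biases ignored."""
    return 3 * d * d

def matching_weight_params(d):
    """A single full d x d matching weight (assumed form)."""
    return d * d

def low_rank_params(d, r):
    """Rank-r factorization W ~ U V with U of size d x r and V of size r x d."""
    return 2 * d * r

d, r = 512, 64  # hypothetical feature width and rank

full = self_attention_params(d)      # 3 * 512 * 512 = 786432
single = matching_weight_params(d)   # 512 * 512     = 262144
low_rank = low_rank_params(d, r)     # 2 * 512 * 64  = 65536

# Under these assumptions the factorized weight is below one-third of the
# self-attention count whenever 2*d*r < d*d, i.e. whenever r < d / 2.
ratio = low_rank / full
```

Under these assumptions a single full matching weight is exactly one-third of the baseline, and any rank below d/2 pushes the count strictly lower.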

arXiv:2306.17797 [pdf, other]

HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising

Authors: Li Pang, Weizhen Gu, Xiangyong Cao, Xiangyu Rui, Jiangjun Peng, Shuang Xu, Gang Yang, Deyu Meng

Abstract: Hyperspectral image (HSI) denoising is essentially ill-posed since a noisy HSI can be degraded from multiple clean HSIs. However, current deep learning-based approaches ignore this fact and restore the clean image with a deterministic mapping (i.e., the network receives a noisy HSI and outputs a clean HSI). To alleviate this issue, this paper proposes a flow-based HSI denoising network (HIDFlowNet) to directly learn the conditional distribution of the clean HSI given the noisy HSI, so that diverse clean HSIs can be sampled from the conditional distribution. Overall, our HIDFlowNet is induced from the flow methodology and contains an invertible decoder and a conditional encoder, which can fully decouple the learning of low-frequency and high-frequency information of HSI. Specifically, the invertible decoder is built by stacking a succession of invertible conditional blocks (ICBs) to capture the local high-frequency details since the invertible network is information-lossless. The conditional encoder utilizes down-sampling operations to obtain low-resolution images and uses transformers to capture correlations over a long distance so that global low-frequency information can be effectively extracted. Extensive experimental results on simulated and real HSI datasets verify the superiority of our proposed HIDFlowNet compared with other state-of-the-art methods both quantitatively and visually.

Submitted 20 June, 2023; originally announced June 2023.

Comments: 10 pages, 8 figures

arXiv:2306.17302 [pdf, other]

Robust Roadside Perception: an Automated Data Synthesis Pipeline Minimizing Human Annotation

Authors: Rusheng Zhang, Depu Meng, Lance Bassett, Shengyin Shen, Zhengxia Zou, Henry X. Liu

Abstract: Recently, advancements in vehicle-to-infrastructure communication technologies have elevated the significance of infrastructure-based roadside perception systems for cooperative driving. This paper delves into one of its most pivotal challenges: data insufficiency. The lack of high-quality labeled roadside sensor data with high diversity leads to low robustness and low transferability of current roadside perception systems. In this paper, a novel solution is proposed to address this problem by creating synthesized training data using Augmented Reality. A Generative Adversarial Network is then applied to further enhance the realism, producing a photo-realistic synthesized dataset that is capable of training or fine-tuning a roadside perception detector which is robust to different weather and lighting conditions. Our approach was rigorously tested at two key intersections in Michigan, USA: the Mcity intersection and the State St./Ellsworth Rd roundabout. The Mcity intersection is located within the Mcity test field, a controlled testing environment. In contrast, the State St./Ellsworth Rd intersection is a bustling roundabout notorious for its high traffic flow and a significant number of accidents annually. Experimental results demonstrate that detectors trained solely on synthesized data exhibit commendable performance across all conditions. Furthermore, when integrated with labeled data, the synthesized data can notably bolster the performance of pre-existing detectors, especially in adverse conditions.

Submitted 8 February, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted by IEEE Transactions on Intelligent Vehicles

arXiv:2305.10925 [pdf, other]

Unsupervised Hyperspectral Pansharpening via Low-rank Diffusion Model

Authors: Xiangyu Rui, Xiangyong Cao, Li Pang, Zeyu Zhu, Zongsheng Yue, Deyu Meng

Abstract: Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing handcrafted image priors to characterize the image features, and deep learning-based HS pansharpening methods usually require a large number of paired training data and suffer from poor generalization ability. To address these issues, in this work, we propose a low-rank diffusion model for hyperspectral pansharpening by simultaneously leveraging the power of the pre-trained deep diffusion model and the better generalization ability of Bayesian methods. Specifically, we assume that the HRHS image can be recovered from the product of two low-rank tensors, i.e., the base tensor and the coefficient matrix. The base tensor lies on the image field and has a low spectral dimension. Thus, we can conveniently utilize a pre-trained remote sensing diffusion model to capture its image structures. Additionally, we derive a simple yet quite effective way to pre-estimate the coefficient matrix from the observed LRHS image, which preserves the spectral information of the HRHS. Experimental results demonstrate that the proposed method performs better than some popular traditional approaches and gains better generalization ability than some DL-based methods. The code is released at https://github.com/xyrui/PLRDiff.

Submitted 19 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.07774 [pdf, other]

PanFlowNet: A Flow-Based Deep Network for Pan-sharpening

Authors: Gang Yang, Xiangyong Cao, Wenzhe Xiao, Man Zhou, Aiping Liu, Xun Chen, Deyu Meng

Abstract: Pan-sharpening aims to generate a high-resolution multispectral (HRMS) image by integrating the spectral information of a low-resolution multispectral (LRMS) image with the texture details of a high-resolution panchromatic (PAN) image. It essentially inherits the ill-posed nature of the super-resolution (SR) task, in that diverse HRMS images can degrade into the same LRMS image. However, existing deep learning-based methods recover only one HRMS image from the LRMS image and PAN image using a deterministic mapping, thus ignoring the diversity of the HRMS image. In this paper, to alleviate this ill-posed issue, we propose a flow-based pan-sharpening network (PanFlowNet) to directly learn the conditional distribution of the HRMS image given the LRMS image and PAN image instead of learning a deterministic mapping. Specifically, we first transform this unknown conditional distribution into a given Gaussian distribution by an invertible network, and the conditional distribution can thus be explicitly defined. Then, we design an invertible Conditional Affine Coupling Block (CACB) and further build the architecture of PanFlowNet by stacking a series of CACBs. Finally, the PanFlowNet is trained by maximizing the log-likelihood of the conditional distribution given a training set and can then be used to predict diverse HRMS images. The experimental results verify that the proposed PanFlowNet can generate various HRMS images given an LRMS image and a PAN image. Additionally, the experimental results on different kinds of satellite datasets also demonstrate the superiority of our PanFlowNet compared with other state-of-the-art methods both visually and quantitatively.

Submitted 16 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.
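
The PanFlowNet abstract above relies on invertible conditional affine coupling blocks. The sketch below shows the generic conditional affine coupling construction on plain Python lists. It is not the paper's CACB: the function `scale_shift` is a hypothetical fixed stand-in for a learned sub-network, and the vectors `x` and `cond` are toy values rather than image features.

```python
import math

def scale_shift(x1, cond):
    """Hypothetical stand-in for a learned sub-network: returns (s, t)."""
    s = [math.tanh(0.5 * a + 0.3 * c) for a, c in zip(x1, cond)]
    t = [0.2 * a - 0.1 * c for a, c in zip(x1, cond)]
    return s, t

def coupling_forward(x, cond):
    """y1 = x1, y2 = x2 * exp(s) + t, with (s, t) computed from (x1, cond)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_shift(x1, cond)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # log |det Jacobian| of the transform
    return x1 + y2, log_det

def coupling_inverse(y, cond):
    """Exact inverse: x2 = (y2 - t) * exp(-s), reusing y1 = x1."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_shift(y1, cond)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2

x = [0.5, -1.0, 2.0, 0.3]   # toy input vector
cond = [0.7, -0.2]          # toy conditioning vector
y, log_det = coupling_forward(x, cond)
x_back = coupling_inverse(y, cond)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift, so the block is invertible regardless of how complex the sub-network is. The log-determinant is the sum of the scales, which is what makes likelihood-based training tractable for flows of this kind.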

arXiv:2303.00563 [pdf, other]

ROCO: A Roundabout Traffic Conflict Dataset

Authors: Depu Meng, Owen Sayer, Rusheng Zhang, Shengyin Shen, Houqiang Li, Henry X. Liu

Abstract: Traffic conflicts have been studied by the transportation research community as a surrogate safety measure for decades. However, due to the rarity of traffic conflicts, collecting large-scale real-world traffic conflict data becomes extremely challenging. In this paper, we introduce and analyze ROCO, a real-world roundabout traffic conflict dataset. The data is collected at a two-lane roundabout at the intersection of State St. and W. Ellsworth Rd. in Ann Arbor, Michigan. We use raw video dataflow captured from four fisheye cameras installed at the roundabout as our input data source. We adopt a learning-based conflict identification algorithm from video to find potential traffic conflicts, and then manually label them for dataset collection and annotation. In total, 557 traffic conflicts and 17 traffic crashes are collected from August 2021 to October 2021. We provide trajectory data of the traffic conflict scenes extracted using our roadside perception system. A taxonomy based on traffic conflict severity, the reason for the traffic conflict, and its effect on the traffic flow is provided. With the traffic conflict data collected, we discover that failure to yield to circulating vehicles when entering the roundabout is the largest contributing reason for traffic conflicts. The ROCO dataset will be made public in the near future.

Submitted 1 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted by TRBAM 2023 presentation

arXiv:2301.06132 [pdf, other]

Deep Diversity-Enhanced Feature Representation of Hyperspectral Images

Authors: Jinhui Hou, Zhiyu Zhu, Junhui Hou, Hui Liu, Huanqiang Zeng, Deyu Meng

Abstract: In this paper, we study the problem of efficiently and effectively embedding the high-dimensional spatio-spectral information of hyperspectral (HS) images, guided by feature diversity. Specifically, based on the theoretical formulation that feature diversity is correlated with the rank of the unfolded kernel matrix, we rectify 3D convolution by modifying its topology to enhance the rank upper-bound. This modification yields a rank-enhanced spatial-spectral symmetrical convolution set (ReS$^3$-ConvSet), which not only learns diverse and powerful feature representations but also saves network parameters. Additionally, we propose a novel diversity-aware regularization (DA-Reg) term that directly acts on the feature maps to maximize independence among elements. To demonstrate the superiority of the proposed ReS$^3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks, including denoising, spatial super-resolution, and classification. Extensive experiments show that the proposed approaches outperform state-of-the-art methods both quantitatively and qualitatively to a significant extent. The code is publicly available at https://github.com/**nh/ReSSS-ConvSet.

Submitted 9 May, 2024; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 17 pages, 12 figures. Accepted in TPAMI 2024. arXiv admin note: substantial text overlap with arXiv:2207.04266

arXiv:2301.06081 [pdf, other]

A Hyper-weight Network for Hyperspectral Image Denoising

Authors: Xiangyu Rui, Xiangyong Cao, Jun Shu, Qian Zhao, Deyu Meng

Abstract: In the hyperspectral image (HSI) denoising task, the real noise embedded in the HSI is always complex and diverse so that many model-based HSI denoising methods only perform well on some specific noisy HSIs. To enhance the noise adaptation capability of current methods, we first resort to the weighted HSI denoising model since its weight is capable of characterizing the noise in different positions of the image. However, the weight in these weighted models is always determined by an empirical updating formula, which does not fully utilize the noise information contained in noisy images and thus limits their performance improvement. In this work, we propose an automatic weighting scheme to alleviate this issue. Specifically, the weight in the weighted model is predicted by a hyper-weight network (i.e., HWnet), which can be learned in a bi-level optimization framework based on the data-driven methodology. The learned HWnet can be explicitly plugged into other weighted denoising models, and help adjust weights for different noisy HSIs and different weighted models. Extensive experiments verify that the proposed HWnet can help improve the generalization ability of a weighted model to adapt to more complex noise, and can also strengthen the weighted model by transferring the knowledge from another weighted model. Additionally, to explain the experimental results, we also theoretically prove the training error and generalization error upper bound of the proposed HWnet, which should be the first generalization error analysis in the low-level vision field as far as we know.

Submitted 8 December, 2022; originally announced January 2023.

Comments: 16 pages

arXiv:2301.00815 [pdf, other]

NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants

Authors: Chenyu Xue, Fan Wang, Yuanzhuo Zhu, Hui Li, Deyu Meng, Dinggang Shen, Chunfeng Lian

Abstract: Deploying reliable deep learning techniques in interdisciplinary applications needs learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We have an opposite claim that explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network dubbed as NeuroExplainer, with applications to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) in network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer led to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.

Submitted 25 May, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

Comments: 12 pages, 4 figures, and 2 tables

arXiv:2212.13523 [pdf, other]

doi 10.1109/TGRS.2023.3268554

S2S-WTV: Seismic Data Noise Attenuation Using Weighted Total Variation Regularized Self-Supervised Learning

Authors: Zitai Xu, Yisi Luo, Bangyu Wu, Deyu Meng

Abstract: Seismic data often undergoes severe noise due to environmental factors, which seriously affects subsequent applications. Traditional hand-crafted denoisers such as filters and regularizations utilize interpretable domain knowledge to design generalizable denoising techniques, while their representation capacities may be inferior to deep learning denoisers, which can learn complex and representative denoising mappings from abundant training pairs. However, due to the scarcity of high-quality training pairs, deep learning denoisers may sustain some generalization issues over various scenarios. In this work, we propose a self-supervised method that combines the capacities of deep denoiser and the generalization abilities of hand-crafted regularization for seismic data random noise attenuation. Specifically, we leverage the Self2Self (S2S) learning framework with a trace-wise masking strategy for seismic data denoising by solely using the observed noisy data. Parallelly, we suggest the weighted total variation (WTV) to further capture the horizontal local smooth structure of seismic data. Our method, dubbed as S2S-WTV, enjoys both high representation abilities brought from the self-supervised deep network and good generalization abilities of the hand-crafted WTV regularizer and the self-supervised nature. Therefore, our method can more effectively and stably remove the random noise and preserve the details and edges of the clean signal. To tackle the S2S-WTV optimization model, we introduce an alternating direction multiplier method (ADMM)-based algorithm. Extensive experiments on synthetic and field noisy seismic data demonstrate the effectiveness of our method as compared with state-of-the-art traditional and deep learning-based seismic data denoising methods.

Submitted 27 December, 2022; originally announced December 2022.

Journal ref: TGRS 2023

arXiv:2212.13166 [pdf, other]

doi 10.1007/978-3-031-16446-0_63

Orientation-Shared Convolution Representation for CT Metal Artifact Learning

Authors: Hong Wang, Qi Xie, Yuexiang Li, Yawen Huang, Deyu Meng, Yefeng Zheng

Abstract: During X-ray computed tomography (CT) scanning, metallic implants carried by patients often lead to adverse artifacts in the captured CT images and then impair the clinical treatment. Against this metal artifact reduction (MAR) task, the existing deep-learning-based methods have gained promising reconstruction performance. Nevertheless, there is still some room for further improvement of MAR performance and generalization ability, since some important prior knowledge underlying this specific task has not been fully exploited. Hereby, in this paper, we carefully analyze the characteristics of metal artifacts and propose an orientation-shared convolution representation strategy to adapt the physical prior structures of artifacts, i.e., rotationally symmetrical streaking patterns. The proposed method rationally adopts Fourier-series-expansion-based filter parametrization in artifact modeling, which can better separate artifacts from anatomical tissues and boost the model generalizability. Comprehensive experiments executed on synthesized and clinical datasets show the superiority of our method in detail preservation beyond the current representative MAR methods. Code will be available at https://github.com/hongwang01/OSCNet

Submitted 26 December, 2022; originally announced December 2022.

Journal ref: MICCAI 2022

arXiv:2212.00262 [pdf, other]

Low-Rank Tensor Function Representation for Multi-Dimensional Data Recovery

Authors: Yisi Luo, Xile Zhao, Zhemin Li, Michael K. Ng, Deyu Meng

Abstract: Since higher-order tensors are naturally suitable for representing multi-dimensional data in real-world, e.g., color images and videos, low-rank tensor representation has become one of the emerging areas in machine learning and computer vision. However, classical low-rank tensor representations can only represent data on finite meshgrid due to their intrinsical discrete nature, which hinders their potential applicability in many scenarios beyond meshgrid. To break this barrier, we propose a low-rank tensor function representation (LRTFR), which can continuously represent data beyond meshgrid with infinite resolution. Specifically, the suggested tensor function, which maps an arbitrary coordinate to the corresponding value, can continuously represent data in an infinite real space. Parallel to discrete tensors, we develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization. We theoretically justify that both low-rank and smooth regularizations are harmoniously unified in the LRTFR, which leads to high effectiveness and efficiency for data continuous representation. Extensive multi-dimensional data recovery applications arising from image processing (image inpainting and denoising), machine learning (hyperparameter optimization), and computer graphics (point cloud upsampling) substantiate the superiority and versatility of our method as compared with state-of-the-art methods. Especially, the experiments beyond the original meshgrid resolution (hyperparameter optimization) or even beyond meshgrid (point cloud upsampling) validate the favorable performances of our method for continuous representation.

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2211.01825 [pdf, other]

doi 10.1109/TGRS.2022.3229012

Fast Noise Removal in Hyperspectral Images via Representative Coefficient Total Variation

Authors: Jiangjun Peng, Hailin Wang, Xiangyong Cao, Xinlin Liu, Xiangyu Rui, Deyu Meng

Abstract: Mining structural priors in data is a widely recognized technique for hyperspectral image (HSI) denoising tasks, whose typical ways include model-based methods and data-based methods. The model-based methods have good generalization ability, while the runtime cannot meet the fast processing requirements of the practical situations due to the large size of an HSI data $\mathbf{X} \in \mathbb{R}^{MN\times B}$. For the data-based methods, they perform very fast on new test data once they have been trained. However, their generalization ability is always insufficient. In this paper, we propose a fast model-based HSI denoising approach. Specifically, we propose a novel regularizer named Representative Coefficient Total Variation (RCTV) to simultaneously characterize the low rank and local smooth properties. The RCTV regularizer is proposed based on the observation that the representative coefficient matrix $\mathbf{U}\in\mathbb{R}^{MN\times R}$ $(R\ll B)$ obtained by orthogonally transforming the original HSI $\mathbf{X}$ can inherit the strong local-smooth prior of $\mathbf{X}$. Since $R/B$ is very small, the HSI denoising model based on the RCTV regularizer has lower time complexity. Additionally, we find that the representative coefficient matrix $\mathbf{U}$ is robust to noise, and thus the RCTV regularizer can somewhat promote the robustness of the HSI denoising model. Extensive experiments on mixed noise removal demonstrate the superiority of the proposed method both in denoising performance and denoising speed compared with other state-of-the-art methods. Remarkably, the denoising speed of our proposed method outperforms all the model-based techniques and is comparable with the deep learning-based approaches.

Submitted 3 November, 2022; originally announced November 2022.

Comments: 16 pages, 18 figures, 5 tables, 1 theorem

arXiv:2209.15340 [pdf, other]

A Learnable Optimization and Regularization Approach to Massive MIMO CSI Feedback

Authors: Zhengyang Hu, Guanzhang Liu, Qi Xie, Jiang Xue, Deyu Meng, Deniz Gunduz

Abstract: Channel state information (CSI) plays a critical role in achieving the potential benefits of massive multiple input multiple output (MIMO) systems. In frequency division duplex (FDD) massive MIMO systems, the base station (BS) relies on sustained and accurate CSI feedback from the users. However, due to the large number of antennas and users being served in massive MIMO systems, feedback overhead can become a bottleneck. In this paper, we propose a model-driven deep learning method for CSI feedback, called learnable optimization and regularization algorithm (LORA). Instead of using l1-norm as the regularization term, a learnable regularization module is introduced in LORA to automatically adapt to the characteristics of CSI. We unfold the conventional iterative shrinkage-thresholding algorithm (ISTA) to a neural network and learn both the optimization process and regularization term by end-to-end training. We show that LORA improves the CSI feedback accuracy and speed. Besides, a novel learnable quantization method and the corresponding training scheme are proposed, and it is shown that LORA can operate successfully at different bit rates, providing flexibility in terms of the CSI feedback overhead. Various realistic scenarios are considered to demonstrate the effectiveness and robustness of LORA through numerical simulations.

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.10305 [pdf, other]

KXNet: A Model-Driven Deep Neural Network for Blind Super-Resolution

Authors: Jiahong Fu, Hong Wang, Qi Xie, Qian Zhao, Deyu Meng, Zongben Xu

Abstract: Although current deep learning-based methods have gained promising performance in the blind single image super-resolution (SISR) task, most of them mainly focus on heuristically constructing diverse network architectures and put less emphasis on the explicit embedding of the physical generation mechanism between blur kernels and high-resolution (HR) images. To alleviate this issue, we propose a model-driven deep neural network, called KXNet, for blind SISR. Specifically, to solve the classical SISR model, we propose a simple-yet-effective iterative algorithm. Then by unfolding the involved iterative steps into the corresponding network module, we naturally construct the KXNet. The main specificity of the proposed KXNet is that the entire learning process is fully and explicitly integrated with the inherent physical mechanism underlying this SISR task. Thus, the learned blur kernel has clear physical patterns and the mutually iterative process between blur kernel and HR image can soundly guide the KXNet to be evolved in the right direction. Extensive experiments on synthetic and real data finely demonstrate the superior accuracy and generality of our method beyond the current representative state-of-the-art blind SISR methods. Code is available at: https://github.com/jiahong-fu/KXNet.

Submitted 22 September, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted by ECCV2022

arXiv:2208.04481 [pdf, other]

doi 10.1109/LGRS.2022.3198088

Synthetic Aperture Radar Image Change Detection via Layer Attention-Based Noise-Tolerant Network

Authors: Desen Meng, Feng Gao, Junyu Dong, Qian Du, Heng-Chao Li

Abstract: Recently, change detection methods for synthetic aperture radar (SAR) images based on convolutional neural networks (CNN) have gained increasing research attention. However, existing CNN-based methods neglect the interactions among multilayer convolutions, and errors involved in the preclassification restrict the network optimization. To this end, we propose a layer attention-based noise-tolerant network, termed LANTNet. In particular, we design a layer attention module that adaptively weights the feature of different convolution layers. In addition, we design a noise-tolerant loss function that effectively suppresses the impact of noisy labels. Therefore, the model is insensitive to noisy labels in the preclassification results. The experimental results on three SAR datasets show that the proposed LANTNet performs better compared to several state-of-the-art methods. The source codes are available at https://github.com/summitgao/LANTNet

Submitted 8 August, 2022; originally announced August 2022.

Comments: Accepted by IEEE Geoscience and Remote Sensing Letters (GRSL) 2022, code is available at https://github.com/summitgao/LANTNet

arXiv:2205.07471 [pdf, other]

Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction

Authors: Hong Wang, Yuexiang Li, Deyu Meng, Yefeng Zheng

Abstract: Inspired by the great success of deep neural networks, learning-based methods have gained promising performances for metal artifact reduction (MAR) in computed tomography (CT) images. However, most of the existing approaches put less emphasis on modelling and embedding the intrinsic prior knowledge underlying this specific MAR task into their network designs. Against this issue, we propose an adaptive convolutional dictionary network (ACDNet), which leverages both model-based and learning-based methods. Specifically, we explore the prior structures of metal artifacts, e.g., non-local repetitive streaking patterns, and encode them as an explicit weighted convolutional dictionary model. Then, a simple-yet-effective algorithm is carefully designed to solve the model. By unfolding every iterative substep of the proposed algorithm into a network module, we explicitly embed the prior structure into a deep network, i.e., a clear interpretability for the MAR task. Furthermore, our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image based on its content. Hence, our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods. Comprehensive experiments executed on synthetic and clinical datasets show the superiority of our ACDNet in terms of effectiveness and model generalization. Code is available at https://github.com/hongwang01/ACDNet.

Submitted 16 June, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: https://github.com/hongwang01/ACDNet

Journal ref: the 31st International Joint Conference on Artificial Intelligence 2022

arXiv:2205.03742 [pdf, other]

Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution with Subpixel Fusion

Authors: Danfeng Hong, Jing Yao, Deyu Meng, Naoto Yokoya, Jocelyn Chanussot

Abstract: Enormous efforts have been recently made to super-resolve hyperspectral (HS) images with the aid of high spatial resolution multispectral (MS) images. Most prior works usually perform the fusion task by means of multifarious pixel-level priors. Yet the intrinsic effects of a large distribution gap between HS-MS data due to differences in the spatial and spectral resolution are less investigated. The gap might be caused by unknown sensor-specific properties or highly-mixed spectral information within one pixel (due to low spatial resolution). To this end, we propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DC-Net, to progressively fuse HS-MS information from the pixel- to subpixel-level, from the image- to feature-level. As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components to eliminate the gap between HS-MS images before further fusion, and then fully blends them by a model-guided coupled spectral unmixing (CSU) net. More significantly, we append a self-supervised learning module behind the CSU net by guaranteeing the material consistency to enhance the detailed appearances of the restored HS product. Extensive experimental results show the superiority of our method both visually and quantitatively and achieve a significant improvement in comparison with the state-of-the-arts. Furthermore, the codes and datasets will be available at https://sites.google.com/view/danfeng-hong for the sake of reproducibility.

Submitted 7 May, 2022; originally announced May 2022.

arXiv:2204.08797 [pdf, other]

doi 10.1109/TMI.2021.3124217

Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation

Authors: Yue Zhao, Lingming Zhang, Yang Liu, Deyu Meng, Zhiming Cui, Chenqiang Gao, Xinbo Gao, Chunfeng Lian, Dinggang Shen

Abstract: Precise segmentation of teeth from intra-oral scanner images is an essential task in computer-aided orthodontic surgical planning. The state-of-the-art deep learning-based methods often simply concatenate the raw geometric attributes (i.e., coordinates and normal vectors) of mesh cells to train a single-stream network for automatic intra-oral scanner image segmentation. However, since different raw attributes reveal completely different geometric information, the naive concatenation of different raw attributes at the (low-level) input stage may bring unnecessary confusion in describing and differentiating between mesh cells, thus hampering the learning of high-level geometric representations for the segmentation task. To address this issue, we design a two-stream graph convolutional network (i.e., TSGCN), which can effectively handle inter-view confusion between different raw attributes to more effectively fuse their complementary information and learn discriminative multi-view geometric representations. Specifically, our TSGCN adopts two input-specific graph-learning streams to extract complementary high-level geometric representations from coordinates and normal vectors, respectively. Then, these single-view representations are further fused by a self-attention module to adaptively balance the contributions of different views in learning more discriminative multi-view representations for accurate and fully automatic tooth segmentation. We have evaluated our TSGCN on a real-patient dataset of dental (mesh) models acquired by 3D intraoral scanners. Experimental results show that our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation. Github: https://github.com/ZhangLingMing1/TSGCNet.

Submitted 19 April, 2022; originally announced April 2022.

Comments: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2012.13697

Journal ref: IEEE Transactions on Medical Imaging, 41(4): 826-835, 2022

arXiv:2203.16396 [pdf]

doi 10.1109/TCNS.2022.3226951

Global Attitude Synchronization of Networked Rigid Bodies Under Directed Topologies

Authors: Fan Zhang, Deyuan Meng, Jingyao Zhang

Abstract: The global attitude synchronization problem is studied for networked rigid bodies under directed topologies. To avoid the asynchronous pitfall where only vector parts converge to some identical value but scalar parts do not, multiplicative quaternion errors are leveraged to develop attitude synchronization protocols for rigid bodies with the absolute measurements. It is shown that global synchronization of networked rigid bodies can be achieved if and only if the directed topology is quasi-strongly connected. Simultaneously, a novel double-energy-function analysis method, equipped with an ordering permutation technique about scalar parts and a coordinate transformation mechanism, is constructed for the quaternion behavior analysis of networked rigid bodies. In particular, global synchronization is achieved with our analysis method regardless of the highly nonlinear and strongly coupling problems resulting from multiplicative quaternion errors, which seriously hinder the traditional analysis of global synchronization for networked rigid bodies. Simulations for networked spacecraft are presented to show the global synchronization performances under different directed topologies.

Submitted 30 March, 2022; originally announced March 2022.

Comments: 26 pages, 6 figures

Journal ref: IEEE Transactions on Control of Network Systems, vol. 10, no. 3, pp. 1362-1373, Sept. 2023

arXiv:2203.10497 [pdf, ps, other]

Fundamental Trackability Problems for Iterative Learning Control

Authors: Deyuan Meng, Jingyao Zhang

Abstract: Generally, the classic iterative learning control (ILC) methods focus on finding design conditions for repetitive systems to achieve the perfect tracking of any specified trajectory, whereas they ignore a fundamental problem of ILC: whether the specified trajectory is trackable, or equivalently, whether there exist some inputs for the repetitive systems under consideration to generate the specified trajectory? The current paper contributes to dealing with this problem. Not only is a concept of trackability introduced formally for any specified trajectory in ILC, but also some related trackability criteria are established. Further, the relation between the trackability and the perfect tracking tasks for ILC is bridged, based on which a new convergence analysis approach is developed for ILC by leveraging properties of a functional Cauchy sequence (FCS). Simulation examples are given to verify the effectiveness of the presented trackability criteria and FCS-induced convergence analysis method for ILC.

Submitted 20 March, 2022; originally announced March 2022.

Comments: Submitted

arXiv:2203.04307 [pdf, other]

A Machine Learning Approach to Digital Contact Tracing: TC4TL Challenge

Authors: Badrinath Singhal, Chris Vorster, Di Meng, Gargi Gupta, Laura Dunne, Mark Germaine

Abstract: Contact tracing is a method used by public health organisations to try to prevent the spread of infectious diseases in the community. Traditionally performed by manual contact tracers, more recently the use of apps has been considered, utilising phone sensor data to determine the distance between two phones. In this paper, we investigate the development of machine learning approaches to determine the distance between two mobile phone devices using Bluetooth Low Energy, sensory data and meta data. We use TableNet architecture and feature engineering to improve on the existing state of the art (total nDCF 0.21 vs 2.08), significantly outperforming existing models.

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2202.05972 [pdf, other]

Low-light Image Enhancement by Retinex Based Algorithm Unrolling and Adjustment

Authors: Xinyi Liu, Qi Xie, Qian Zhao, Hong Wang, Deyu Meng

Abstract: Motivated by their recent advances, deep learning techniques have been widely applied to the low-light image enhancement (LIE) problem. Among them, Retinex theory based ones, mostly following a decomposition-adjustment pipeline, have taken an important place due to their physical interpretation and promising performance. However, current investigations on Retinex based deep learning are still not sufficient, ignoring many useful experiences from traditional methods. Besides, the adjustment step is either performed with simple image processing techniques, or by complicated networks, both of which are unsatisfactory in practice. To address these issues, we propose a new deep learning framework for the LIE problem. The proposed framework contains a decomposition network inspired by algorithm unrolling, and adjustment networks considering both global brightness and local brightness sensitivity. By virtue of algorithm unrolling, both implicit priors learned from data and explicit priors borrowed from traditional methods can be embedded in the network, facilitating better decomposition. Meanwhile, the consideration of global and local brightness can guide designing simple yet effective network modules for adjustment. Besides, to avoid manual parameter tuning, we also propose a self-supervised fine-tuning strategy, which can always guarantee a promising performance. Experiments on a series of typical LIE datasets demonstrated the effectiveness of the proposed method, both quantitatively and visually, as compared with existing methods.

Submitted 15 February, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

arXiv:2112.12660 [pdf, other]

doi 10.1016/j.media.2022.102729

InDuDoNet+: A Deep Unfolding Dual Domain Network for Metal Artifact Reduction in CT Images

Authors: Hong Wang, Yuexiang Li, Haimiao Zhang, Deyu Meng, Yefeng Zheng

Abstract: During the computed tomography (CT) imaging process, metallic implants within patients often cause harmful artifacts, which adversely degrade the visual quality of reconstructed CT images and negatively affect the subsequent clinical diagnosis. For the metal artifact reduction (MAR) task, current deep learning based methods have achieved promising performance. However, most of them share two main common limitations: 1) the CT physical imaging geometry constraint is not comprehensively incorporated into deep network structures; 2) the entire framework has weak interpretability for the specific MAR task; hence, the role of each network module is difficult to evaluate. To alleviate these issues, in the paper, we construct a novel deep unfolding dual domain network, termed InDuDoNet+, into which the CT imaging process is finely embedded. Concretely, we derive a joint spatial and Radon domain reconstruction model and propose an optimization algorithm with only simple operators for solving it. By unfolding the iterative steps involved in the proposed algorithm into the corresponding network modules, we easily build the InDuDoNet+ with clear interpretability. Furthermore, we analyze the CT values among different tissues, and merge the prior observations into a prior network for our InDuDoNet+, which significantly improves its generalization performance. Comprehensive experiments on synthesized data and clinical data substantiate the superiority of the proposed methods as well as the superior generalization performance beyond the current state-of-the-art (SOTA) MAR methods. Code is available at https://github.com/hongwang01/InDuDoNet_plus.

Submitted 26 December, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

Journal ref: Medical Image Analysis 2022

arXiv:2110.12177 [pdf, other]

Vertebrae localization, segmentation and identification using a graph optimization and an anatomic consistency cycle

Authors: Di Meng, Edmond Boyer, Sergi Pujades

Abstract: Vertebrae localization, segmentation and identification in CT images is key to numerous clinical applications. While deep learning strategies have brought to this field significant improvements over recent years, transitional and pathological vertebrae are still plaguing most existing approaches as a consequence of their poor representation in training datasets. Alternatively, proposed non-learning based methods take benefit of prior knowledge to handle such particular cases. In this work we propose to combine both strategies. To this purpose we introduce an iterative cycle in which individual vertebrae are recursively localized, segmented and identified using deep-networks, while anatomic consistency is enforced using statistical priors. In this strategy, the transitional vertebrae identification is handled by encoding their configurations in a graphical model that aggregates local deep-network predictions into an anatomically consistent final result. Our approach achieves state-of-the-art results on the VerSe20 challenge benchmark, and outperforms all methods on transitional vertebrae as well as the generalization to the VerSe19 challenge benchmark. Furthermore, our method can detect and report inconsistent spine regions that do not satisfy the anatomic consistency priors. Our code and model are openly available for research purposes.

Submitted 24 June, 2022; v1 submitted 23 October, 2021; originally announced October 2021.

arXiv:2110.01203 [pdf, ps, other]

From Control to Mathematics-Part II: Observability-Based Design for Iterative Methods in Solving Linear Equations

Authors: Deyuan Meng

Abstract: The control approaches generally resort to the tools from the mathematics, but whether and how the mathematics can benefit from the control approaches is unclear. This paper aims to bring the "control design" idea into the mathematics by providing an observer-based iterative method that focuses on solving linear algebraic equations (LAEs). An inherent relationship is revealed between the problem-solving of LAEs and the design of observer-based control systems, with which the iterative method for solving LAEs is exploited based on the design of the basic state observers. It is shown that all (least squares) solutions for any (un)solvable LAEs can be determined exponentially fast or monotonically with different selections of initial conditions. By integrating the design idea of the deadbeat control, the solving of LAEs can be achieved within only finite iterations. In particular, our proposed iterative method can be leveraged to develop a new observer-based design algorithm to realize the perfect tracking objective of conventional two-dimensional iterative learning control (ILC) systems, where the gap between classical ILC design and popular feedback-based control design is narrowed.

Submitted 4 October, 2021; originally announced October 2021.

Comments: submitted

arXiv:2109.05298 [pdf, other]

InDuDoNet: An Interpretable Dual Domain Network for CT Metal Artifact Reduction

Authors: Hong Wang, Yuexiang Li, Haimiao Zhang, Jiawei Chen, Kai Ma, Deyu Meng, Yefeng Zheng

Abstract: For the task of metal artifact reduction (MAR), although deep learning (DL)-based methods have achieved promising performances, most of them suffer from two problems: 1) the CT imaging geometry constraint is not fully embedded into the network during training, leaving room for further performance improvement; 2) model interpretability lacks sufficient consideration. Against these issues, we propose a novel interpretable dual domain network, termed InDuDoNet, which combines the advantages of model-driven and data-driven methodologies. Specifically, we build a joint spatial and Radon domain reconstruction model and utilize the proximal gradient technique to design an iterative algorithm for solving it. The optimization algorithm only consists of simple computational operators, which allows us to correspondingly unfold the iterative steps into network modules and thus improve the interpretability of the framework. Extensive experiments on synthesized and clinical data show the superiority of our InDuDoNet. Code is available at https://github.com/hongwang01/InDuDoNet.

Submitted 11 September, 2021; originally announced September 2021.

Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2021

arXiv:2108.06054 [pdf, other]

Local Patch Network with Global Attention for Infrared Small Target Detection

Authors: Fang Chen, Chenqiang Gao, Fangcen Liu, Yue Zhao, Yuxi Zhou, Deyu Meng, Wangmeng Zuo

Abstract: Infrared small target detection plays an important role in the infrared search and tracking applications. In recent years, deep learning techniques were introduced to this task and achieved noteworthy effects. Following general object segmentation methods, existing deep learning methods usually processed the image from the global view. However, the imaging locality of small targets and extreme class-imbalance between the target and background pixels were not well-considered by these deep learning methods, which causes the low-efficiency on training and high-dependence on numerous data. A local patch network (LPNet) with global attention is proposed in this paper to detect small targets by jointly considering the global and local properties of infrared small target images. From the global view, a supervised attention module trained by the small target spread map is proposed to suppress most background pixels irrelevant with small target features. From the local view, local patches are split from global features and share the same convolution weights with each other in a patch net. By leveraging both the global and local properties, the data-driven framework proposed in this paper has fused multi-scale features for small target detection. Extensive synthetic and real data experiments show that the proposed method achieves the state-of-the-art performance compared with existing both conventional and deep learning methods.

Submitted 29 September, 2021; v1 submitted 13 August, 2021; originally announced August 2021.

Comments: 11 pages, 7 figures

arXiv:2107.06808 [pdf, other]

RCDNet: An Interpretable Rain Convolutional Dictionary Network for Single Image Deraining

Authors: Hong Wang, Qi Xie, Qian Zhao, Yuexiang Li, Yong Liang, Yefeng Zheng, Deyu Meng

Abstract: As a common weather, rain streaks adversely degrade the image quality. Hence, removing rains from an image has become an important issue in the field. To handle such an ill-posed single image deraining task, in this paper, we specifically build a novel deep architecture, called rain convolutional dictionary network (RCDNet), which embeds the intrinsic priors of rain streaks and has clear interpretability. Specifically, we first establish an RCD model for representing rain streaks and utilize the proximal gradient descent technique to design an iterative algorithm only containing simple operators for solving the model. By unfolding it, we then build the RCDNet in which every network module has clear physical meanings and corresponds to each operation involved in the algorithm. This good interpretability greatly facilitates an easy visualization and analysis on what happens inside the network and why it works well in inference process. Moreover, taking into account the domain gap issue in real scenarios, we further design a novel dynamic RCDNet, where the rain kernels can be dynamically inferred corresponding to input rainy images and then help shrink the space for rain layer estimation with few rain maps so as to ensure a fine generalization performance in the inconsistent scenarios of rain types between training and testing data. By end-to-end training such an interpretable network, all involved rain kernels and proximal operators can be automatically extracted, faithfully characterizing the features of both rain and clean background layers, and thus naturally lead to better deraining performance. Comprehensive experiments substantiate the superiority of our method, especially on its well generality to diverse testing scenarios and good interpretability for all its modules. Code is available at https://github.com/hongwang01/DRCDNet.

Submitted 26 December, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

Journal ref: Transactions on Neural Networks and Learning Systems 2022

arXiv:2106.14178 [pdf, other]

Residual Moment Loss for Medical Image Segmentation

Authors: Quanziang Wang, Renzhen Wang, Yuexiang Li, Kai Ma, Yefeng Zheng, Deyu Meng

Abstract: Location information is proven to benefit the deep learning models on capturing the manifold structure of target objects, and accordingly boosts the accuracy of medical image segmentation. However, most existing methods encode the location information in an implicit way, e.g. the distance transform maps, which describe the relative distance from each pixel to the contour boundary, for the network to learn. These implicit approaches do not fully exploit the position information (i.e. absolute location) of targets. In this paper, we propose a novel loss function, namely residual moment (RM) loss, to explicitly embed the location information of segmentation targets during the training of deep learning networks. Particularly, motivated by image moments, the segmentation prediction map and ground-truth map are weighted by coordinate information. Then our RM loss encourages the networks to maintain the consistency between the two weighted maps, which promotes the segmentation networks to easily locate the targets and extract manifold-structure-related features. We validate the proposed RM loss by conducting extensive experiments on two publicly available datasets, i.e., 2D optic cup and disk segmentation and 3D left atrial segmentation. The experimental results demonstrate the effectiveness of our RM loss, which significantly boosts the accuracy of segmentation networks.

Submitted 27 June, 2021; originally announced June 2021.
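
Note on the residual moment (RM) loss entry above: the abstract says the prediction and ground-truth maps are weighted by coordinate information and then kept consistent, but it gives no formula. The following is only a minimal sketch of that idea in plain Python. The function names, the normalisation of coordinates to [0, 1], and the choice of a mean absolute difference are all assumptions made for illustration, not the authors' definition.

```python
def coordinate_weighted_maps(mask):
    """Weight each pixel by its normalised row and column coordinate.

    Hypothetical helper: normalising coordinates to [0, 1] is an assumption.
    """
    h, w = len(mask), len(mask[0])
    row_map = [[mask[i][j] * (i / max(h - 1, 1)) for j in range(w)] for i in range(h)]
    col_map = [[mask[i][j] * (j / max(w - 1, 1)) for j in range(w)] for i in range(h)]
    return row_map, col_map


def moment_consistency_loss(pred, target):
    """Mean absolute difference between the coordinate-weighted maps.

    Illustrative stand-in for a moment-style loss; the distance used in the
    paper may differ.
    """
    pred_rows, pred_cols = coordinate_weighted_maps(pred)
    tgt_rows, tgt_cols = coordinate_weighted_maps(target)
    h, w = len(pred), len(pred[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += abs(pred_rows[i][j] - tgt_rows[i][j])
            total += abs(pred_cols[i][j] - tgt_cols[i][j])
    return total / (h * w)


# A target with one foreground pixel, and a prediction shifted one pixel right.
target = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
shifted = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
loss_same = moment_consistency_loss(target, target)
loss_shifted = moment_consistency_loss(shifted, target)
```

In this sketch a prediction that matches the target gives zero loss, while a spatially shifted prediction is penalised, which is the location sensitivity the abstract describes.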

arXiv:2106.10641 [pdf, other]

Nuclei Grading of Clear Cell Renal Cell Carcinoma in Histopathological Image by Composite High-Resolution Network

Authors: Zeyu Gao, Jiangbo Shi, Xianli Zhang, Yang Li, Haichuan Zhang, Jialun Wu, Chunbao Wang, Deyu Meng, Chen Li

Abstract: The grade of clear cell renal cell carcinoma (ccRCC) is a critical prognostic factor, making ccRCC nuclei grading a crucial task in RCC pathology analysis. Computer-aided nuclei grading aims to improve pathologists' work efficiency while reducing their misdiagnosis rate by automatically identifying the grades of tumor nuclei within histopathological images. Such a task requires precisely segmenting and accurately classifying the nuclei. However, most of the existing nuclei segmentation and classification methods cannot handle the inter-class similarity property of nuclei grading, and thus cannot be directly applied to the ccRCC grading task. In this paper, we propose a Composite High-Resolution Network for ccRCC nuclei grading. Specifically, we propose a segmentation network called W-Net that can separate the clustered nuclei. Then, we recast the fine-grained classification of nuclei to two cross-category classification tasks, based on two high-resolution feature extractors (HRFEs) which are proposed for learning these two tasks. The two HRFEs share the same backbone encoder with W-Net by a composite connection so that meaningful features for the segmentation task can be inherited for the classification task. Last, a head-fusion block is applied to generate the predicted label of each nucleus. Furthermore, we introduce a dataset for ccRCC nuclei grading, containing 1000 image patches with 70945 annotated nuclei. We demonstrate that our proposed method achieves state-of-the-art performance compared to existing methods on this large ccRCC grading dataset.

Submitted 20 June, 2021; originally announced June 2021.

Comments: Accepted by MICCAI 2021

arXiv:2106.02884 [pdf, other]

A Deep Variational Bayesian Framework for Blind Image Deblurring

Authors: Hui Wang, Zongsheng Yue, Qian Zhao, Deyu Meng

Abstract: Blind image deblurring is an important yet very challenging problem in low-level vision. Traditional optimization based methods generally formulate this task as a maximum-a-posteriori estimation or variational inference problem, whose performance highly relies on the handcraft priors for both the latent image and the blur kernel. In contrast, recent deep learning methods generally learn, from a large collection of training images, deep neural networks (DNNs) directly mapping the blurry image to the clean one or to the blur kernel, paying less attention to the physical degradation process of the blurry image. In this paper, we present a deep variational Bayesian framework for blind image deblurring. Under this framework, the posterior of the latent clean image and blur kernel can be jointly estimated in an amortized inference fashion with DNNs, and the involved inference DNNs can be trained by fully considering the physical blur model, together with the supervision of data driven priors for the clean image and blur kernel, which is naturally led to by the evidence lower bound objective. Comprehensive experiments are conducted to substantiate the effectiveness of the proposed framework. The results show that it can not only achieve a promising performance with relatively simple networks, but also enhance the performance of existing DNNs for deblurring.

Submitted 5 June, 2021; originally announced June 2021.

arXiv:2101.04345 [pdf, ps, other]

doi 10.1109/TAC.2021.3115455

From Control to Mathematics-Part I: Controllability-Based Design for Iterative Methods in Solving Linear Equations

Authors: Deyuan Meng, Yuxin Wu

Abstract: In the interaction between control and mathematics, mathematical tools are fundamental for all the control methods, but it is unclear how control impacts mathematics. This is the first part of our paper that attempts to give an answer with focus on solving linear algebraic equations (LAEs) from the perspective of systems and control, where it mainly introduces the controllability-based design results. By proposing an iterative method that integrates a learning control mechanism, a class of tracking problems for iterative learning control (ILC) is explored for the problem solving of LAEs. A trackability property of ILC is newly developed, by which analysis and synthesis results are established to disclose the equivalence between the solvability of LAEs and the controllability of discrete control systems. Hence, LAEs can be solved by equivalently achieving the perfect tracking tasks of resulting ILC systems via the classic state feedback-based design and analysis methods. It is shown that the solutions for any solvable LAE can all be calculated with different selections of the initial input. Moreover, the presented ILC method is applicable to determining all the least squares solutions of any unsolvable LAE. In particular, a deadbeat design is incorporated to ILC such that the solving of LAEs can be completed within finite iteration steps. The trackability property is also generalized to conventional two-dimensional ILC systems, which creates feedback-based methods, instead of the commonly used contraction mapping-based methods, for the design and convergence analysis of ILC.

Submitted 12 January, 2021; originally announced January 2021.
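
Note on the entry above: the abstract describes solving a linear algebraic equation Ax = b with an iterative method that uses a learning control mechanism, treating b as the target to track. As a rough illustration only, the sketch below uses the generic update x_{k+1} = x_k + gain * A^T (b - A x_k), where the residual b - A x_k plays the role of the tracking error. The transpose-based gain is a standard gradient-type (Landweber) choice assumed here for simplicity; it is not the controllability-based or deadbeat design of the paper, and all names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def learning_iteration_solve(A, b, gain, x0=None, iters=500):
    """Iterate x_{k+1} = x_k + gain * A^T (b - A x_k).

    Assumed illustrative update law; it converges when
    0 < gain < 2 / (largest eigenvalue of A^T A).
    """
    x = list(x0) if x0 is not None else [0.0] * len(A[0])
    At = transpose(A)
    for _ in range(iters):
        error = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # tracking error
        correction = matvec(At, error)
        x = [xi + gain * ci for xi, ci in zip(x, correction)]
    return x


# Solvable example: 2x + y = 3 and x + 3y = 5, with exact solution (0.8, 1.4).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
x = learning_iteration_solve(A, b, gain=0.1)
```

For this example the largest eigenvalue of A^T A is about 13.1, so a gain of 0.1 satisfies the stated convergence condition. The optional `x0` argument reflects the abstract's remark that the initial input matters: for an A with a nontrivial null space, different starting points would lead to different solutions.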

arXiv:2012.05643 [pdf, ps, other]

Control Analysis and Synthesis of Data-Driven Learning: A Kalman State-Space Approach

Authors: Deyuan Meng

Abstract: This paper aims to deal with the control analysis and synthesis problem of data-driven learning, regardless of unknown plant models and iteration-varying uncertainties. For the tracking of any desired target, a Kalman state-space approach is presented to transform it into two robust stability problems, which bridges a connection between data-driven control and model-based control. This approach also makes it possible to employ the extended state observer (ESO) in the design of data-driven learning to overcome the effect of iteration-varying uncertainties. It is shown that ESO-based data-driven learning ensures model-free systems to achieve the tracking of any desired target. In particular, our results apply to iterative learning control, which is verified by an example.

Submitted 10 December, 2020; originally announced December 2020.

Comments: Submitted

arXiv:2008.10796 [pdf, other]

Deep Variational Network Toward Blind Image Restoration

Authors: Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong

Abstract: Blind image restoration (IR) is a common yet challenging problem in computer vision. Classical model-based methods and recent deep learning (DL)-based methods represent two different methodologies for this problem, each with their own merits and drawbacks. In this paper, we propose a novel blind image restoration method, aiming to integrate both the advantages of them. Specifically, we construct a general Bayesian generative model for the blind IR, which explicitly depicts the degradation process. In this proposed model, a pixel-wise non-i.i.d. Gaussian distribution is employed to fit the image noise. It is with more flexibility than the simple i.i.d. Gaussian or Laplacian distributions as adopted in most of conventional methods, so as to handle more complicated noise types contained in the image degradation. To solve the model, we design a variational inference algorithm where all the expected posteriori distributions are parameterized as deep neural networks to increase their model capability. Notably, such an inference algorithm induces a unified framework to jointly deal with the tasks of degradation estimation and image restoration. Further, the degradation information estimated in the former task is utilized to guide the latter IR process. Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-arts.

Submitted 26 April, 2024; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: Accepted by TPAMI@2024. Code: https://github.com/zsyOAOA/VIRNet

ACM Class: I.4.4

arXiv:2007.05946 [pdf, other]

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

Authors: Zongsheng Yue, Qian Zhao, Lei Zhang, Deyu Meng

Abstract: Real-world image noise removal is a long-standing yet very challenging task in computer vision. The success of deep neural network in denoising stimulates the research of noise generation, aiming at synthesizing more clean-noisy image pairs to facilitate the training of deep denoisers. In this work, we propose a novel unified framework to simultaneously deal with the noise removal and noise generation tasks. Instead of only inferring the posteriori distribution of the latent clean image conditioned on the observed noisy image in traditional MAP framework, our proposed method learns the joint distribution of the clean-noisy image pairs. Specifically, we approximate the joint distribution with two different factorized forms, which can be formulated as a denoiser mapping the noisy image to the clean one and a generator mapping the clean image to the noisy one. The learned joint distribution implicitly contains all the information between the noisy and clean images, avoiding the necessity of manually designing the image priors and noise assumptions as traditional. Besides, the performance of our denoiser can be further improved by augmenting the original training dataset with the learned generator. Moreover, we propose two metrics to assess the quality of the generated noisy image, for which, to the best of our knowledge, such metrics are firstly proposed along this research line. Extensive experiments have been conducted to demonstrate the superiority of our method over the state-of-the-arts both in the real noise removal and generation tasks. The training and testing code is available at https://github.com/zsyOAOA/DANet.

Submitted 12 July, 2020; originally announced July 2020.

Comments: Accepted by ECCV 2020

ACM Class: I.4.4

arXiv:2007.05230 [pdf, other]

Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution

Authors: Jing Yao, Danfeng Hong, Jocelyn Chanussot, Deyu Meng, Xiaoxiang Zhu, Zongben Xu

Abstract: The recent advancement of deep learning techniques has made great progress on hyperspectral image super-resolution (HSI-SR). Yet the development of unsupervised deep networks remains challenging for this task. To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of higher-spatial-resolution multispectral image (MSI). Inspired by coupled spectral unmixing, a two-stream convolutional autoencoder framework is taken as backbone to jointly decompose MS and HS data into a spectrally meaningful basis and corresponding coefficients. CUCaNet is capable of adaptively learning spectral and spatial response functions from HS-MS correspondences by enforcing reasonable consistency assumptions on the networks. Moreover, a cross-attention module is devised to yield more effective spatial-spectral information transfer in networks. Extensive experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models, demonstrating the superiority of the CUCaNet in the HSI-SR application. Furthermore, the codes and datasets will be available at: https://github.com/danfenghong/ECCV2020_CUCaNet.

Submitted 1 August, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

arXiv:2005.09228

Structural Residual Learning for Single Image Rain Removal

Authors: Hong Wang, Yichen Wu, Qi Xie, Qian Zhao, Yong Liang, Deyu Meng

Abstract: To alleviate the adverse effect of rain streaks in image processing tasks, CNN-based single image rain removal methods have been recently proposed. However, the performance of these deep learning methods largely relies on the covering range of rain shapes contained in the pre-collected training rainy-clean image pairs. This makes them easily trapped into the overfitting-to-the-training-samples issue and cannot finely generalize to practical rainy images with complex and diverse rain streaks. Against this generalization issue, this study proposes a new network architecture by enforcing the output residual of the network possess intrinsic rain structures. Such a structural residual setting guarantees the rain layer extracted by the network finely comply with the prior knowledge of general rain streaks, and thus regulates sound rain shapes capable of being well extracted from rainy images in both training and predicting stages. Such a general regularization function naturally leads to both its better training accuracy and testing generalization capability even for those non-seen rain configurations. Such superiority is comprehensively substantiated by experiments implemented on synthetic and real datasets both visually and quantitatively as compared with current state-of-the-art methods.

Submitted 19 May, 2020; originally announced May 2020.

arXiv:2005.01333

A Model-driven Deep Neural Network for Single Image Rain Removal

Authors: Hong Wang, Qi Xie, Qian Zhao, Deyu Meng

Abstract: Deep learning (DL) methods have achieved state-of-the-art performance in the task of single image rain removal. Most of current DL architectures, however, are still lack of sufficient interpretability and not fully integrated with physical structures inside general rain streaks. To this issue, in this paper, we propose a model-driven deep neural network for the task, with fully interpretable network structures. Specifically, based on the convolutional dictionary learning mechanism for representing rain, we propose a novel single image deraining model and utilize the proximal gradient descent technique to design an iterative algorithm only containing simple operators for solving the model. Such a simple implementation scheme facilitates us to unfold it into a new deep network architecture, called rain convolutional dictionary network (RCDNet), with almost every network module one-to-one corresponding to each operation involved in the algorithm. By end-to-end training the proposed RCDNet, all the rain kernels and proximal operators can be automatically extracted, faithfully characterizing the features of both rain and clean background layers, and thus naturally lead to its better deraining performance, especially in real scenarios. Comprehensive experiments substantiate the superiority of the proposed network, especially its well generality to diverse testing scenarios and good interpretability for all its modules, as compared with state-of-the-arts both visually and quantitatively. The source codes are available at https://github.com/hongwang01/RCDNet.

Submitted 4 May, 2020; originally announced May 2020.

arXiv:1910.10305

System Equivalence Transformation: Robust Convergence of Iterative Learning Control with Nonrepetitive Uncertainties

Authors: Deyuan Meng, Jingyao Zhang

Abstract: For iterative learning control (ILC), one of the basic problems left to address is how to solve the contradiction between convergence conditions for the output tracking error and for the input signal (or error). This problem is considered in the current paper, where the robust convergence analysis is achieved for ILC systems in the presence of nonrepetitive uncertainties. A system equivalence transformation (SET) is proposed for ILC such that given any desired reference trajectories, the output tracking problems for general nonsquare multi-input, multi-output (MIMO) systems can be equivalently transformed into those for the specific class of square MIMO systems with the same input and output channels. As a benefit of SET, a unified condition is only needed to guarantee both the uniform boundedness of all system signals and the robust convergence of the output tracking error, which avoids causing the condition contradiction problem in implementing the double-dynamics analysis approach to ILC. Simulation examples are included to demonstrate the validity of our established robust ILC results.

Submitted 22 October, 2019; originally announced October 2019.

Comments: Submitted

arXiv:1909.08326

doi 10.1007/s11432-020-3225-9

A Survey on Rain Removal from Video and Single Image

Authors: Hong Wang, Yichen Wu, Minghan Li, Qian Zhao, Deyu Meng

Abstract: Rain streaks might severely degenerate the performance of video/image processing tasks. The investigations on rain removal from video or a single image has thus been attracting much research attention in the field of computer vision and pattern recognition, and various methods have been proposed against this task in the recent years. However, there is still not a comprehensive survey paper to summarize current rain removal methods and fairly compare their generalization performance, and especially, still not a off-the-shelf toolkit to accumulate recent representative methods for easy performance comparison and capability evaluation. Aiming at this meaningful task, in this study we present a comprehensive review for current rain removal methods for video and a single image. Specifically, these methods are categorized into model-driven and data-driven approaches, and more elaborate branches of each approach are further introduced. Intrinsic capabilities, especially generalization, of representative state-of-the-art methods of each approach have been evaluated and analyzed by experiments implemented on synthetic and real data both visually and quantitatively. Furthermore, we release a comprehensive repository, including direct links to 74 rain removal papers, source codes of 9 methods for video rain removal and 20 ones for single image rain removal, 19 related project pages, 6 synthetic datasets and 4 real ones, and 4 commonly used image quality metrics, to facilitate reproduction and performance comparison of current existing methods for general users. Some limitations and research issues worthy to be further investigated have also been discussed for future research of this direction.

Submitted 3 October, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

Journal ref: SCIENCE CHINA Information Sciences 2021

arXiv:1909.06148

Video Rain/Snow Removal by Transformed Online Multiscale Convolutional Sparse Coding

Authors: Minghan Li, Xiangyong Cao, Qian Zhao, Lei Zhang, Chenqiang Gao, Deyu Meng

Abstract: Video rain/snow removal from surveillance videos is an important task in the computer vision community since rain/snow existed in videos can severely degenerate the performance of many surveillance system. Various methods have been investigated extensively, but most only consider consistent rain/snow under stable background scenes. Rain/snow captured from practical surveillance camera, however, is always highly dynamic in time with the background scene transformed occasionally. To this issue, this paper proposes a novel rain/snow removal approach, which fully considers dynamic statistics of both rain/snow and background scenes taken from a video sequence. Specifically, the rain/snow is encoded as an online multi-scale convolutional sparse coding (OMS-CSC) model, which not only finely delivers the sparse scattering and multi-scale shapes of real rain/snow, but also well encodes their temporally dynamic configurations by real-time ameliorated parameters in the model. Furthermore, a transformation operator imposed on the background scenes is further embedded into the proposed model, which finely conveys the dynamic background transformations, such as rotations, scalings and distortions, inevitably existed in a real video sequence. The approach so constructed can naturally better adapt to the dynamic rain/snow as well as background changes, and also suitable to deal with the streaming video attributed its online learning mode. The proposed model is formulated in a concise maximum a posterior (MAP) framework and is readily solved by the ADMM algorithm. Compared with the state-of-the-art online and offline video rain/snow removal methods, the proposed method achieves better performance on synthetic and real videos datasets both visually and quantitatively. Specifically, our method can be implemented in relatively high efficiency, showing its potential to real-time video rain/snow removal.

Submitted 13 September, 2019; originally announced September 2019.

Comments: 14 pages, 15 figures

Showing 1–50 of 52 results for author: Meng, D