-
Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields
Authors:
Jintong Hu,
Siyan Chen,
Zhiyi Pan,
Sen Zeng,
Wenming Yang
Abstract:
Precise segmentation of medical images is fundamental for extracting critical clinical information, which plays a pivotal role in enhancing the accuracy of diagnoses, formulating effective treatment plans, and improving patient outcomes. Although Convolutional Neural Networks (CNNs) and non-local attention methods have achieved notable success in medical image segmentation, they either struggle to capture long-range spatial dependencies due to their reliance on local features, or face significant computational and feature integration challenges when attempting to address this issue with global attention mechanisms. To overcome existing limitations in medical image segmentation, we propose a novel architecture, Perspective+ Unet. This framework is characterized by three major innovations: (i) It introduces a dual-pathway strategy at the encoder stage that combines the outcomes of traditional and dilated convolutions. This not only maintains the local receptive field but also significantly expands it, enabling better comprehension of the global structure of images while retaining detail sensitivity. (ii) The framework incorporates an efficient non-local transformer block, named ENLTB, which utilizes kernel function approximation for effective long-range dependency capture with linear computational and spatial complexity. (iii) A Spatial Cross-Scale Integrator strategy is employed to merge global dependencies and local contextual cues across model stages, meticulously refining features from various levels to harmonize global and local information. Experimental results on the ACDC and Synapse datasets demonstrate the effectiveness of our proposed Perspective+ Unet. The code is available in the supplementary material.
Submitted 20 June, 2024;
originally announced June 2024.
-
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
Authors:
Yujie Chen,
Jiangyan Yi,
Jun Xue,
Chenglong Wang,
Xiaohui Zhang,
Shunbo Dong,
Siding Zeng,
Jianhua Tao,
Lv Zhao,
Cunhang Fan
Abstract:
Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use a sinc layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate embeddings, enhancing audio context representation and combining short- and long-range information. The results show that our proposed RawBMamba achieves a 34.1% improvement over Rawformer on the ASVspoof2021 LA dataset, and demonstrates competitive performance on other datasets.
Submitted 18 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Environment-aware UAV Communications: CKM Construction and Predictive Beamforming
Authors:
Shiqi Zeng,
Xiaoli Xu,
Yong Zeng
Abstract:
Predictive millimeter-wave (mmWave) beamforming is a promising technique to enable low-latency and high-rate ground-air communications for cellular-connected unmanned aerial vehicles (UAVs). However, the high vulnerability of mmWave to blockages poses practical challenges to the implementation of such a technology. In this paper, we tackle the challenges by proposing a channel knowledge map (CKM)-assisted predictive beamforming approach based on the echoed joint communication and sensing signal, whereby the line-of-sight (LoS) link identification is performed via hypothesis testing using prior information provided by CKM. Depending on the identification result, extended Kalman filtering (EKF) is adopted to reliably track the target UAV. Furthermore, if the non-line-of-sight (NLoS) state is identified, the target UAV will be immediately connected to a candidate base station (BS), namely a handover will be triggered to alleviate the communication outage. The simulation results show that the proposed method can significantly enhance the UAV tracking and mmWave communication performance compared to the benchmarking schemes without using CKM or LoS identification.
Submitted 18 April, 2024;
originally announced April 2024.
-
Channel Estimation for Holographic Communications in Hybrid Near-Far Field
Authors:
Shaohua Yue,
Shuhao Zeng,
Liang Liu,
Boya Di
Abstract:
To realize holographic communications, a potential technology for spectrum efficiency improvement in the future sixth-generation (6G) network, antenna arrays inlaid with numerous antenna elements will be deployed. However, the increase in antenna aperture size makes some users lie in the Fresnel region, leading to the hybrid near-field and far-field communication mode, where the conventional far-field channel estimation methods no longer work well. To tackle the above challenge, this paper considers channel estimation in a hybrid-field multipath environment, where each user and each scatterer can be in either the far-field or the near-field region. First, a joint angular-polar domain channel transform is designed to capture the hybrid-field channel's near-field and far-field features. We then analyze the power diffusion effect in the hybrid-field channel, which indicates that the power corresponding to one near-field (far-field) path component of the multipath channel may spread to far-field (near-field) paths and cause estimation error. We design a novel power-diffusion-based orthogonal matching pursuit channel estimation algorithm (PD-OMP). It can eliminate the prior knowledge requirement of path numbers in the far field and near field, which is a must in other OMP-based channel estimation algorithms. Simulation results show that PD-OMP outperforms current hybrid-field channel estimation methods.
Submitted 16 January, 2024;
originally announced January 2024.
-
Near-Far Field Channel Modeling for Holographic MIMO Using Expectation-Maximization Methods
Authors:
Houfeng Chen,
Shuhao Zeng,
Hao Guo,
Tommy Svensson,
Hongliang Zhang
Abstract:
Holographic Multiple-Input Multiple-Output (HMIMO), which densely integrates numerous antennas into a limited space, is anticipated to provide higher rates for future 6G wireless communications. The increase in antenna aperture size makes the near-field region enlarge, causing some users to be located in the near-field region. Thus, we are facing a hybrid near-field and far-field communication problem, where conventional far-field modeling methods may not work well. In this paper, we propose a near-far field channel model that does not presuppose whether each path is near-field or far-field, different from the existing work requiring the ratio of the number of near-field paths to that of far-field paths as prior knowledge. However, this gives rise to a new challenge for accurately modeling the channel, as conventional methods of obtaining channel model parameters are not applicable to this model. Therefore, we propose a new method, Expectation-Maximization (EM)-based Near-Far Field Channel Modeling, to obtain channel model parameters, which considers whether each path is near-field or far-field as a hidden variable, and optimizes the hidden variables and channel model parameters through an alternating iteration method. Simulation results show that our method is superior to conventional near-field and far-field algorithms in fitting the near-far field channel in terms of outage probability.
Submitted 15 January, 2024;
originally announced January 2024.
-
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
Authors:
Xiaohui Zhang,
Jiangyan Yi,
Chenglong Wang,
Chuyuan Zhang,
Siding Zeng,
Jianhua Tao
Abstract:
The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms. Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types. To address this challenge, one of the emergent effective approaches is continual learning. In this paper, we propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection. The fundamental concept underlying RWM involves categorizing all classes into two groups: those with compact feature distributions across tasks, such as genuine audio, and those with more spread-out distributions, like various types of fake audio. These distinctions are quantified by means of the in-class cosine distance, which subsequently serves as the basis for RWM to introduce a trainable gradient modification direction for distinct data types. Experimental evaluations against mainstream continual learning methods reveal the superiority of RWM in terms of knowledge acquisition and mitigating forgetting in audio deepfake detection. Furthermore, RWM's applicability extends beyond audio deepfake detection, demonstrating its potential significance in diverse machine learning domains such as image recognition.
Submitted 15 December, 2023;
originally announced December 2023.
-
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Authors:
Iman Askari,
Xumein Tu,
Shen Zeng,
Huazhen Fang
Abstract:
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Submitted 19 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma
Authors:
Zijie Fang,
Yihan Liu,
Yifeng Wang,
Xiangyang Zhang,
Yang Chen,
Changjing Cai,
Yiyang Lin,
Ying Han,
Zhi Wang,
Shan Zeng,
Hong Shen,
Jun Tan,
Yongbing Zhang
Abstract:
Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.
Submitted 11 October, 2023;
originally announced October 2023.
-
Leveraging Label Information for Multimodal Emotion Recognition
Authors:
Peiying Wang,
Sunlu Zeng,
Junqing Chen,
Lu Fan,
Meng Chen,
Youzheng Wu,
Xiaodong He
Abstract:
Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information. Intuitively, label information should be capable of helping the model locate the salient tokens/frames relevant to the specific emotion, which finally facilitates the MER task. Inspired by this, we propose a novel approach for MER by leveraging label information. Specifically, we first obtain the representative label embeddings for both text and speech modalities, then learn the label-enhanced text/speech representations for each utterance via label-token and label-frame interactions. Finally, we devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification. Extensive experiments were conducted on the public IEMOCAP dataset, and experimental results demonstrate that our proposed approach outperforms existing baselines and achieves new state-of-the-art performance.
Submitted 5 September, 2023;
originally announced September 2023.
-
ParamNet: A Parameter-variable Network for Fast Stain Normalization
Authors:
Hongtao Kang,
Die Luo,
Li Chen,
Junbo Hu,
Shenghua Cheng,
Tingwei Quan,
Shaoqun Zeng,
Xiuli Liu
Abstract:
In practice, digital pathology images are often affected by various factors, resulting in very large differences in color and brightness. Stain normalization can effectively reduce the differences in color and brightness of digital pathology images, thus improving the performance of computer-aided diagnostic systems. Conventional stain normalization methods rely on one or several reference images, but one or several images can hardly represent the entire dataset. Although learning-based stain normalization methods are a general approach, they use complex deep networks, which not only greatly reduce computational efficiency, but also risk introducing artifacts. StainNet is a fast and robust stain normalization network, but its overly simple network structure leaves it without sufficient capability for complex stain normalization. In this study, we propose a parameter-variable stain normalization network, ParamNet. ParamNet contains a parameter prediction sub-network and a color mapping sub-network, where the parameter prediction sub-network can automatically determine the appropriate parameters for the color mapping sub-network according to each input image. The parameter-variable design ensures that our network has a sufficient capability for various stain normalization tasks. The color mapping sub-network is a fully 1x1 convolutional network with a total of 59 variable parameters, which allows our network to be extremely computationally efficient and does not introduce artifacts. The results on cytopathology and histopathology datasets show that our ParamNet outperforms state-of-the-art methods and can effectively improve the generalization of classifiers on pathology diagnosis tasks. The code is available at https://github.com/khtao/ParamNet.
Submitted 10 May, 2023;
originally announced May 2023.
-
Intelligent Omni-Surfaces Aided Wireless Communications: Does the Reciprocity Hold?
Authors:
Shaohua Yue,
Shuhao Zeng,
Hongliang Zhang,
Fenghan Lin,
Liang Liu,
Boya Di
Abstract:
Intelligent omni-surfaces (IOS) have attracted great attention recently due to their potential to achieve full-dimensional communications by simultaneously reflecting and refracting signals toward both sides of the surface. However, it remains an open question whether reciprocity holds between the uplink and downlink channels in IOS-aided wireless communications. In this work, we first present a physics-compliant IOS-related channel model, based on which the channel reciprocity is investigated. We then demonstrate the angle-dependent electromagnetic response of the IOS element in terms of both incident and departure angles. This serves as the key feature of IOS that drives our analytical results on beam non-reciprocity. Finally, simulation and experimental results are provided to verify our theoretical analyses.
Submitted 6 November, 2022;
originally announced November 2022.
-
Reconfigurable Refractive Surfaces: An Energy-Efficient Way to Holographic MIMO
Authors:
Shuhao Zeng,
Hongliang Zhang,
Boya Di,
Haichao Qin,
Xin Su,
Lingyang Song
Abstract:
Holographic Multiple Input Multiple Output (HMIMO), which integrates massive antenna elements into a compact space to achieve a spatially continuous aperture, plays an important role in future wireless networks. With numerous antenna elements, it is hard to implement the HMIMO via phased arrays due to unacceptable power consumption. To address this issue, reconfigurable refractive surface (RRS) is an energy efficient enabler of HMIMO since the surface is free of expensive phase shifters. Unlike traditional metasurfaces working as passive relays, the RRS is used as transmit antennas, where the far-field approximation does not hold anymore, urging a new performance analysis framework. In this letter, we first derive the data rate of an RRS-based single-user downlink system, and then compare its power consumption with the phased array. Simulation results verify our analysis and show that the RRS is an energy-efficient way to HMIMO.
Submitted 6 July, 2022;
originally announced July 2022.
-
Sampling-Based Nonlinear MPC of Neural Network Dynamics with Application to Autonomous Vehicle Motion Planning
Authors:
Iman Askari,
Babak Badnava,
Thomas Woodruff,
Shen Zeng,
Huazhen Fang
Abstract:
Control of machine learning models has emerged as an important paradigm for a broad range of robotics applications. In this paper, we present a sampling-based nonlinear model predictive control (NMPC) approach for control of neural network dynamics. We show its design in two parts: 1) formulating conventional optimization-based NMPC as a Bayesian state estimation problem, and 2) using particle filtering/smoothing to achieve the estimation. Through a principled sampling-based implementation, this approach can potentially make effective searches in the control action space for optimal control and also facilitate computation toward overcoming the challenges caused by neural network dynamics. We apply the proposed NMPC approach to motion planning for autonomous vehicles. The specific problem considers nonlinear unknown vehicle dynamics modeled as neural networks as well as dynamic on-road driving scenarios. The approach shows significant effectiveness in successful motion planning in case studies.
Submitted 9 May, 2022;
originally announced May 2022.
-
Nonlinear Model Predictive Control Based on Constraint-Aware Particle Filtering/Smoothing
Authors:
Iman Askari,
Shen Zeng,
Huazhen Fang
Abstract:
Nonlinear model predictive control (NMPC) has gained widespread use in many applications. Its formulation traditionally involves repetitively solving a nonlinear constrained optimization problem online. In this paper, we investigate NMPC through the lens of Bayesian estimation and highlight that the Monte Carlo sampling method can offer a favorable way to implement NMPC. We develop a constraint-aware particle filtering/smoothing method and exploit it to implement NMPC. The new sampling-based NMPC algorithm can be executed easily and efficiently even for complex nonlinear systems, while potentially mitigating the issues of computational complexity and local minima faced by numerical optimization in conventional studies. The effectiveness of the proposed algorithm is evaluated through a simulation study.
Submitted 9 May, 2022;
originally announced May 2022.
-
A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow
Authors:
Sihan Zeng,
Alyssa Kody,
Youngdae Kim,
Kibaek Kim,
Daniel K. Molzahn
Abstract:
With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness to a single point-of-failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance is highly dependent on the selection of penalty parameters, which are usually chosen heuristically. In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM with the goal of minimizing the number of iterations until convergence. We train our RL policy using deep Q-learning, and show that this policy can result in significantly accelerated convergence (up to a 59% reduction in the number of iterations compared to existing, curvature-informed penalty parameter selection methods). Furthermore, we show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50% reduction in iterations). This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
Submitted 5 May, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Trajectory Optimization and Resource Allocation for OFDMA UAV Relay Networks
Authors:
Shuhao Zeng,
Hongliang Zhang,
Boya Di,
Lingyang Song
Abstract:
In this paper, we consider a single-cell multi-user orthogonal frequency division multiple access (OFDMA) network with one unmanned aerial vehicle (UAV), which works as an amplify-and-forward relay to improve the quality-of-service (QoS) of the user equipments (UEs) in the cell edge. Aiming to improve the throughput while guaranteeing the user fairness, we jointly optimize the communication mode, subchannel allocation, power allocation, and UAV trajectory, which is an NP-hard problem. To design the UAV trajectory and resource allocation efficiently, we first decompose the problem into three subproblems, i.e., mode selection and subchannel allocation, trajectory optimization, and power allocation, and then solve these subproblems iteratively. Simulation results show that the proposed algorithm outperforms the random algorithm and the cellular scheme.
Submitted 22 April, 2021;
originally announced April 2021.
-
StainNet: a fast and robust stain normalization network
Authors:
Hongtao Kang,
Die Luo,
Weihua Feng,
Junbo Hu,
Shaoqun Zeng,
Tingwei Quan,
Xiuli Liu
Abstract:
Stain normalization often refers to transferring the color distribution of the source image to that of the target image and has been widely used in biomedical image analysis. Conventional stain normalization is regarded as constructing a pixel-by-pixel color mapping model, which depends on only one reference image and cannot accurately achieve the style transformation between image datasets. In principle, this style transformation can be well solved by deep learning-based methods owing to their complicated network structures; however, this complexity results in low computational efficiency and artifacts in the style transformation, which has restricted practical application. Here, we use distillation learning to reduce the complexity of deep learning methods and a fast and robust network called StainNet to learn the color mapping between the source image and target image. StainNet can learn the color mapping relationship from a whole dataset and adjust the color value in a pixel-to-pixel manner. The pixel-to-pixel manner restricts the network size and avoids artifacts in the style transformation. The results on the cytopathology and histopathology datasets show that StainNet can achieve comparable performance to the deep learning-based methods. Computation results demonstrate StainNet is more than 40 times faster than StainGAN and can normalize a 100,000x100,000 whole slide image in 40 seconds.
Submitted 23 July, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
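The pixel-to-pixel manner described in the StainNet abstract above can be illustrated with a toy sketch. This is not the StainNet network: the affine map, the `weights` and `bias` values, and the function names are hypothetical stand-ins for a learned color mapping. The only point shown is that each output pixel depends on its own input pixel and on no neighbours, which is why such a mapping cannot introduce spatial artifacts.

```python
def map_pixel(rgb, weights, bias):
    """Apply an affine color map to one RGB pixel and clip to [0, 255].

    `weights` (3x3) and `bias` (3) are hypothetical; in a learned model
    they would be replaced by a trained per-pixel network.
    """
    out = []
    for row, b in zip(weights, bias):
        value = sum(w * c for w, c in zip(row, rgb)) + b
        out.append(min(255.0, max(0.0, value)))
    return tuple(out)


def normalize_image(image, weights, bias):
    """Apply the same per-pixel map everywhere; no pixel sees its neighbours."""
    return [[map_pixel(px, weights, bias) for px in row] for row in image]


# Hypothetical parameters: identity colors plus a slight red boost.
WEIGHTS = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
BIAS = [5.0, 0.0, 0.0]

image = [[(10, 20, 30), (250, 100, 50)]]
normalized = normalize_image(image, WEIGHTS, BIAS)
```

Because the map is applied independently per pixel, changing one input pixel changes only the corresponding output pixel.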
-
Visually Imperceptible Adversarial Patch Attacks on Digital Images
Authors:
Yaguan Qian,
Jiamin Wang,
Bin Wang,
Shaoning Zeng,
Zhaoquan Gu,
Shouling Ji,
Wassim Swaileh
Abstract:
The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted increasing attention. Many algorithms have been proposed to craft powerful adversarial examples. However, most of these algorithms modify the global or local region of pixels without taking network explanations into account. Hence, the perturbations are redundant and easily detected by human eyes. In this paper, we propose a novel method to generate local region perturbations. The main idea is to find a contributing feature region (CFR) of an image by simulating the human attention mechanism and then add perturbations to the CFR. Furthermore, a soft mask matrix is designed on the basis of an activation map to finely represent the contribution of each pixel in the CFR. With this soft mask, we develop a new loss function with inverse temperature to search for optimal perturbations in the CFR. Due to the network explanations, the perturbations added to the CFR are more effective than those added to other regions. Extensive experiments conducted on CIFAR-10 and ILSVRC2012 demonstrate the effectiveness of the proposed method, including attack success rate, imperceptibility, and transferability.
Submitted 27 April, 2021; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Reconstruct high-resolution multi-focal plane images from a single 2D wide field image
Authors:
Jiabo Ma,
Sibo Liu,
Shenghua Cheng,
Xiuli Liu,
Li Cheng,
Shaoqun Zeng
Abstract:
High-resolution 3D medical images are important for analysis and diagnosis, but axial scanning to acquire them is very time-consuming. In this paper, we propose a fast end-to-end multi-focal plane imaging network (MFPINet) to reconstruct high-resolution multi-focal plane images from a single 2D low-resolution wide-field image without relying on scanning. To acquire realistic MFP images quickly, the proposed MFPINet adopts a generative adversarial network framework and the strategies of post-sampling and refocusing all focal planes at one time. We conduct a series of experiments on cytology microscopy images and demonstrate that MFPINet performs well on both axial refocusing and horizontal super-resolution. Furthermore, MFPINet is approximately 24 times faster than current refocusing methods for reconstructing the same volume of images. The proposed method has the potential to greatly increase the speed of high-resolution 3D imaging and expand the application of low-resolution wide-field images.
Submitted 20 September, 2020;
originally announced September 2020.
-
Hyperspectral Image Denoising with Partially Orthogonal Matrix Vector Tensor Factorization
Authors:
Zhen Long,
Yipeng Liu,
Sixing Zeng,
Jiani Liu,
Fei Wen,
Ce Zhu
Abstract:
A hyperspectral image (HSI) has some advantages over a natural image for various applications due to the extra spectral information. During acquisition, it is often contaminated by severe noise, including Gaussian noise, impulse noise, deadlines, and stripes. The degradation in image quality can badly affect some applications. In this paper, we present an HSI restoration method named smooth and robust low rank tensor recovery. Specifically, we propose a structural tensor decomposition in accordance with the linear spectral mixture model of HSI. It decomposes a tensor into sums of outer matrix vector products, where the vectors are orthogonal due to the independence of endmember spectra. Based on it, the global low rank tensor structure can be well exposited for HSI denoising. In addition, the 3D anisotropic total variation is used for spatial spectral piecewise smoothness of HSI. Meanwhile, the sparse noise, including impulse noise, deadlines, and stripes, is detected by the l1 norm regularization. The Frobenius norm is used for the heavy Gaussian noise in some real world scenarios. The alternating direction method of multipliers is adopted to solve the proposed optimization model, which simultaneously exploits the global low rank property and the spatial spectral smoothness of the HSI. Numerical experiments on both simulated and real data illustrate the superiority of the proposed method in comparison with the existing ones.
Submitted 28 June, 2020;
originally announced July 2020.
-
FFusionCGAN: An end-to-end fusion method for few-focus images using conditional GAN in cytopathological digital slides
Authors:
Xiebo Geng,
Sibo Liua,
Wei Han,
Xu Li,
Jiabo Ma,
**gya Yu,
Xiuli Liu,
Sahoqun Zeng,
Li Chen,
Shenghua Cheng
Abstract:
Multi-focus image fusion technologies compress different focus depth images into an image in which most objects are in focus. However, although existing image fusion techniques, including traditional algorithms and deep learning-based algorithms, can generate high-quality fused images, they need multiple images with different focus depths in the same field of view. This criterion may not be met in some cases where time efficiency is required or the hardware is insufficient. The problem is especially prominent in large-size whole slide images. This paper focuses on the multi-focus image fusion of cytopathological digital slide images and proposes a novel method for generating fused images from single-focus or few-focus images based on a conditional generative adversarial network (GAN). Through the adversarial learning of the generator and discriminator, the method is capable of generating fused images with clear textures and large depth of field. Combined with the characteristics of cytopathological images, this paper designs a new generator architecture combining U-Net and DenseBlock, which can effectively improve the network's receptive field and comprehensively encode image features. Meanwhile, this paper develops a semantic segmentation network that identifies the blurred regions in cytopathological images. By integrating the network into the generative model, the quality of the generated fused images is effectively improved. Our method can generate fused images from only single-focus or few-focus images, thereby avoiding the problem of collecting multiple images of different focus depths with increased time and hardware costs. Furthermore, our model is designed to learn the direct mapping of input source images to fused images without the need to manually design complex activity level measurements and fusion rules as in traditional methods.
Submitted 2 January, 2020;
originally announced January 2020.
-
Two-stage Image Classification Supervised by a Single Teacher Single Student Model
Authors:
Jianhang Zhou,
Shaoning Zeng,
Bob Zhang
Abstract:
The two-stage strategy has been widely used in image classification. However, these methods barely take the classification criteria of the first stage into consideration in the second prediction stage. In this paper, we propose a novel two-stage representation method (TSR), and convert it to a Single-Teacher Single-Student (STSS) problem in our two-stage image classification framework. We seek the nearest neighbours of the test sample to choose candidate target classes. Meanwhile, the first stage classifier is formulated as the teacher, which holds the classification scores. The samples of the candidate classes are utilized to learn a student classifier based on L2-minimization in the second stage. The student will be supervised by the teacher classifier, which approves the student only if it obtains a higher score. In actuality, the proposed framework generates a stronger classifier by staging two weaker classifiers in a novel way. The experiments conducted on several face and object databases show that our proposed framework is effective and outperforms multiple popular classification methods.
Submitted 26 September, 2019;
originally announced September 2019.
-
Multi-stage domain adversarial style reconstruction for cytopathological image stain normalization
Authors:
Xihao Chen,
**gya Yu,
Li Chen,
Shaoqun Zeng,
Xiuli Liu,
Shenghua Cheng
Abstract:
The different stain styles of cytopathological images have a negative effect on the generalization ability of automated image analysis algorithms. This article proposes a new framework that normalizes the stain style for cytopathological images through a stain removal module and a multi-stage domain adversarial style reconstruction module. We convert color images into grayscale images with a color-encoding mask. Using the mask, reconstructed images retain their basic color without red and blue mixing, which is important for cytopathological image interpretation. The style reconstruction module consists of per-pixel regression with intra-domain adversarial learning, inter-domain adversarial learning, and optional task-based refining. Per-pixel regression with intra-domain adversarial learning establishes the generative network from the decolorized input to the reconstructed output. The inter-domain adversarial learning further reduces the difference in stain style. The generation network can be optimized by combining it with the task network. Experimental results show that the proposed techniques help to optimize the generation network. The average accuracy increases from 75.41% to 84.79% after the intra-domain adversarial learning, and to 87.00% after inter-domain adversarial learning. Under the guidance of the task network, the average accuracy rate reaches 89.58%. The proposed method achieves unsupervised stain normalization of cytopathological images, while preserving the cell structure, texture structure, and cell color properties of the image. This method overcomes the problem of generalizing the task models between different stain styles of cytopathological images.
Submitted 11 September, 2019;
originally announced September 2019.
-
Time Series Simulation by Conditional Generative Adversarial Net
Authors:
Rao Fu,
Jie Chen,
Shutian Zeng,
Yi** Zhuang,
Agus Sudjianto
Abstract:
Generative Adversarial Net (GAN) has been proven to be a powerful machine learning tool in image data analysis and generation. In this paper, we propose to use Conditional Generative Adversarial Net (CGAN) to learn and simulate time series data. The conditions can be both categorical and continuous variables containing different kinds of auxiliary information. Our simulation studies show that CGAN is able to learn different kinds of normal and heavy-tailed distributions, as well as dependent structures of different time series, and that it can further generate conditional predictive distributions consistent with the training data distributions. We also provide an in-depth discussion on the rationale of GAN and the neural network as hierarchical splines to draw a clear connection with the existing statistical method for distribution generation. In practice, CGAN has a wide range of applications in market risk and counterparty risk analysis: it can be applied to learn the historical data and generate scenarios for the calculation of Value-at-Risk (VaR) and Expected Shortfall (ES) and predict the movement of the market risk factors. We present a real data analysis including a backtest to demonstrate that CGAN is able to outperform Historical Simulation, a popular method in market risk analysis for the calculation of VaR. CGAN can also be applied in economic time series modeling and forecasting, and an example of hypothetical shock analysis for economic models and the generation of potential CCAR scenarios by CGAN is given at the end of the paper.
Submitted 25 April, 2019;
originally announced April 2019.
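The CGAN abstract above compares against historical simulation for VaR and mentions ES. As a rough illustration of that baseline only (not the authors' code, and not the CGAN method), the sketch below computes empirical VaR and ES from a list of returns. The function name, the quantile convention, and the sample returns are assumptions made for illustration; other quantile conventions give slightly different numbers.

```python
import math


def historical_var_es(returns, alpha=0.99):
    """Empirical VaR and ES at confidence level alpha.

    Losses are negated returns, so both results are reported as loss
    amounts. VaR is the smallest loss l such that at least a fraction
    alpha of observed losses are <= l; ES is the mean of the losses at
    or beyond that order statistic (one common estimator, assumed here).
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    losses = sorted(-r for r in returns)  # ascending losses
    n = len(losses)
    # The small epsilon guards against floating-point round-up in alpha * n.
    k = min(n - 1, max(0, math.ceil(alpha * n - 1e-9) - 1))
    var = losses[k]
    tail = losses[k:]
    es = sum(tail) / len(tail)
    return var, es


# Hypothetical daily returns, for illustration only.
sample_returns = [-0.05, -0.02, 0.01, 0.03, -0.01,
                  0.02, 0.00, -0.03, 0.04, 0.01]
var_90, es_90 = historical_var_es(sample_returns, alpha=0.9)
```

A scenario generator such as the CGAN described above would feed simulated returns into the same kind of calculation in place of the historical sample.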
-
On Lossless Causal Compression of Periodic Signals
Authors:
Jan Maximilian Montenbruck,
Shen Zeng
Abstract:
We present and study a scheme for lossless causal compression of periodic real-valued signals. In particular, our technique compresses a vector-valued signal to a scalar-valued signal by mixing it with another periodic signal. The conditions for being able to reconstruct the original signal then amount to certain non-resonances between the periods of the two signals. The proposed compression scheme turns out to implicitly be inherent to communication networks with round-robin scheduling and digital photography with active pixel sensors.
Submitted 6 December, 2018;
originally announced December 2018.
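The round-robin connection mentioned in the abstract above can be illustrated with a discrete-time toy example. This is a simplified sketch under stated assumptions, not the scheme analyzed in the paper: the signal is sampled at integer times, the mixing signal simply selects one vector component per time step, and the function names are hypothetical. Reconstruction succeeds when the signal period and the number of components share no common factor, a discrete analogue of the non-resonance condition.

```python
def compress(x, d, n_samples):
    """Round-robin mixing: at time t, emit component (t mod d) of the
    periodic vector signal x, where x lists one period of d-vectors."""
    period = len(x)
    return [x[t % period][t % d] for t in range(n_samples)]


def reconstruct(y, period, d):
    """Recover one period of the vector signal from the scalar samples y.

    Returns None if some (phase, component) pair is never observed,
    which happens when period and d are not coprime.
    """
    x_hat = [[None] * d for _ in range(period)]
    for t, value in enumerate(y):
        x_hat[t % period][t % d] = value
    if any(v is None for row in x_hat for v in row):
        return None
    return x_hat


# Period 3, two components: 3 and 2 are coprime, so 6 samples suffice.
signal = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scalar = compress(signal, d=2, n_samples=6)
recovered = reconstruct(scalar, period=3, d=2)

# Period 4, two components: resonance, half the entries are never seen.
resonant = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
failed = reconstruct(compress(resonant, d=2, n_samples=8), period=4, d=2)
```

In the coprime case every phase of every component is visited once per six samples, so the receiver can rebuild the full vector signal from past scalar samples alone.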