-
Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation
Authors:
Bowei Yao,
Yi Zeng,
Haizhao Dai,
Qing Wu,
Youshen Xiao,
Fei Gao,
Yuyao Zhang,
**gyi Yu,
Xiran Cai
Abstract:
Photoacoustic tomography is a hybrid biomedical technology that combines the advantages of acoustic and optical imaging. However, with conventional image reconstruction methods, image quality is visibly degraded by artifacts under sparse sampling. In this paper, a novel model-based sparse reconstruction method via implicit neural representation is proposed to improve the quality of images reconstructed from sparse data. Specifically, the initial acoustic pressure distribution is modeled as a continuous function of spatial coordinates and parameterized by a multi-layer perceptron. The weights of the multi-layer perceptron are determined by training the network in a self-supervised manner, and a total variation regularization term is used to provide prior knowledge. We compare our results with several ablation studies, and the results show that our method outperforms existing methods on simulation and experimental data. Under sparse sampling, our method effectively suppresses artifacts and mitigates the ill-posedness of the problem, reconstructing images with higher signal-to-noise ratio and contrast-to-noise ratio than traditional methods. The high-quality results for sparse data give the proposed method the potential to further decrease the hardware cost of photoacoustic tomography systems.
Submitted 25 June, 2024;
originally announced June 2024.
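The total variation prior mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state which TV variant or discretization the authors use, so the anisotropic finite-difference form below and the function name `total_variation` are assumptions made purely for illustration.

```python
import numpy as np

def total_variation(image: np.ndarray) -> float:
    """Anisotropic total variation: sum of absolute neighbor differences."""
    dx = np.diff(image, axis=1)  # differences between horizontal neighbors
    dy = np.diff(image, axis=0)  # differences between vertical neighbors
    return float(np.abs(dx).sum() + np.abs(dy).sum())

# A constant image has zero TV; a single vertical edge contributes
# one unit per row it crosses.
flat = np.zeros((4, 4))
step = flat.copy()
step[:, 2:] = 1.0

tv_flat = total_variation(flat)
tv_step = total_variation(step)
```

In a regularized reconstruction, a term like this would typically be added to the data-fidelity loss with a weighting factor, so that piecewise-smooth images are favored.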
-
Low-Overhead Channel Estimation via 3D Extrapolation for TDD mmWave Massive MIMO Systems Under High-Mobility Scenarios
Authors:
Binggui Zhou,
Xi Yang,
Shaodan Ma,
Feifei Gao,
Guanghua Yang
Abstract:
In TDD mmWave massive MIMO systems, the downlink CSI can be obtained through uplink channel estimation thanks to the uplink-downlink channel reciprocity. However, the channel aging issue is significant under high-mobility scenarios and thus necessitates frequent uplink channel estimation. In addition, large numbers of antennas and subcarriers lead to high-dimensional CSI matrices, aggravating the pilot training overhead. To systematically reduce the pilot overhead, a spatial, frequency, and temporal domain (3D) channel extrapolation framework is proposed in this paper. Considering the marginal effects of pilots in the spatial and frequency domains and the effectiveness of traditional knowledge-driven channel estimation methods, we first propose a knowledge-and-data driven spatial-frequency channel extrapolation network (KDD-SFCEN) for uplink channel estimation, which exploits the least square estimator for coarse channel estimation and joint spatial-frequency channel extrapolation to reduce the spatial-frequency domain pilot overhead. Then, resorting to the uplink-downlink channel reciprocity and temporal domain dependencies of downlink channels, a temporal uplink-downlink channel extrapolation network (TUDCEN) is proposed for slot-level channel extrapolation, aiming to enlarge the pilot signal period and thus reduce the temporal domain pilot overhead under high-mobility scenarios. Specifically, we propose a spatial-frequency sampling embedding module to reduce the representation dimension and the consequent computational complexity, and we exploit an autoregressive generative Transformer to generate downlink channels autoregressively. Numerical results demonstrate the superiority of the proposed framework, which reduces the pilot training overhead by more than 16 times and improves the system's spectral efficiency under high-mobility scenarios.
Submitted 13 June, 2024;
originally announced June 2024.
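The coarse least square estimation step named in the abstract above can be sketched for the simplest case. This is an assumption-laden illustration, not the KDD-SFCEN pipeline: it assumes a per-element model `y = h * x + noise` with known unit-modulus pilots, and the name `ls_channel_estimate` is hypothetical.

```python
import numpy as np

def ls_channel_estimate(received: np.ndarray, pilots: np.ndarray) -> np.ndarray:
    """Per-element least-squares estimate h = y / x for the model y = h * x + noise."""
    return received / pilots

rng = np.random.default_rng(0)
# Hypothetical channel coefficients at 8 pilot positions.
h_true = rng.standard_normal(8) + 1j * rng.standard_normal(8)
# Unit-modulus pilot symbols (assumed known at the receiver).
pilots = np.exp(1j * 2 * np.pi * rng.random(8))
received = h_true * pilots  # noise-free, for illustration only
h_ls = ls_channel_estimate(received, pilots)
```

With noise present, this estimate is only coarse, which is why the abstract pairs it with a learned extrapolation stage.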
-
Dual-Stream Attention Network for Hyperspectral Image Unmixing
Authors:
Yufang Wang,
Wenmin Wu,
Lin Qi,
Feng Gao
Abstract:
Hyperspectral images (HSIs) contain abundant spatial and spectral information, making them highly valuable for unmixing. In this paper, we propose a Dual-Stream Attention Network (DSANet) for HSI unmixing. The endmembers and abundance of a pixel in an HSI are highly correlated with those of its adjacent pixels. Therefore, we adopt a "many to one" strategy to estimate the abundance of the central pixel. In addition, we adopt a multiview spectral method, dividing the spectral bands into multiple partitions with low correlations to estimate abundances. To aggregate the complementary abundance estimates from the two branches, we design a cross-fusion attention network to enhance valuable information. Extensive experiments on two real datasets demonstrate the effectiveness of our DSANet.
Submitted 3 June, 2024;
originally announced June 2024.
-
Sparse Focus Network for Multi-Source Remote Sensing Data Classification
Authors:
Xuepeng **,
Junyan Lin,
Feng Gao,
Lin Qi,
Yang Zhou
Abstract:
Multi-source remote sensing data classification has emerged as a prominent research topic with the advancement of various sensors. Existing multi-source data classification methods are susceptible to interference from irrelevant information during multi-source feature extraction and fusion. To solve this issue, we propose a sparse focus network (SF-Net) for multi-source data classification. Sparse attention is employed in the Transformer block for HSI and SAR/LiDAR feature extraction, so that the most useful self-attention values are retained for better feature aggregation. Furthermore, cross-attention is used to enhance multi-source feature interactions, which further improves the efficiency of cross-modal feature fusion. Experimental results on the Berlin and Houston2018 datasets highlight the effectiveness of SF-Net, which outperforms existing state-of-the-art methods.
Submitted 3 June, 2024;
originally announced June 2024.
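One common way to keep only the most useful self-attention values, as the abstract above describes, is top-k masking of the attention scores. The abstract does not specify the selection rule, so the top-k rule, the function name `topk_sparse_attention`, and the parameter `top_k` are assumptions for this sketch.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention that keeps only the top_k scores per query."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Indices of the top_k largest scores in each row.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    # Everything outside the top_k set is masked to -inf before the softmax.
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    masked -= masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))   # 4 query tokens, dimension 8
k = rng.standard_normal((6, 8))   # 6 key tokens
v = rng.standard_normal((6, 8))   # 6 value tokens
out, weights = topk_sparse_attention(q, k, v, top_k=2)
```

Each query then aggregates values from only two keys, which is the intuition behind suppressing irrelevant information during feature aggregation.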
-
Arctic Sea Ice Image Super-Resolution Based on Multi-Scale Convolution and Dual-Gating Mechanism
Authors:
Zhaomin Fang,
Wankun Chen,
Feng Gao,
Yanhai Gan,
Junyu Dong,
Yang Zhou
Abstract:
Arctic Sea Ice Concentration (SIC) is the ratio of ice-covered area to the total sea area of the Arctic Ocean, and it is a key indicator for maritime activities. Passive microwave images are commonly used to display SIC, but they have low spatial resolution. Moreover, most existing super-resolution methods for Arctic SIC do not take the integration of spatial and channel features into account and cannot effectively integrate multi-scale features. To overcome these issues, we propose MFM-Net for Arctic SIC super-resolution, which concurrently aggregates multi-scale information while integrating spatial and channel features. Extensive experiments on an Arctic SIC dataset from the AMSR-E/AMSR-2 SIC DT-ASI products from Ocean University of China validate the effectiveness of the proposed MFM-Net.
Submitted 3 June, 2024;
originally announced June 2024.
-
Boosting Spatial-Spectral Masked Auto-Encoder Through Mining Redundant Spectra for HSI-SAR/LiDAR Classification
Authors:
Junyan Lin,
Xuepeng **,
Feng Gao,
Junyu Dong,
Hui Yu
Abstract:
Although recent masked image modeling (MIM)-based HSI-LiDAR/SAR classification methods have gradually recognized the importance of the spectral information, they have not adequately addressed the redundancy among different spectra, resulting in information leakage during the pretraining stage. This issue directly impairs the representation ability of the model. To tackle the problem, we propose a new strategy, named Mining Redundant Spectra (MRS). Unlike randomly masking spectral bands, MRS selectively masks them by similarity to increase the reconstruction difficulty. Specifically, a random spectral band is chosen during pretraining, and the selected and highly similar bands are masked. Experimental results demonstrate that employing the MRS strategy during the pretraining stage effectively improves the accuracy of existing MIM-based methods on the Berlin and Houston 2018 datasets.
Submitted 3 June, 2024;
originally announced June 2024.
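The similarity-based masking described in the abstract above (pick a random band, then mask it together with the bands most similar to it) can be sketched as follows. The abstract does not say how similarity is measured or how many bands are masked, so the correlation-style similarity, the `mask_ratio` parameter, and the function name `similarity_band_mask` are assumptions.

```python
import numpy as np

def similarity_band_mask(cube, mask_ratio, rng):
    """Boolean mask over bands: a random anchor band plus its most similar bands."""
    n_bands = cube.shape[0]
    flat = cube.reshape(n_bands, -1).astype(float)  # astype returns a copy
    # Center and normalize each band so the dot product acts as a correlation.
    flat -= flat.mean(axis=1, keepdims=True)
    flat /= np.maximum(np.linalg.norm(flat, axis=1, keepdims=True), 1e-12)
    anchor = int(rng.integers(n_bands))
    similarity = flat @ flat[anchor]
    similarity[anchor] = np.inf  # the anchor band itself is always masked
    n_mask = max(1, int(round(mask_ratio * n_bands)))
    masked = np.argsort(-similarity)[:n_mask]
    mask = np.zeros(n_bands, dtype=bool)
    mask[masked] = True
    return mask, anchor

rng = np.random.default_rng(0)
cube = rng.random((10, 4, 4))  # hypothetical cube: 10 bands of 4x4 pixels
mask, anchor = similarity_band_mask(cube, mask_ratio=0.3, rng=rng)
```

Masking correlated bands together removes the near-duplicates that a reconstruction model could otherwise copy from, which is the leakage the abstract refers to.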
-
LSKSANet: A Novel Architecture for Remote Sensing Image Semantic Segmentation Leveraging Large Selective Kernel and Sparse Attention Mechanism
Authors:
Miao Fu,
Feng Gao,
Ruzhuang Hua,
Yanhai Gan,
Xiaowei Zhou,
Yang Zhou
Abstract:
In this paper, we propose a large selective kernel and sparse attention network (LSKSANet) for remote sensing image semantic segmentation. LSKSANet is a lightweight network that effectively combines convolution with sparse attention mechanisms. Specifically, we design a large selective kernel module that decomposes the large kernel into a series of depth-wise convolutions with progressively increasing dilation rates, thereby expanding the receptive field without significantly increasing the computational burden. In addition, we introduce sparse attention to keep the most useful self-attention values for better feature aggregation. Experimental results on the Vaihingen and Potsdam datasets demonstrate the superior performance of the proposed LSKSANet over state-of-the-art methods.
Submitted 3 June, 2024;
originally announced June 2024.
-
Integrated Sensing and Communications Framework for 6G Networks
Authors:
Hongliang Luo,
Tengyu Zhang,
Chuanbin Zhao,
Yucong Wang,
Bo Lin,
Yuhua Jiang,
Dongqi Luo,
Feifei Gao
Abstract:
In this paper, we propose a novel integrated sensing and communications (ISAC) framework for sixth generation (6G) mobile networks, in which we decompose the real physical world into the static environment, dynamic targets, and various object materials. The ubiquitous static environment occupies the vast majority of the physical world, for which we design a static environment reconstruction (SER) scheme to obtain the layout and point cloud information of static buildings. The dynamic targets floating in static environments create the spatiotemporal transition of the physical world, for which we design a comprehensive dynamic target sensing (DTS) scheme to detect, estimate, track, image, and recognize the dynamic targets in real time. The object materials enrich the electromagnetic laws of the physical world, for which we develop an object material recognition (OMR) scheme to estimate the electromagnetic coefficient of the objects. In addition, to integrate these sensing functions into existing communications systems, we discuss the interference issues and corresponding solutions for ISAC cellular networks. Furthermore, we develop an ISAC hardware prototype platform that can reconstruct environmental maps and sense dynamic targets while maintaining communications services. With all these designs, the proposed ISAC framework can support a wide range of emerging applications, such as digital twins, the low-altitude economy, the internet of vehicles, marine management, and deformation monitoring.
Submitted 30 May, 2024;
originally announced May 2024.
-
Environment Sensing-aided Beam Prediction with Transfer Learning for Smart Factory
Authors:
Yuan Feng,
Chuanbing Zhao,
Feifei Gao,
Yong Zhang,
Shaodan Ma
Abstract:
In this paper, we propose an environment sensing-aided beam prediction model for smart factories that can be transferred from given environments to a new environment. In particular, we first design a pre-training model that predicts the optimal beam by sensing the present environmental information. A new environment generally requires collecting a large amount of new training data to retrain the model, and this cost severely impedes the application of the pre-trained model. Therefore, we next design a transfer learning strategy that fine-tunes the pre-trained model with limited labeled data from the new environment. Simulation results show that when the pre-trained model is fine-tuned with 30% of the labeled data from the new environment, the Top-10 beam prediction accuracy reaches 94%. Moreover, compared with completely retraining the prediction model, the proposed transfer learning strategy reduces the amount of training data and the time cost by 70% and 75%, respectively.
Submitted 24 May, 2024;
originally announced May 2024.
-
Electromagnetic Property Sensing in ISAC with Multiple Base Stations: Algorithm, Pilot Design, and Performance Analysis
Authors:
Yuhua Jiang,
Feifei Gao,
Shi **,
Tiejun Cui
Abstract:
Integrated sensing and communication (ISAC) has opened up numerous game-changing opportunities for future wireless systems. In this paper, we develop a novel scheme that utilizes orthogonal frequency division multiplexing (OFDM) pilot signals to sense the electromagnetic (EM) property of the target and thus identify the materials of the target. Specifically, we first establish an EM wave propagation model based on Maxwell's equations, where the EM property of the target is captured by a closed-form expression of the channel. We then build the mathematical model for the relative permittivity and conductivity distribution (RPCD) within a predetermined region of interest shared by multiple base stations (BSs). Based on the EM wave propagation model, we propose an EM property sensing method in which the RPCD is reconstructed with compressive sensing techniques that exploit the joint sparsity structure of the EM property vector. We then develop a fusion algorithm to combine data from multiple BSs, which enhances the reconstruction accuracy of the EM property by efficiently integrating diverse measurements. Moreover, the fusion is performed at the feature level of the RPCD and incurs low transmission overhead. We further design pilot signals that minimize the mutual coherence of the equivalent channels and enhance the diversity of incident EM wave patterns. Simulation results demonstrate the efficacy of the proposed method in achieving high-quality RPCD reconstruction and accurate material classification.
Submitted 10 May, 2024;
originally announced May 2024.
-
Environment Reconstruction based on Multi-User Selection and Multi-Modal Fusion in ISAC
Authors:
Bo Lin,
Chuanbin Zhao,
Feifei Gao,
Geoffrey Ye Li
Abstract:
Integrated sensing and communications (ISAC) has been deemed a key technology for sixth generation (6G) wireless communications systems. In this paper, we explore the inherent clustered nature of wireless users and design a multi-user-based environment reconstruction scheme. Specifically, we first select users based on the estimation precision of the channel's multipath components, including the line-of-sight (LOS) and the non-line-of-sight (NLOS) paths, to enhance the accuracy of environment reconstruction. Then, we develop a fusion strategy that merges communications signalling with camera images to increase the accuracy and robustness of environment reconstruction. The simulation results demonstrate that the proposed algorithm can achieve centimeter-level sensing accuracy, which is about 17 times better than the scheme without user selection. Meanwhile, the fusion of communications data and vision data leads to a threefold accuracy improvement over the image-only method, especially under challenging weather conditions such as rain and snow.
Submitted 26 March, 2024;
originally announced March 2024.
-
STAR-RIS Aided Integrated Sensing and Communication over High Mobility Scenario
Authors:
Muye Li,
Shun Zhang,
Yao Ge,
Zan Li,
Feifei Gao,
**zhi Fan
Abstract:
Integrated sensing and communication (ISAC) has become a promising technology for future communication systems. In this paper, we consider a millimeter wave system in a high-mobility scenario and propose a novel simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) aided ISAC scheme. To improve the communication service of the in-vehicle user equipment (UE) and simultaneously track and sense the vehicle with the help of nearby roadside units (RSUs), a STAR-RIS is mounted on the outside surface of the vehicle. First, an efficient transmission structure is developed, in which a number of training sequences with orthogonal precoders and combiners are utilized at the BS and the RSUs, respectively, for channel parameter extraction. Then, the near-field static channel model between the STAR-RIS and the in-vehicle UE, as well as the far-field time-frequency selective BS-RIS-RSUs channel model, are characterized. By utilizing the multidimensional orthogonal matching pursuit (MOMP) algorithm, the cascaded channel parameters of the BS-RIS-RSUs links can be obtained at the RSUs. Thus, the vehicle's location and velocity can be obtained by jointly utilizing the extracted cascaded channel parameters of all RSUs. Note that the MOMP algorithm can also be utilized to extract the channel parameters of the BS-RIS-UE link for communication. With the help of the sensing results, the phase shifts of the STAR-RIS are carefully designed, which can significantly improve the received signal strength for both the RSUs and the in-vehicle UE and ultimately enhance the sensing and communication performance. Moreover, the trade-off between sensing and communication is designed by optimizing the energy splitting factors of the STAR-RIS. Finally, simulation results are provided to validate the feasibility and effectiveness of the proposed STAR-RIS aided ISAC scheme.
Submitted 18 March, 2024;
originally announced March 2024.
-
Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
Authors:
Fan Zhang,
Zhaohan Wang,
Xin Lyu,
Siyuan Zhao,
Mengjian Li,
Weidong Geng,
Naye Ji,
Hui Du,
Fuxing Gao,
Hao Wu,
Shunman Li
Abstract:
Speech-driven gesture generation is an emerging field within virtual human creation. However, a significant challenge lies in accurately determining and processing the multitude of input features (such as acoustic, semantic, emotional, personality, and even subtle unknown features). Traditional approaches, reliant on various explicit feature inputs and complex multimodal processing, constrain the expressiveness of resulting gestures and limit their applicability. To address these challenges, we present Persona-Gestor, a novel end-to-end generative model designed to generate highly personalized 3D full-body gestures solely relying on raw speech audio. The model combines a fuzzy feature extractor and a non-autoregressive Adaptive Layer Normalization (AdaLN) transformer diffusion architecture. The fuzzy feature extractor harnesses a fuzzy inference strategy that automatically infers implicit, continuous fuzzy features. These fuzzy features, represented as a unified latent feature, are fed into the AdaLN transformer. The AdaLN transformer introduces a conditional mechanism that applies a uniform function across all tokens, thereby effectively modeling the correlation between the fuzzy features and the gesture sequence. This module ensures a high level of gesture-speech synchronization while preserving naturalness. Finally, we employ the diffusion model to train and infer various gestures. Extensive subjective and objective evaluations on the Trinity, ZEGGS, and BEAT datasets confirm our model's superior performance to the current state-of-the-art approaches. Persona-Gestor improves the system's usability and generalization capabilities, setting a new benchmark in speech-driven gesture synthesis and broadening the horizon for virtual human technology. Supplementary videos and code can be accessed at https://zf223669.github.io/Diffmotion-v2-website/
Submitted 16 March, 2024;
originally announced March 2024.
-
Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising
Authors:
Shuai Hu,
Feng Gao,
Xiaowei Zhou,
Junyu Dong,
Qian Du
Abstract:
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data. However, simultaneously modeling global and local features has rarely been explored as a way to enhance HSI denoising. In this letter, we propose a hybrid convolution and attention network (HCANet), which leverages the strengths of both convolutional neural networks (CNNs) and Transformers. To enhance the modeling of both global and local features, we devise a convolution and attention fusion module aimed at capturing long-range dependencies and neighborhood spectral correlations. Furthermore, to improve multi-scale information aggregation, we design a multi-scale feed-forward network that enhances denoising performance by extracting features at different scales. Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet. The proposed model is effective in removing various types of complex noise. Our code is available at https://github.com/summitgao/HCANet.
Submitted 15 March, 2024;
originally announced March 2024.
-
6D Radar Sensing and Tracking in Monostatic Integrated Sensing and Communications System
Authors:
Hongliang Luo,
Feifei Gao,
Fan Liu,
Shi **
Abstract:
In this paper, we propose a novel scheme for six-dimensional (6D) radar sensing and tracking of a dynamic target based on a multiple input and multiple output (MIMO) array for a monostatic integrated sensing and communications (ISAC) system. Unlike most existing ISAC studies, which assume that only the radial velocity of a far-field dynamic target can be measured with a single base station (BS), we find that the sensing echo channel of a MIMO-ISAC system actually includes the distance, horizontal angle, pitch angle, radial velocity, horizontal angular velocity, and pitch angular velocity of the dynamic target. Thus, we can rely entirely on a single BS to estimate the dynamic target's 6D motion parameters from the sensing echo signals. Specifically, we first propose long-term and short-term motion models of the dynamic target, in which the short-term motion model serves single-shot sensing of the dynamic target, while the long-term motion model serves multiple-shot tracking of the dynamic target. As a further step, we derive the sensing channel model corresponding to the short-term motion. Next, for single-shot sensing, we employ array signal processing methods to estimate the dynamic target's horizontal angle, pitch angle, distance, and virtual velocity. Observing that the virtual velocities seen by different antennas are different, we adopt plane fitting to estimate the radial velocity, horizontal angular velocity, and pitch angular velocity of the dynamic target. Furthermore, we implement multiple-shot tracking of the dynamic target based on the single-shot sensing results and Kalman filtering. Simulation results demonstrate the effectiveness of the proposed 6D radar sensing and tracking scheme.
Submitted 27 December, 2023;
originally announced December 2023.
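The plane-fitting step mentioned in the abstract above can be illustrated with an ordinary least-squares fit of a plane to per-antenna values. This is a generic sketch only: the antenna coordinates, the synthetic "virtual velocity" values, and the function name `fit_plane` are hypothetical, and the mapping from the fitted coefficients to the radial and angular velocities is not given in the abstract, so it is not reproduced here.

```python
import numpy as np

def fit_plane(x, y, v):
    """Least-squares fit of v ≈ a + b*x + c*y; returns the coefficients (a, b, c)."""
    design = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(design, v, rcond=None)
    return coeffs

# Hypothetical 4x4 antenna grid with noise-free values lying on a plane.
grid_x, grid_y = np.meshgrid(np.arange(4.0), np.arange(4.0))
x, y = grid_x.ravel(), grid_y.ravel()
v = 2.0 + 0.5 * x - 0.25 * y
a, b, c = fit_plane(x, y, v)
```

Here the offset `a` and the two slopes `b` and `c` are the three quantities a plane fit can recover from values that vary linearly across the array.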
-
Electromagnetic Property Sensing: A New Paradigm of Integrated Sensing and Communication
Authors:
Yuhua Jiang,
Feifei Gao,
Shi **
Abstract:
Integrated sensing and communication (ISAC) has opened up numerous game-changing opportunities for future wireless systems. In this paper, we develop a novel scheme that utilizes orthogonal frequency division multiplexing (OFDM) pilot signals in ISAC systems to sense the electromagnetic (EM) property of the target and thus also identify the material of the target. Specifically, we first establish an end-to-end EM propagation model by means of Maxwell's equations, where the EM property of the target is captured by a closed-form expression of the ISAC channel, incorporating the Lippmann-Schwinger equation and the method of moments (MOM) for discretization. We then model the relative permittivity and conductivity distribution (RPCD) within a specified detection region. Based on the sensing model, we introduce a multi-frequency-based EM property sensing method by which the RPCD can be reconstructed with compressive sensing techniques that exploit the joint sparsity structure of the EM property vector. To improve the sensing accuracy, we design a beamforming strategy for the communications transmitter, based on the Born approximation, that minimizes the mutual coherence of the sensing matrix. The optimization problem is cast in terms of the Gram matrix and is solved iteratively to obtain the optimal beamforming matrix. Simulation results demonstrate the efficacy of the proposed method in achieving high-quality RPCD reconstruction and accurate material classification. Furthermore, improvements in RPCD reconstruction quality and material classification accuracy are observed with increased signal-to-noise ratio (SNR) or reduced target-transmitter distance.
Submitted 23 January, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
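The mutual coherence that the abstract above seeks to minimize has a standard definition: the largest absolute normalized inner product between two distinct columns of the sensing matrix, read off the off-diagonal entries of the Gram matrix. The sketch below computes that quantity only; it does not implement the authors' beamforming optimization, and the function name `mutual_coherence` is chosen here for illustration.

```python
import numpy as np

def mutual_coherence(A: np.ndarray) -> float:
    """Largest absolute normalized inner product between distinct columns of A."""
    cols = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm columns
    gram = np.abs(cols.conj().T @ cols)                  # |Gram matrix|
    np.fill_diagonal(gram, 0.0)                          # ignore self-products
    return float(gram.max())

# Orthonormal columns give coherence 0; a repeated column gives coherence 1.
mu_identity = mutual_coherence(np.eye(3))
mu_repeated = mutual_coherence(np.array([[1.0, 1.0, 0.0],
                                         [0.0, 0.0, 1.0]]))
```

Lower coherence generally makes sparse recovery by compressive sensing more reliable, which is why it is a natural design target for the sensing matrix.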
-
A Low-Overhead Incorporation-Extrapolation based Few-Shot CSI Feedback Framework for Massive MIMO Systems
Authors:
Binggui Zhou,
Xi Yang,
**tao Wang,
Shaodan Ma,
Feifei Gao,
Guanghua Yang
Abstract:
Accurate channel state information (CSI) is essential for downlink precoding in frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems with orthogonal frequency-division multiplexing (OFDM). However, obtaining CSI through feedback from the user equipment (UE) becomes challenging with the increasing scale of antennas and subcarriers and leads to extremely high CSI feedback overhead. Deep learning-based methods have emerged for compressing CSI, but these methods generally require substantial collected samples and thus pose practical challenges. Moreover, existing deep learning methods also suffer from dramatically growing feedback overhead owing to their focus on full-dimensional CSI feedback. To address these issues, we propose a low-overhead Incorporation-Extrapolation based Few-Shot CSI feedback Framework (IEFSF) for massive MIMO systems. An incorporation-extrapolation scheme for eigenvector-based CSI feedback is proposed to reduce the feedback overhead. Then, to alleviate the necessity of extensive collected samples and enable few-shot CSI feedback, we further propose a knowledge-driven data augmentation (KDDA) method and an artificial intelligence-generated content (AIGC)-based data augmentation method by exploiting the domain knowledge of wireless channels and by exploiting a novel generative model, respectively. Experimental results based on the DeepMIMO dataset demonstrate that the proposed IEFSF significantly reduces CSI feedback overhead by 64 times compared with existing methods while maintaining higher feedback accuracy using only several hundred collected samples.
Submitted 21 June, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Automated interpretation of congenital heart disease from multi-view echocardiograms
Authors:
Jing Wang,
Xiaofeng Liu,
Fangyun Wang,
Lin Zheng,
Fengqiao Gao,
Hanwen Zhang,
Xin Zhang,
Wanqing Xie,
Binbin Wang
Abstract:
Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonatal death in China. Clinical diagnosis can be based on selected 2D key-frames from five views. Limited by the availability of multi-view data, most methods have to rely on insufficient single-view analysis. This study proposes to automatically analyze multi-view echocardiograms with a practical end-to-end framework. We collect five-view echocardiogram video records of 1308 subjects (including normal controls, ventricular septal defect (VSD) patients and atrial septal defect (ASD) patients) with both disease labels and standard-view key-frame labels. Depthwise separable convolution-based multi-channel networks are adopted to largely reduce the number of network parameters. We also address the class imbalance problem by augmenting the positive training samples. Our 2D key-frame model distinguishes CHD from negative samples with an accuracy of 95.4%, and classifies samples as negative, VSD or ASD with an accuracy of 92.3%. To further reduce the work of key-frame selection in real-world implementation, we propose an adaptive soft attention scheme to directly explore the raw video data. Four kinds of neural aggregation methods are systematically investigated to fuse the information of an arbitrary number of frames in a video. Moreover, with a view detection module, the system can work without the view records. Our video-based model achieves an accuracy of 93.9% (binary classification) and 92.1% (3-class classification) on a collected 2D video testing set, without requiring key-frame selection or view annotation at test time. A detailed ablation study and an interpretability analysis are provided.
Submitted 30 November, 2023;
originally announced November 2023.
-
SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification
Authors:
Junyan Lin,
Feng Gao,
Xiaocheng Shi,
Junyu Dong,
Qian Du
Abstract:
Masked image modeling (MIM) is a highly popular and effective self-supervised learning method for image understanding. Existing MIM-based methods mostly focus on spatial feature modeling, neglecting spectral feature modeling. Moreover, because existing MIM-based methods use Transformers for feature extraction, some local or high-frequency information may be lost. To this end, we propose a spatial-spectral masked auto-encoder (SS-MAE) for joint classification of HSI and LiDAR/SAR data. Specifically, SS-MAE consists of a spatial-wise branch and a spectral-wise branch. The spatial-wise branch masks random patches and reconstructs missing pixels, while the spectral-wise branch masks random spectral channels and reconstructs missing channels. Our SS-MAE fully exploits the spatial and spectral representations of the input data. Furthermore, to complement local features in the training stage, we add two lightweight CNNs for feature extraction. Both global and local features are taken into account for feature modeling. Extensive experiments on three publicly available multi-source datasets verify the superiority of our SS-MAE over several state-of-the-art baselines. The source code is available at https://github.com/summitgao/SS-MAE.
Submitted 7 November, 2023;
originally announced November 2023.
-
Moving Target Sensing for ISAC Systems in Clutter Environment
Authors:
Dongqi Luo,
Huihui Wu,
Hongliang Luo,
Bo Lin,
Feifei Gao
Abstract:
In this paper, we consider the moving target sensing problem for integrated sensing and communication (ISAC) systems in cluttered environments. Scatterers produce strong clutter, deteriorating the performance of ISAC systems in practice. Given that scatterers are typically stationary and the targets of interest are usually moving, we focus here on sensing the moving targets. Specifically, we adopt a scanning beam to search for moving target candidates. For the received signal in each scan, we employ high-pass filtering in the Doppler domain to suppress the clutter within the echo, and then identify candidate moving targets according to the power of the filtered signal. Next, we adopt root-MUSIC-based algorithms to estimate the angle, range, and radial velocity of these candidate moving targets. Subsequently, we propose a target detection algorithm to reject false targets. Simulation results validate the effectiveness of the proposed methods.
Submitted 3 November, 2023;
originally announced November 2023.
-
Integrated Sensing and Communications in Clutter Environment
Authors:
Hongliang Luo,
Yucong Wang,
Dongqi Luo,
Jianwei Zhao,
Huihui Wu,
Shaodan Ma,
Feifei Gao
Abstract:
In this paper, we propose a practical integrated sensing and communications (ISAC) framework to sense dynamic targets in a cluttered environment while ensuring the users' communication quality. To implement the communications function and the sensing function simultaneously, we design multiple communications beams that can communicate with the users as well as one sensing beam that can rotate and scan the entire space. To minimize the interference of the sensing beam with existing communications systems, we divide the service area into a sensing beam for sensing (S4S) sector and a communications beam for sensing (C4S) sector, and provide beamforming design and power allocation optimization strategies for each type of sector. Unlike most existing ISAC studies, which ignore the interference of static environmental clutter with target sensing, we construct a mixed sensing channel model that includes both the static environment and dynamic targets. When the base station receives the echo signals, the mean phasor cancellation (MPC) method is employed to filter out the interference from static environmental clutter and to extract the effective dynamic target echoes. Then a complete and practical dynamic target sensing scheme is designed to detect the presence of dynamic targets and to estimate their angles, distances, and velocities. In particular, dynamic target detection and angle estimation are realized through angle-Doppler spectrum estimation (ADSE) and joint detection over multiple subcarriers (MSJD), while distance and velocity estimation are realized through the extended subspace algorithm. Simulation results demonstrate the effectiveness of the proposed scheme and its superiority over existing methods that ignore environmental clutter.
Submitted 5 February, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
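The abstract above names mean phasor cancellation (MPC) but does not define it. The following is a minimal sketch under the assumption that MPC amounts to subtracting the slow-time mean of the echo samples, so that a constant (static clutter) phasor is removed while a phase-rotating (moving target) component survives. The signal model, the function names and all parameter values are hypothetical and are not taken from the paper.

```python
import cmath
import math


def mean_phasor_cancellation(echoes):
    """Subtract the mean over slow time from a list of complex echo samples."""
    mean = sum(echoes) / len(echoes)
    return [e - mean for e in echoes]


def make_echoes(n_symbols, clutter, target_amp, doppler_cycles):
    """Hypothetical echo model: constant clutter plus a target whose phase
    rotates by `doppler_cycles` full turns over the observation window."""
    return [
        clutter
        + target_amp * cmath.exp(2j * math.pi * doppler_cycles * n / n_symbols)
        for n in range(n_symbols)
    ]


# Strong static clutter (10+5j) and a weak moving target (amplitude 0.5).
echoes = make_echoes(n_symbols=64, clutter=10 + 5j, target_amp=0.5, doppler_cycles=4)
filtered = mean_phasor_cancellation(echoes)
```

In this toy setup the target completes a whole number of Doppler cycles, so its own mean is zero and only the clutter is cancelled. A target with near-zero Doppler would be attenuated together with the clutter, which is an inherent limitation of any mean-subtraction filter.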
-
Experimental Results of Underwater Sound Speed Profile Inversion by Few-shot Multi-task Learning
Authors:
Wei Huang,
Fan Gao,
Junting Wang,
Hao Zhang
Abstract:
The underwater sound speed profile (SSP) distribution has a great influence on the propagation mode of acoustic signals, so fast and accurate estimation of the SSP is of great importance in building underwater observation systems. State-of-the-art SSP inversion methods include frameworks based on matched field processing (MFP), compressive sensing (CS), and feedforward neural networks (FNN), among which the FNN shows better real-time performance while maintaining the same level of accuracy. However, training an FNN requires a large number of historical SSP samples, which is difficult to satisfy in many ocean areas. This situation is called few-shot learning. To tackle this issue, we propose a multi-task learning (MTL) model with partial parameter sharing among different training tasks. With MTL, common features can be extracted, which accelerates the learning process on given tasks and reduces the demand for reference samples, thereby enhancing the generalization ability in few-shot learning. To verify the feasibility and effectiveness of MTL, a deep-ocean experiment was held in April 2023 in the South China Sea. Results show that MTL outperforms the state-of-the-art methods in terms of accuracy for SSP inversion, while inheriting the real-time advantage of the FNN during the inversion stage.
Submitted 18 October, 2023;
originally announced October 2023.
-
Underwater Sound Speed Profile Construction: A Review
Authors:
Wei Huang,
Jixuan Zhou,
Fan Gao,
Jiajun Lu,
Sijia Li,
Pengfei Wu,
Junting Wang,
Hao Zhang,
Tianhe Xu
Abstract:
Real-time and accurate construction of regional sound speed profiles (SSP) is important for building underwater positioning, navigation, and timing (PNT) systems, as the SSP greatly affects signal propagation characteristics such as the trajectory. In this paper, we summarize and analyze the current research status in the field of underwater SSP construction, where the mainstream methods are direct SSP measurement and SSP inversion. For the direct measurement method, we compare the performance of popular international commercial conductivity, temperature, and depth (CTD) profilers. For the inversion methods, the framework and basic principles of matched field processing (MFP), compressive sensing (CS), and deep learning (DL) for constructing SSPs are introduced, and their advantages and disadvantages are compared. The traditional direct measurement method has good accuracy, but it usually takes a long time. SSP inversion greatly improves convenience and real-time performance, but its accuracy is not as good as that of direct measurement. Currently, SSP inversion relies on sonar observation data, making it difficult to apply to areas that cannot be covered by underwater observation systems, and these methods are unable to predict the distribution of sound velocity at future times. How to comprehensively utilize multi-source data and provide elastic sound velocity distribution estimation services with different accuracy and real-time requirements for underwater users without sonar observation data is the mainstream trend in future research on SSP construction.
Submitted 12 October, 2023;
originally announced October 2023.
-
Fast Ray-Tracing-Based Precise Underwater Acoustic Localization without Prior Acknowledgment of Target Depth
Authors:
Wei Huang,
Hao Zhang,
Kaitao Meng,
Fan Gao,
Wenzhou Sun,
Jianxu Shu,
Tianhe Xu,
Deshi Li
Abstract:
Underwater localization is of great importance for marine observation and for building positioning, navigation, and timing (PNT) systems that could be widely applied in disaster warning, underwater rescue and resource exploration. The uneven distribution of underwater sound velocity poses a great challenge for precise underwater positioning. Current soundline correction positioning methods mainly target scenarios with known target depth. However, for non-cooperative nodes or nodes lacking depth information, soundline tracking strategies cannot work well because the positional solution is not unique. To tackle this issue, we propose an iterative ray tracing 3D underwater localization (IRTUL) method for stratification compensation. To demonstrate the feasibility of fast stratification compensation, we first derive the signal path as a function of the grazing angle, and then prove that the signal propagation time and horizontal propagation distance are monotonic functions of the initial grazing angle, so that fast ray tracing can be achieved. Then, we propose a sound velocity profile (SVP) simplification method, which reduces the computational cost of ray tracing. Experimental results show that IRTUL achieves its most significant distance correction in the depth direction, and that its average accuracy is improved by about 3 meters compared with a localization model with constant sound velocity. Also, the simplified SVP can significantly improve real-time performance with an average accuracy loss of less than 0.2 m when used for positioning.
Submitted 12 October, 2023;
originally announced October 2023.
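The monotonicity property mentioned in the abstract above is what makes a fast root search possible: if the horizontal distance is monotonic in the initial grazing angle, the angle can be found by bisection. The following is a minimal sketch of that idea, not the IRTUL method itself. It assumes a stack of constant-speed layers, Snell's law in grazing-angle form, and a known target depth (the simpler sub-problem); the function names and the layer values are hypothetical.

```python
import math


def trace(theta0, layers):
    """Trace a ray through constant-speed layers.

    theta0: initial grazing angle in radians (measured from the horizontal).
    layers: list of (thickness_m, speed_m_per_s) from source to target depth.
    Returns (horizontal_distance_m, travel_time_s), or None if the ray turns
    back before crossing every layer.
    """
    snell = math.cos(theta0) / layers[0][1]  # cos(theta_i) / c_i is constant
    x = 0.0
    t = 0.0
    for thickness, speed in layers:
        cos_th = snell * speed
        if cos_th >= 1.0:
            return None
        sin_th = math.sqrt(1.0 - cos_th * cos_th)
        x += thickness * cos_th / sin_th
        t += thickness / (speed * sin_th)
    return x, t


def solve_grazing_angle(target_x, layers, tol=1e-9):
    """Bisection on the initial grazing angle; relies on the horizontal
    distance decreasing monotonically as the grazing angle increases."""
    lo, hi = 1e-6, math.pi / 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        result = trace(mid, layers)
        if result is None or result[0] > target_x:
            lo = mid  # too shallow: ray turns back or overshoots
        else:
            hi = mid  # too steep: ray lands short of the target
    return 0.5 * (lo + hi)


# Sanity check with a uniform medium: 100 m deep, 100 m away -> 45 degrees.
uniform = [(50.0, 1500.0), (50.0, 1500.0)]
theta = solve_grazing_angle(100.0, uniform)
x, t = trace(theta, uniform)
```

In the uniform case the ray is a straight line, so the result can be checked by hand: the angle is pi/4 and the travel time is 100*sqrt(2)/1500 s. With depth-dependent layer speeds the same search applies unchanged.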
-
Beam Squint Assisted User Localization in Near-Field Integrated Sensing and Communications Systems
Authors:
Hongliang Luo,
Feifei Gao,
Wanmai Yuan,
Shun Zhang
Abstract:
Integrated sensing and communication (ISAC) has been regarded as a key technology for 6G wireless communications, in which large-scale multiple-input multiple-output (MIMO) arrays with higher and wider frequency bands will be adopted. However, recent studies show that the beam squint phenomenon cannot be ignored in wideband MIMO systems, where it generally deteriorates communications performance. In this paper, we find that with the aid of true-time-delay lines (TTDs), the range and trajectory of the beam squint in near-field communications systems can be freely controlled, and hence it is possible to exploit the beam squint for user localization. We derive the trajectory equation for near-field beam squint points and design a way to control this trajectory. With the proposed design, the beams of different subcarriers purposely point to different angles and different distances, such that users at different positions receive the maximum power at different subcarriers. Hence, one can localize multiple users from the beam squint effect in the frequency domain, and thus reduce the beam sweeping overhead compared with the conventional time-domain beam search approach. Furthermore, we utilize the phase difference of the maximum-power subcarriers received by the user at different frequencies over several beam sweeps to obtain a more accurate distance estimate, ultimately realizing user localization with high accuracy and low beam sweeping overhead. Simulation results demonstrate the effectiveness of the proposed schemes.
Submitted 25 September, 2023;
originally announced September 2023.
-
Convolution and Attention Mixer for Synthetic Aperture Radar Image Change Detection
Authors:
Haopeng Zhang,
Zijing Lin,
Feng Gao,
Junyu Dong,
Qian Du,
Heng-Chao Li
Abstract:
Synthetic aperture radar (SAR) image change detection is a critical task and has received increasing attention in the remote sensing community. However, existing SAR change detection methods are mainly based on convolutional neural networks (CNNs), with limited consideration of global attention mechanisms. In this letter, we explore a Transformer-like architecture for SAR change detection to incorporate global attention. To this end, we propose a convolution and attention mixer (CAMixer). First, to compensate for the inductive bias of the Transformer, we combine self-attention with shift convolution in parallel. The parallel design captures global semantic information via self-attention and performs local feature extraction through shift convolution simultaneously. Second, we adopt a gating mechanism in the feed-forward network to enhance the non-linear feature transformation. The gating mechanism is formulated as the element-wise multiplication of two parallel linear layers. Important features can thus be highlighted, leading to high-quality representations that are robust to speckle noise. Extensive experiments conducted on three SAR datasets verify the superior performance of the proposed CAMixer. The source code will be publicly available at https://github.com/summitgao/CAMixer.
Submitted 21 September, 2023;
originally announced September 2023.
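The gating mechanism described in the abstract above, an element-wise product of two parallel linear layers, can be sketched in a few lines. This is an illustrative pure-Python version only: the function names, the weights and the absence of any activation or normalization are assumptions, not details of the CAMixer implementation.

```python
def linear(x, weights, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def gated_projection(x, w_a, b_a, w_b, b_b):
    """Element-wise product of two parallel linear projections of x."""
    branch_a = linear(x, w_a, b_a)
    branch_b = linear(x, w_b, b_b)
    return [a * b for a, b in zip(branch_a, branch_b)]


# Hypothetical 2-dimensional example: branch A passes the input through
# unchanged, branch B acts as a per-feature gate.
x = [1.0, 2.0]
w_a, b_a = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w_b, b_b = [[2.0, 0.0], [0.0, 0.5]], [1.0, 0.0]
gated = gated_projection(x, w_a, b_a, w_b, b_b)  # [1*3, 2*1] = [3.0, 2.0]
```

Because the gate is computed from the input itself, each feature is scaled by a data-dependent factor, which is how such a layer can emphasize some features over others.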
-
TSI-Net: A Timing Sequence Image Segmentation Network for Intracranial Artery Segmentation in Digital Subtraction Angiography
Authors:
Lemeng Wang,
Wentao Liu,
Weiping Xu,
Haoyuan Li,
Huihua Yang,
Feng Gao
Abstract:
Cerebrovascular disease is one of the major diseases facing the world today. Automatic segmentation of the intracranial artery (IA) in digital subtraction angiography (DSA) sequences is an important step in the diagnosis of vascular-related diseases and in guiding neurointerventional procedures. However, according to the imaging principle of DSA technology, a single image can only show the part of the IA that contains contrast medium. Therefore, 2D DSA segmentation methods cannot capture the complete IA information, which hampers the treatment of cerebrovascular diseases. We propose a U-shaped timing sequence image segmentation network, called TSI-Net, which incorporates a bi-directional ConvGRU module (BCM) in the encoder. The BCM can take variable-length DSA sequences as input, retain past and future information, and segment them into 2D images. In addition, we introduce a sensitive detail branch (SDB) at the end for supervising fine vessels. In experiments on the DSA sequence dataset DIAS, the method performs significantly better than recent state-of-the-art networks. In particular, it achieves a sensitivity (Sen) of 0.797, which is a 3% improvement compared to other methods.
Submitted 7 September, 2023;
originally announced September 2023.
-
Review of photoacoustic imaging plus X
Authors:
Daohuai Jiang,
Luyao Zhu,
Shangqing Tong,
Yuting Shen,
Feng Gao,
Fei Gao
Abstract:
Photoacoustic imaging (PAI) is a novel modality in biomedical imaging technology that combines rich optical contrast with the deep penetration of ultrasound. To date, PAI technology has found applications in various biomedical fields. In this review, we present an overview of the emerging research frontiers on PAI plus other advanced technologies, named PAI plus X, which includes, but is not limited to, PAI plus treatment, PAI plus new circuit design, PAI plus accurate positioning systems, PAI plus fast scanning systems, PAI plus novel ultrasound sensors, PAI plus advanced laser sources, PAI plus deep learning, and PAI plus other imaging modalities. We discuss each technology's current state, technical advantages, and prospects for application, as reported mostly in the last three years. Lastly, we discuss and summarize the challenges and potential future work in the PAI plus X area.
Submitted 5 September, 2023;
originally announced September 2023.
-
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
Authors:
Fan Zhang,
Naye Ji,
Fuxing Gao,
Siyuan Zhao,
Zhaohan Wang,
Shunman Li
Abstract:
The generation of co-speech gestures for digital humans is an emerging area in the field of virtual human creation. Prior research has made progress by using acoustic and semantic information as input and adopting classification methods to identify the person's ID and emotion for driving co-speech gesture generation. However, this endeavour still faces significant challenges. These challenges go beyond the intricate interplay between co-speech gestures, speech acoustics, and semantics; they also encompass the complexities associated with personality, emotion, and other obscure but important factors. This paper introduces "diffmotion-v2," a speech-conditional, diffusion-based, non-autoregressive, transformer-based generative model built on the WavLM pre-trained model. It can produce individual and stylized full-body co-speech gestures using only raw speech audio, eliminating the need for complex multimodal processing and manual annotation. Firstly, considering that speech audio not only contains acoustic and semantic features but also conveys personality traits, emotions, and more subtle information related to accompanying gestures, we pioneer the adaptation of WavLM, a large-scale pre-trained model, to extract low-level and high-level audio information. Secondly, we introduce an adaptive layer norm architecture in the transformer-based layer to learn the relationship between speech information and accompanying gestures. Extensive subjective evaluation experiments are conducted on the Trinity, ZEGGS, and BEAT datasets to confirm the ability of WavLM and the model to synthesize natural co-speech gestures with various styles.
Submitted 13 April, 2024; v1 submitted 11 August, 2023;
originally announced August 2023.
-
Achieving Covert Communication With A Probabilistic Jamming Strategy
Authors:
Xun Chen,
Fujun Gao,
Min Qiu,
Jia Zhang,
Feng Shu,
Shihao Yan
Abstract:
In this work, we consider a covert communication scenario, where a transmitter Alice communicates with a receiver Bob with the aid of a probabilistic and uninformed jammer against detection by an adversarial warden. The transmission status and power of the jammer are random and follow some prior probabilities. We first analyze the warden's detection performance as a function of the jammer's transmission probability, transmit power distribution, and Alice's transmit power. We then maximize the covert throughput from Alice to Bob subject to a covertness constraint, by designing the covert communication strategies from three different perspectives: Alice's perspective, the jammer's perspective, and the global perspective. Our analysis reveals that the minimum jamming power should not always be zero in the probabilistic jamming strategy, which is different from that in the continuous jamming strategy presented in the literature. In addition, we prove that the minimum jamming power should be the same as Alice's covert transmit power, depending on the covertness and average jamming power constraints. Furthermore, our results show that probabilistic jamming can outperform continuous jamming by achieving a higher covert throughput under the same covertness and average jamming power constraints.
Submitted 29 August, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression
Authors:
Wei Jiang,
Jiayu Yang,
Yongqi Zhai,
Feng Gao,
Ronggang Wang
Abstract:
Recently, learned image compression has achieved impressive performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in enhancing rate-distortion performance. However, existing global context modules rely on computationally intensive operations with quadratic complexity to capture global correlations. This quadratic complexity limits the potential of high-resolution image coding. Moreover, effectively capturing local, global, and channel-wise contexts with acceptable, or even linear, complexity within a single entropy model remains a challenge. To address these limitations, we propose the Linear Complexity Multi-Reference Entropy Model (MEM++). MEM++ effectively captures the diverse range of correlations inherent in the latent representation. Specifically, the latent representation is first divided into multiple slices. When compressing a particular slice, the previously compressed slices serve as its channel-wise contexts. To capture local contexts without sacrificing performance, we introduce a novel checkerboard attention module. Additionally, to capture global contexts, we propose a linear-complexity attention-based mechanism for capturing global correlations that leverages the decomposition of the softmax operation. The attention map of the previously decoded slice is implicitly computed and employed to predict global correlations in the current slice. Based on MEM++, we propose the image compression model MLIC++. Extensive experimental evaluations demonstrate that MLIC++ achieves state-of-the-art performance, reducing BD-rate by 13.39% on the Kodak dataset compared to VTM-17.0 in PSNR. Furthermore, the GPU memory consumption of MLIC++ grows linearly with resolution, making it highly suitable for high-resolution image coding. Code and pre-trained models are available at https://github.com/JiangWeibeta/MLIC.
Submitted 19 February, 2024; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Gradient-based adaptive wavelet de-noising method for photoacoustic imaging in vivo
Authors:
Xinke Li,
Peng Ge,
Yuting Shen,
Feng Gao,
Fei Gao
Abstract:
Photoacoustic imaging (PAI) has been applied to many biomedical applications over the past decades. However, the received PA signal usually suffers from a poor signal-to-noise ratio (SNR). Conventional solutions, such as employing a higher-power laser or long-time signal averaging, may increase system cost, time consumption, and tissue damage. Another strategy is de-noising algorithm design. In this paper, we propose a new de-noising method, termed gradient-based adaptive wavelet de-noising, which sets the energy gradient mutation point of the low-frequency wavelet components as the threshold. We conducted simulation, ex vivo and in vivo experiments to validate the performance of the algorithm. The quality of the PA image/signal de-noised by our proposed algorithm improved by 20%-40% in comparison with traditional signal de-noising algorithms, yielding better contrast and clearer details. The proposed de-noising method has the potential to improve the SNR of the PA signal under single-shot low-power laser illumination for biomedical applications in vivo.
Submitted 25 July, 2023;
originally announced July 2023.
-
Machine-Learning-based Colorectal Tissue Classification via Acoustic Resolution Photoacoustic Microscopy
Authors:
Shangqing Tong,
Peng Ge,
Yanan Jiao,
Zhaofu Ma,
Ziye Li,
Longhai Liu,
Feng Gao,
Xiaohui Du,
Fei Gao
Abstract:
Colorectal cancer is a deadly disease that has become increasingly prevalent in recent years. Early detection is crucial for saving lives, but traditional diagnostic methods such as colonoscopy and biopsy have limitations. Colonoscopy cannot provide detailed information from within the tissues affected by cancer, while biopsy involves tissue removal, which can be painful and invasive. In order to improve diagnostic efficiency and reduce patient suffering, we studied a machine-learning-based approach for colorectal tissue classification that uses acoustic resolution photoacoustic microscopy (ARPAM). With this tool, we were able to classify benign and malignant tissue using multiple machine learning methods. Our results were analyzed both quantitatively and qualitatively to evaluate the effectiveness of our approach.
Submitted 17 July, 2023;
originally announced July 2023.
-
Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation
Authors:
Yuanjing Feng,
Lei Xie,
Jingqiang Wang,
Jianzhong He,
Fei Gao
Abstract:
Tractography traces the peak directions extracted from the fiber orientation distribution (FOD) and suffers from ambiguous spatial correspondences between diffusion directions and fiber geometry, which makes it prone to producing erroneous tracks while missing true positive connections. Peaks-based tractography methods reconstruct streamlines 'locally' in a 'single to single' manner and thus lack global information about the trend of the whole fiber bundle. In this work, we propose a novel tractography method based on a bundle-specific tractogram distribution function by using a higher-order streamline differential equation, which reconstructs the streamline bundles in a 'cluster to cluster' manner. A unified framework for any higher-order streamline differential equation is presented to describe the fiber bundles with disjoint streamlines defined based on the diffusion tensor vector field. At the global level, the tractography process is simplified as the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing the energy optimization model, and is used to characterize the relations between BTD and diffusion tensor vector under the prior guidance by introducing the tractogram bundle information to provide anatomic priors. Experiments are performed on simulated Hough, Sine, Circle data, ISMRM 2015 Tractography Challenge data, FiberCup data, and in vivo data from the Human Connectome Project (HCP) for qualitative and quantitative evaluation. The results demonstrate that our approach can reconstruct the complex global fiber bundles directly. BTD reduces the error deviation and accumulation at the local level and shows better results in reconstructing long-range, twisting, and large fanning tracts.
Submitted 6 July, 2023;
originally announced July 2023.
-
Score-based Generative Models for Photoacoustic Image Reconstruction with Rotation Consistency Constraints
Authors:
Shangqing Tong,
Hengrong Lan,
Liming Nie,
Jianwen Luo,
Fei Gao
Abstract:
Photoacoustic tomography (PAT) is a newly emerged imaging modality that offers both high optical contrast and acoustic depth of penetration. Reconstructing PAT images from a limited amount of sensor data is one of the major challenges in photoacoustic imaging. Previous works based on deep learning were trained in a supervised fashion, directly mapping the partially known input sensor data to the ground truth reconstructed from the full field of view. Recently, score-based generative models have played an increasingly significant role in generative modeling. Leveraging this probabilistic model, we propose the Rotation Consistency Constrained Score-based Generative Model (RCC-SGM), which recovers PAT images by iterative sampling between Langevin dynamics and a constraint term utilizing the rotation consistency between the images and the measurements. Our proposed method can generalize to different measurement processes (32.29 PSNR with 16 measurements under random sampling, whereas 28.50 for the supervised counterpart), while supervised methods need to train on specific inverse mappings.
Submitted 23 June, 2023;
originally announced June 2023.
-
DIAS: A Dataset and Benchmark for Intracranial Artery Segmentation in DSA sequences
Authors:
Wentao Liu,
Tong Tian,
Lemeng Wang,
Weijin Xu,
Lei Li,
Haoyuan Li,
Wenyi Zhao,
Siyu Tian,
Xipeng Pan,
Huihua Yang,
Feng Gao,
Yiming Deng,
Xin Yang,
Ruisheng Su
Abstract:
The automated segmentation of Intracranial Arteries (IA) in Digital Subtraction Angiography (DSA) plays a crucial role in the quantification of vascular morphology, significantly contributing to computer-assisted stroke research and clinical practice. Current research primarily focuses on the segmentation of single-frame DSA using proprietary datasets. However, these methods face challenges due to the inherent limitation of single-frame DSA, which only partially displays vascular contrast, thereby hindering accurate vascular structure representation. In this work, we introduce DIAS, a dataset specifically developed for IA segmentation in DSA sequences. We establish a comprehensive benchmark for evaluating DIAS, covering full, weak, and semi-supervised segmentation methods. Specifically, we propose the vessel sequence segmentation network, in which the sequence feature extraction module effectively captures spatiotemporal representations of intravascular contrast, achieving intracranial artery segmentation in 2D+Time DSA sequences. For weakly-supervised IA segmentation, we propose a novel scribble learning-based image segmentation framework, which, under the guidance of scribble labels, employs cross pseudo-supervision and consistency regularization to improve the performance of the segmentation network. Furthermore, we introduce the random patch-based self-training framework, aimed at alleviating the performance constraints encountered in IA segmentation due to the limited availability of annotated DSA data. Our extensive experiments on the DIAS dataset demonstrate the effectiveness of these methods as potential baselines for future research and clinical applications. The dataset and code are publicly available at https://doi.org/10.5281/zenodo.11396520 and https://github.com/lseventeen/DIAS.
Submitted 13 June, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Radar Sensing via OTFS Signaling
Authors:
Kecheng Zhang,
Zhongjie Li,
Weijie Yuan,
Yunlong Cai,
Feifei Gao
Abstract:
By multiplexing information symbols in the delay-Doppler (DD) domain, orthogonal time frequency space (OTFS) is a promising candidate for future wireless communication in high-mobility scenarios. In addition to the superior communication performance, OTFS is also a natural choice for radar sensing since the primary parameters (range and velocity of targets) in radar signal processing can be inferred directly from the delay and Doppler shifts. Though there are several works on OTFS radar sensing, most of them consider the integer parameter estimation only, while the delay and Doppler shifts are usually fractional in the real world. In this paper, we propose a two-step method to estimate the fractional delay and Doppler shifts. We first perform the two-dimensional (2D) correlation between the received and transmitted DD domain symbols to obtain the integer parts of the parameters. Then a difference-based method is implemented to estimate the fractional parts of delay and Doppler indices. Meanwhile, we implement a target detection method based on a generalized likelihood ratio test since the number of potential targets in the sensing scenario is usually unknown. The simulation results show that the proposed method can obtain the delay and Doppler shifts accurately and get the number of sensing targets with a high detection probability.
Submitted 19 June, 2023;
originally announced June 2023.
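The first step described in this abstract, a 2D correlation between received and transmitted delay-Doppler symbols to obtain the integer parts of the parameters, can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a single target, models the channel as a pure circular shift plus noise (ignoring the phase terms of a real OTFS channel), and omits the fractional refinement step. The grid size, the gain, and the function name `integer_delay_doppler` are hypothetical.

```python
import numpy as np

def integer_delay_doppler(tx, rx):
    """Return the (delay, Doppler) index pair that maximizes the
    2D circular cross-correlation between rx and tx."""
    # Circular cross-correlation computed in the frequency domain.
    corr = np.fft.ifft2(np.fft.fft2(rx) * np.conj(np.fft.fft2(tx)))
    peak = np.argmax(np.abs(corr))
    return tuple(int(i) for i in np.unravel_index(peak, corr.shape))

rng = np.random.default_rng(0)
M, N = 16, 12  # hypothetical delay-Doppler grid size
# QPSK-like transmitted symbols (assumption for illustration).
tx = rng.choice([-1.0, 1.0], size=(M, N)) + 1j * rng.choice([-1.0, 1.0], size=(M, N))
# Simplified single-target echo: circular shift by (3, 5) with gain 0.8.
rx = 0.8 * np.roll(tx, shift=(3, 5), axis=(0, 1))
rx = rx + 0.05 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

estimate = integer_delay_doppler(tx, rx)
print(estimate)
```

With several targets, one would presumably look for multiple correlation peaks and apply a detection threshold, which is where the likelihood ratio test mentioned in the abstract would come in.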
-
LF-PGVIO: A Visual-Inertial-Odometry Framework for Large Field-of-View Cameras using Points and Geodesic Segments
Authors:
Ze Wang,
Kailun Yang,
Hao Shi,
Yufan Zhang,
Zhijie Xu,
Fei Gao,
Kaiwei Wang
Abstract:
In this paper, we propose LF-PGVIO, a Visual-Inertial-Odometry (VIO) framework for large Field-of-View (FoV) cameras with a negative plane using points and geodesic segments. The purpose of our research is to unleash the potential of point-line odometry with large-FoV omnidirectional cameras, even for cameras with negative-plane FoV. To achieve this, we propose an Omnidirectional Curve Segment Detection (OCSD) method combined with a camera model which is applicable to images with large distortions, such as panoramic annular images, fisheye images, and various panoramic images. The geodesic segment is sliced into multiple straight-line segments based on the radian, and descriptors are extracted and recombined. Descriptor matching establishes the constraint relationship between 3D line segments in multiple frames. In our VIO system, the line feature residual is also extended to support large-FoV cameras. Extensive evaluations on public datasets demonstrate the superior accuracy and robustness of LF-PGVIO compared to state-of-the-art methods. The source code will be made publicly available at https://github.com/flysoaryun/LF-PGVIO.
Submitted 11 March, 2024; v1 submitted 11 June, 2023;
originally announced June 2023.
-
Complex CNN CSI Enhancer for Integrated Sensing and Communications
Authors:
Xu Chen,
Zhiyong Feng,
J. Andrew Zhang,
Feifei Gao,
Xin Yuan,
Zhaohui Yang,
Ping Zhang
Abstract:
In this paper, we propose a novel complex convolutional neural network (CNN) CSI enhancer for integrated sensing and communications (ISAC), which exploits the correlation between the sensing parameters (such as angle-of-arrival and range) and the channel state information (CSI) to significantly improve the CSI estimation accuracy and further enhance the sensing accuracy. Within the CNN CSI enhancer, we use the complex-valued computation layers to form the CNN, which maintains the phase information of CSI. We also transform the CSI into the sparse angle-delay domain, leading to heatmap images with prominent peaks that can be efficiently processed by CNN. Based on the enhanced CSI outputs, we further propose a novel biased fast Fourier transform (FFT)-based sensing scheme for improving the range sensing accuracy, by artificially introducing phase biasing terms. Extensive simulation results show that the ISAC complex CNN CSI enhancer can converge within 30 training epochs. The normalized mean square error (NMSE) of its CSI estimates is about 17 dB lower than that of the linear minimum mean square error (LMMSE) estimator, and the bit error rate (BER) of demodulation using the enhanced CSI estimation approaches that with perfect CSI. Finally, the range estimation MSE of the proposed biased FFT-based sensing method approaches that of the subspace-based sensing method, at a much lower complexity.
Submitted 19 June, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
YOLO: An Efficient Terahertz Band Integrated Sensing and Communications Scheme with Beam Squint
Authors:
Hongliang Luo,
Feifei Gao,
Hai Lin,
Shaodan Ma,
H. Vincent Poor
Abstract:
Using communications signals for dynamic target sensing is an important component of integrated sensing and communications (ISAC). In this paper, we propose to utilize the beam squint effect to realize fast non-cooperative dynamic target sensing in massive multiple input and multiple output (MIMO) Terahertz band communications systems. Specifically, we construct a wideband channel model of the echo signals, and design a beamforming strategy that controls the range of beam squint by adjusting the values of phase shifters and true time delay lines. With this design, beams at different subcarriers can be aligned along different directions in a planned way. The received echo signals at different subcarriers then carry target information from different directions, based on which the targets' angles can be estimated through a carefully designed algorithm. Moreover, we propose a supporting method based on extended array signal estimation, which utilizes the phase changes of different frequency subcarriers within different OFDM symbols to estimate the distance and velocity of dynamic targets. Interestingly, the proposed sensing scheme only needs to transmit and receive the signals once, and can thus be termed You Only Listen Once (YOLO). Compared with the traditional ISAC method that requires time-consuming beam sweeping, the proposed one greatly reduces the sensing overhead. Simulation results are provided to demonstrate the effectiveness of the proposed scheme.
Submitted 5 February, 2024; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Two-Bit RIS-Aided Communications at 3.5GHz: Some Insights from the Measurement Results Under Multiple Practical Scenes
Authors:
Shun Zhang,
Haoran Sun,
Runze Yu,
Hongshenyuan Cui,
Jian Ren,
Feifei Gao,
Shi Jin,
Hongxiang Xie,
Hao Wang
Abstract:
In this paper, we propose a two-bit reconfigurable intelligent surface (RIS)-aided communication system, which mainly consists of a two-bit RIS, a transmitter and a receiver. A corresponding prototype verification system is designed to perform experimental tests in practical environments. The carrier frequency is set as 3.5 GHz, and the RIS array possesses 256 units, each of which adopts two-bit phase quantization. In particular, we adopt a self-developed broadband intelligent communication system 40MHz-Net (BICT-40N) terminal in order to fully acquire the channel information. The terminal mainly includes a baseband board and a radio frequency (RF) front-end board, where the latter can achieve 26 dB transmitting link gain and 33 dB receiving link gain. The orthogonal frequency division multiplexing (OFDM) signal is used for the terminal, where the bandwidth is 40 MHz and the subcarrier spacing is 625 kHz. Also, the terminal supports a series of modulation modes, including QPSK, QAM, etc. Through experimental tests, we validate a few functions and properties of the RIS as follows. First, we validate a novel RIS power consumption model, which considers both the static and the dynamic power consumption. Besides, we demonstrate the existence of the imaging interference and find that the two-bit RIS can lower the imaging interference by about 10 dBm. Moreover, we verify that the RIS can outperform the metal plate in terms of the beam focusing performance. In addition, we find that the RIS has the ability to improve the channel stationarity. Then, we realize the multi-beam reflection of the RIS utilizing the pattern addition (PA) algorithm. Lastly, we validate the existence of the mutual coupling between different RIS units.
Submitted 19 May, 2023;
originally announced May 2023.
-
Waveform Design for Communication-Assisted Sensing in 6G Perceptive Networks
Authors:
Fuwang Dong,
Fan Liu,
Shihang Lu,
Weijie Yuan,
Yuanhao Cui,
Yifeng Xiong,
Feifei Gao
Abstract:
The integrated sensing and communication (ISAC) technique has the potential to achieve coordination gain by exploiting the mutual assistance between sensing and communication (S&C) functions. While the sensing-assisted communications (SAC) technology has been extensively studied for high-mobility scenarios, the communication-assisted sensing (CAS) counterpart remains largely unexplored. This paper presents a waveform design framework for CAS in 6G perceptive networks, aiming to attain an optimal sensing quality of service (QoS) at the user after the target's parameters successively "pass through" the S&C channels. In particular, a pair of transmission schemes, namely, separated S&C and dual-functional waveform designs, are proposed to optimize the sensing QoS under the constraints of the rate-distortion and power budget. The first scheme reveals a power allocation trade-off, while the latter presents a water-filling trade-off. Numerical results demonstrate the effectiveness of the proposed algorithms, where the dual-functional scheme exhibits approximately 25% performance gain compared to its separated waveform design counterpart.
Submitted 20 July, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Near Field Computational Imaging with RIS Generated Virtual Masks
Authors:
Yuhua Jiang,
Feifei Gao,
Yimin Liu,
Shi Jin,
Tiejun Cui
Abstract:
Near field computational imaging has been recognized as a promising technique for non-destructive and highly accurate detection of the target. Meanwhile, reconfigurable intelligent surface (RIS) can flexibly control the scattered electromagnetic (EM) fields for sensing the target and can thus help computational imaging in the near field. In this paper, we propose a near-field imaging scheme based on holographic aperture RIS. Specifically, we first establish an end-to-end EM propagation model from the perspective of Maxwell equations. To mitigate the inherent ill-conditioning of the inverse problem in the imaging system, we design the EM field patterns as masks that help translate the inverse problem into a forward problem. Next, we utilize RIS to generate different virtual EM masks on the target surface and calculate the cross-correlation between the mask patterns and the electric field strength at the receiver. We then provide a RIS design scheme for virtual EM masks by employing a regularization technique. The cross-range resolution of the proposed method is analyzed based on the spatial spectrum of the generated masks. Simulation results demonstrate that the proposed method can achieve high-quality imaging. Moreover, the imaging quality can be improved by generating more virtual EM masks, by increasing the signal-to-noise ratio (SNR) at the receiver, or by placing the target closer to the RIS.
Submitted 8 March, 2024; v1 submitted 22 April, 2023;
originally announced April 2023.
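The mask-correlation idea in this abstract (project different masks onto the target, then correlate the mask patterns with the field strength at the receiver) resembles classical correlation-based imaging. The sketch below illustrates only that general idea under strong assumptions: masks are ideal random binary patterns rather than RIS-generated EM fields, the receiver reading is a noiseless scalar sum, and no Maxwell-based propagation model or regularized RIS design is included. The target map, the number of masks, and the function name `correlate_masks` are hypothetical.

```python
import numpy as np

def correlate_masks(masks, readings):
    """Estimate the target by correlating each mask pixel with the
    scalar receiver readings (mean-removed cross-correlation)."""
    centered_masks = masks - masks.mean(axis=0)
    centered_readings = readings - readings.mean()
    # Contract over the mask index, leaving one value per pixel.
    return np.tensordot(centered_readings, centered_masks, axes=1) / len(readings)

rng = np.random.default_rng(1)
target = np.zeros((8, 8))
target[2:6, 3:5] = 1.0  # hypothetical reflectivity map
# Idealized random binary masks (assumption; not an RIS field model).
masks = rng.integers(0, 2, size=(4000, 8, 8)).astype(float)
# One scalar reading per mask: the mask-weighted sum of the target.
readings = np.tensordot(masks, target, axes=2)

image = correlate_masks(masks, readings)
```

In this toy setting the estimate sharpens as more masks are used, which is consistent with the abstract's remark that generating more virtual masks improves imaging quality.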
-
SAWU-Net: Spatial Attention Weighted Unmixing Network for Hyperspectral Images
Authors:
Lin Qi,
Xuewen Qin,
Feng Gao,
Junyu Dong,
Xinbo Gao
Abstract:
Hyperspectral unmixing is a critical yet challenging task in hyperspectral image interpretation. Recently, great efforts have been made to solve the hyperspectral unmixing task via deep autoencoders. However, existing networks mainly focus on extracting spectral features from mixed pixels, and the employment of spatial feature prior knowledge is still insufficient. To this end, we put forward a spatial attention weighted unmixing network, dubbed SAWU-Net, which learns a spatial attention network and a weighted unmixing network in an end-to-end manner for better spatial feature exploitation. In particular, we design a spatial attention module, which consists of a pixel attention block and a window attention block to efficiently model pixel-based spectral information and patch-based spatial information, respectively. In the weighted unmixing framework, the central pixel abundance is dynamically weighted by the coarse-grained abundances of surrounding pixels. In addition, SAWU-Net generates dynamically adaptive spatial weights through the spatial attention mechanism, so as to integrate surrounding pixels more effectively. Experimental results on real and synthetic datasets demonstrate the superior accuracy of SAWU-Net, which reflects the effectiveness of the proposed spatial attention mechanism.
Submitted 22 April, 2023;
originally announced April 2023.
-
LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression
Authors:
Wei Jiang,
Peirong Ning,
Jiayu Yang,
Yongqi Zhai,
Feng Gao,
Ronggang Wang
Abstract:
The effective receptive field (ERF) plays an important role in transform coding, which determines how much redundancy can be removed during transform and how many spatial priors can be utilized to synthesize textures during inverse transform. Existing methods rely on stacks of small kernels, whose ERFs remain insufficiently large, or heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large-kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Due to the wide range of image diversity, we further propose a mechanism to augment convolution adaptability through the self-conditioned generation of weights. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter pointwise interactions. Our investigation extends to refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit allocation strategy that autonomously generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model to compare with existing transform methods and obtain models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that our proposed LLIC models have significant improvements over the corresponding baselines and reduce the BD-Rate by 9.49%, 9.47%, 10.94% on Kodak over VTM-17.0 Intra, respectively. Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.
Submitted 21 June, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Physical Knowledge Enhanced Deep Neural Network for Sea Surface Temperature Prediction
Authors:
Yuxin Meng,
Feng Gao,
Eric Rigall,
Ran Dong,
Junyu Dong,
Qian Du
Abstract:
Traditionally, numerical models have been deployed in oceanography studies to simulate ocean dynamics by representing physical equations. However, many factors pertaining to ocean dynamics seem to be ill-defined. We argue that transferring physical knowledge from observed data could further improve the accuracy of numerical models when predicting Sea Surface Temperature (SST). Recently, the advances in earth observation technologies have yielded a monumental growth of data. Consequently, it is imperative to explore ways in which to improve and supplement numerical models utilizing the ever-increasing amounts of historical observational data. To this end, we introduce a method for SST prediction that transfers physical knowledge from historical observations to numerical models. Specifically, we use a combination of an encoder and a generative adversarial network (GAN) to capture physical knowledge from the observed data. The numerical model data is then fed into the pre-trained model to generate physics-enhanced data, which can then be used for SST prediction. Experimental results demonstrate that the proposed method considerably enhances SST prediction performance when compared to several state-of-the-art baselines.
Submitted 18 April, 2023;
originally announced April 2023.
-
Multi-scale Adaptive Fusion Network for Hyperspectral Image Denoising
Authors:
Haodong Pan,
Feng Gao,
Junyu Dong,
Qian Du
Abstract:
Removing the noise and improving the visual quality of hyperspectral images (HSIs) is challenging in academia and industry. Great efforts have been made to leverage local, global or spectral context information for HSI denoising. However, existing methods still have limitations in feature interaction exploitation among multiple scales and rich spectral structure preservation. In view of this, we propose a novel solution to investigate the HSI denoising using a Multi-scale Adaptive Fusion Network (MAFNet), which can learn the complex nonlinear mapping between clean and noisy HSI. Two key components contribute to improving the hyperspectral image denoising: a progressive multiscale information aggregation network and a co-attention fusion module. Specifically, we first generate a set of multiscale images and feed them into a coarse-fusion network to exploit the contextual texture correlation. Thereafter, a fine-fusion network exchanges the information across the parallel multiscale subnetworks. Furthermore, we design a co-attention fusion module to adaptively emphasize informative features from different scales, and thereby enhance the discriminative learning capability for denoising. Extensive experiments on synthetic and real HSI datasets demonstrate that the proposed MAFNet has achieved better denoising performance than other state-of-the-art techniques. Our codes are available at https://github.com/summitgao/MAFNet.
Submitted 18 April, 2023;
originally announced April 2023.
-
Multi-User Matching and Resource Allocation in Vision Aided Communications
Authors:
Weihua Xu,
Feifei Gao,
Yong Zhang,
Chengkang Pan,
Guangyi Liu
Abstract:
Visual perception is an effective way to obtain the spatial characteristics of wireless channels and to reduce the overhead for communications systems. A critical problem for the visual assistance is that the communications system needs to match the radio signal with the visual information of the corresponding user, i.e., to identify the visual user that corresponds to the target radio signal from all the environmental objects. In this paper, we propose a user matching method for environments with a variable number of objects. Specifically, we apply 3D detection to extract all the environmental objects from the images taken by multiple cameras. Then, we design a deep neural network (DNN) to estimate the location distribution of users from the images and beam pairs at multiple moments, and thereby identify the users from all the extracted environmental objects. Moreover, we present a resource allocation method based on the taken images to reduce the time and spectrum overhead compared to traditional resource allocation methods. Simulation results show that the proposed user matching method outperforms the existing methods, and the proposed resource allocation method can achieve 92% of the transmission rate of the traditional resource allocation method with significantly reduced time and spectrum overhead.
Submitted 18 April, 2023;
originally announced April 2023.
-
Discovering Structure From Corruption for Unsupervised Image Reconstruction
Authors:
Oscar Leong,
Angela F. Gao,
He Sun,
Katherine L. Bouman
Abstract:
We consider solving ill-posed imaging inverse problems without access to an image prior or ground-truth examples. An overarching challenge in these inverse problems is that an infinite number of images, including many that are implausible, are consistent with the observed measurements. Thus, image priors are required to reduce the space of possible solutions to more desirable reconstructions. However, in many applications it is difficult or potentially impossible to obtain example images to construct an image prior. Hence inaccurate priors are often used, which inevitably result in biased solutions. Rather than solving an inverse problem using priors that encode the spatial structure of any one image, we propose to solve a set of inverse problems jointly by incorporating prior constraints on the collective structure of the underlying images. The key assumption of our work is that the underlying images we aim to reconstruct share common, low-dimensional structure. We show that such a set of inverse problems can be solved simultaneously without the use of a spatial image prior by instead inferring a shared image generator with a low-dimensional latent space. The parameters of the generator and latent embeddings are found by maximizing a proxy for the Evidence Lower Bound (ELBO). Once identified, the generator and latent embeddings can be combined to provide reconstructed images for each inverse problem. The framework we propose can handle general forward model corruptions, and we show that measurements derived from only a small number of ground-truth images ($\leqslant 150$) are sufficient for image reconstruction. We demonstrate our approach on a variety of convex and non-convex inverse problems, including denoising, phase retrieval, and black hole video reconstruction.
Submitted 1 November, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Sensing Performance of Cooperative Joint Sensing-Communication UAV Network
Authors:
Xu Chen,
Zhiyong Feng,
Zhiqing Wei,
Feifei Gao,
Xin Yuan
Abstract:
We propose a novel cooperative joint sensing-communication (JSC) unmanned aerial vehicle (UAV) network that performs downward-looking detection and transmits detection data simultaneously, using the same time and frequency resources, by exploiting a beam sharing scheme. The UAV network consists of one UAV that works as a fusion center (FCUAV) and multiple subordinate UAVs (SUs). All UAVs fly at a fixed height. The FCUAV integrates the sensing data of the network and carries out downward-looking detection. The SUs carry out downward-looking detection and transmit the sensing data to the FCUAV. To realize the beam sharing scheme, each UAV is equipped with a novel JSC antenna array composed of a sensing subarray (SenA) and a communication subarray (ComA), which generate the sensing beam (SenB) for detection and the communication beam (ComB) for communication, respectively. The SenB and ComB of each UAV share the UAV's total radio power. Because of the spatial orthogonality of communication and sensing, SenB and ComB can be easily formed orthogonally. The upper bound of the average cooperative sensing area (UB-ACSA) is defined as the metric of sensing performance, and it depends on the mutual sensing interference and the communication capacity. Numerical simulations validate the theoretical expressions for the UB-ACSA of the network. The optimal number of UAVs and the optimal SenB power are identified under the total power constraint.
Submitted 3 April, 2023;
originally announced April 2023.