-
An I2I Inpainting Approach for Efficient Channel Knowledge Map Construction
Authors:
Zhenzhou **,
Li You,
Jue Wang,
Xiang-Gen Xia,
Xiqi Gao
Abstract:
Channel knowledge map (CKM) has received widespread attention as an emerging enabling technology for environment-aware wireless communications. It involves the construction of databases containing location-specific channel knowledge, which are then leveraged to facilitate channel state information (CSI) acquisition and transceiver design. In this context, a fundamental challenge lies in efficiently constructing the CKM based on a given wireless propagation environment. Most existing methods are based on stochastic modeling and sequence prediction, which do not fully exploit the inherent physical characteristics of the propagation environment, resulting in low accuracy and high computational complexity. To address these limitations, we propose a Laplacian pyramid (LP)-based CKM construction scheme to predict the channel knowledge at arbitrary locations in a targeted area. Specifically, we first view the channel knowledge as a 2-D image and transform the CKM construction problem into an image-to-image (I2I) inpainting task, which predicts the channel knowledge at a specific location by recovering the corresponding pixel value in the image matrix. Then, inspired by the reversible and closed-form structure of the LP, we show its natural suitability for our task in designing a fast I2I mapping network. For different frequency components of the LP decomposition, we design tailored networks accordingly. Besides, to encode the global structural information of the propagation environment, we introduce self-attention and cross-covariance attention mechanisms in different layers, respectively. Finally, experimental results show that the proposed scheme outperforms the benchmark, achieving higher reconstruction accuracy with lower computational complexity. Moreover, the proposed approach has a strong generalization ability and can be implemented in different wireless communication scenarios.
Submitted 14 June, 2024;
originally announced June 2024.
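The abstract above relies on the Laplacian pyramid being reversible and closed-form. A minimal sketch of that property follows, assuming 2x2 average pooling for downsampling and nearest-neighbor repetition for upsampling; both choices, the function names, and the random 64x64 stand-in for a CKM image are illustrative assumptions, not the authors' network or filters.

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling (assumed filter; height and width must be even).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Nearest-neighbor upsampling by a factor of 2 along both axes.
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def lp_decompose(img, levels):
    # Returns the high-frequency residual bands (fine to coarse)
    # and the final low-frequency image.
    bands = []
    cur = img
    for _ in range(levels):
        low = downsample(cur)
        bands.append(cur - upsample(low))  # detail lost by downsampling
        cur = low
    return bands, cur

def lp_reconstruct(bands, low):
    # Closed-form inverse: upsample and add each residual back, coarse to fine.
    cur = low
    for band in reversed(bands):
        cur = upsample(cur) + band
    return cur

# Hypothetical stand-in for a 2-D channel knowledge image.
rng = np.random.default_rng(0)
ckm = rng.standard_normal((64, 64))

bands, low = lp_decompose(ckm, levels=3)
recon = lp_reconstruct(bands, low)
print(np.allclose(recon, ckm))
```

Because each residual band stores exactly what the downsampling step discarded, reconstruction is exact up to floating-point error regardless of the filter chosen, which is why separate networks can operate on the individual frequency components.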
-
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation
Authors:
Zhusi Zhong,
Jie Li,
John Sollee,
Scott Collins,
Harrison Bai,
Paul Zhang,
Terrence Healey,
Michael Atalay,
Xinbo Gao,
Zhicheng Jiao
Abstract:
In response to the worldwide COVID-19 pandemic, advanced automated technologies have emerged as valuable tools to aid healthcare professionals in managing an increased workload by improving radiology report generation and prognostic analysis. This study proposes the Multi-modality Regional Alignment Network (MRANet), an explainable model for radiology report generation and survival prediction that focuses on high-risk regions. By learning spatial correlation in the detector, MRANet visually grounds region-specific descriptions, providing robust anatomical regions with a completion strategy. The visual features of each region are embedded using a novel survival attention mechanism, offering spatially and risk-aware features for sentence encoding while maintaining global coherence across tasks. A cross LLMs alignment is employed to enhance the image-to-text transfer process, resulting in sentences rich with clinical detail and improved explainability for radiologists. Multi-center experiments validate both MRANet's overall performance and each module's composition within the model, encouraging further advancements in radiology report generation research emphasizing clinical interpretation and trustworthiness in AI models applied to medical studies. The code is available at https://github.com/zzs95/MRANet.
Submitted 22 May, 2024;
originally announced May 2024.
-
NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution
Authors:
Yihong Chen,
Zhen Fan,
Shuai Dong,
Zhiwei Chen,
Wenjie Li,
Minghui Qin,
Min Zeng,
Xubing Lu,
Guofu Zhou,
Xingsen Gao,
Jun-Ming Liu
Abstract:
Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which is modified from the previous state-of-the-art model NAFSSR by introducing recursive connections and lightweighting the constituent modules. Our NAFRSSR model is composed of nonlinear activation free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces the number of parameters by removing the simple channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces the number of parameters by replacing the 1x1 pointwise convolution in SCAM with a weight-shared 3x3 depthwise convolution. Besides, we propose to incorporate a trainable edge detection operator into NAFRSSR to further improve the model performance. Four variants of NAFRSSR with different sizes, namely, NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S) and NAFRSSR-Base (NAFRSSR-B), are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Codes and models will be released at https://github.com/JNUChenYiHong/NAFRSSR.
Submitted 14 May, 2024;
originally announced May 2024.
-
Precoder Design for User-Centric Network Massive MIMO with Matrix Manifold Optimization
Authors:
Rui Sun,
Li You,
An-An Lu,
Chen Sun,
Xiqi Gao,
Xiang-Gen Xia
Abstract:
In this paper, we investigate the precoder design for user-centric network (UCN) massive multiple-input multiple-output (mMIMO) downlink with matrix manifold optimization. In UCN mMIMO systems, each user terminal (UT) is served by a subset of base stations (BSs) instead of all the BSs, facilitating the implementation of the system and lowering the dimension of the precoders to be designed. By proving that the precoder set satisfying the per-BS power constraints forms a Riemannian submanifold of a linear product manifold, we transform the constrained precoder design problem in Euclidean space to an unconstrained one on the Riemannian submanifold. Riemannian ingredients, including orthogonal projection, Riemannian gradient, retraction and vector transport, of the problem on the Riemannian submanifold are further derived, with which the Riemannian conjugate gradient (RCG) design method is proposed for solving the unconstrained problem. The proposed method avoids the inverses of large dimensional matrices, which is beneficial in practice. The complexity analyses show the high computational efficiency of RCG precoder design. Simulation results demonstrate the numerical superiority of the proposed precoder design and the high efficiency of the UCN mMIMO system.
Submitted 10 April, 2024;
originally announced April 2024.
-
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
Authors:
Ruijie Tao,
Xinyuan Qian,
Rohan Kumar Das,
Xiaoxue Gao,
Jiadong Wang,
Haizhou Li
Abstract:
Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons. Most existing AV-ASD methods prioritize capturing speech-lip correspondence. However, there is a noticeable gap in addressing the challenges from real-world AV-ASD scenarios. Due to the presence of low-quality noisy videos in such cases, AV-ASD systems without a selective listening ability cannot effectively filter out disruptive voice components from mixed audio inputs. In this paper, we propose a Multi-modal Speaker Extraction-to-Detection framework named 'MuSED', which is first pre-trained with audio-visual target speaker extraction to learn the denoising ability and then fine-tuned on the AV-ASD task. Meanwhile, to better capture the multi-modal information and deal with real-world problems such as missing modality, MuSED is modelled directly in the time domain and integrates the multi-modal plus-and-minus augmentation strategy. Our experiments demonstrate that MuSED substantially outperforms the state-of-the-art AV-ASD methods and achieves 95.6% mAP on the AVA-ActiveSpeaker dataset, 98.3% AP on the ASW dataset, and 97.9% F1 on the Columbia AV-ASD dataset. We will publicly release the code in due course.
Submitted 31 March, 2024;
originally announced April 2024.
-
Integrated Communications and Localization for Massive MIMO LEO Satellite Systems
Authors:
Li You,
Xiaoyu Qiang,
Yongxiang Zhu,
Fan Jiang,
Christos G. Tsinos,
Wen** Wang,
Henk Wymeersch,
Xiqi Gao,
Björn Ottersten
Abstract:
Integrated communications and localization (ICAL) will play an important part in future sixth generation (6G) networks for the realization of Internet of Everything (IoE) to support both global communications and seamless localization. Massive multiple-input multiple-output (MIMO) low earth orbit (LEO) satellite systems have great potential in providing wide coverage with enhanced gains, and thus are strong candidates for realizing ubiquitous ICAL. In this paper, we develop a wideband massive MIMO LEO satellite system to simultaneously support wireless communications and localization operations in the downlink. In particular, we first characterize the signal propagation properties and derive a localization performance bound. Based on these analyses, we focus on the hybrid analog/digital precoding design to achieve high communication capability and localization precision. Numerical results demonstrate that the proposed ICAL scheme supports both the wireless communication and localization operations for typical system setups.
Submitted 12 March, 2024;
originally announced March 2024.
-
Deep Network for Image Compressed Sensing Coding Using Local Structural Sampling
Authors:
Wenxue Cui,
Xingtao Wang,
Xiaopeng Fan,
Shaohui Liu,
Xinwei Gao,
Debin Zhao
Abstract:
Existing image compressed sensing (CS) coding frameworks usually solve an inverse problem based on measurement coding and optimization-based image reconstruction, and still face the following two challenges: 1) The widely used random sampling matrix, such as the Gaussian Random Matrix (GRM), usually leads to low measurement coding efficiency. 2) The optimization-based reconstruction methods generally have a much higher computational complexity. In this paper, we propose a new CNN-based image CS coding framework using local structural sampling (dubbed CSCNet) that includes three functional modules: local structural sampling, measurement coding and Laplacian pyramid reconstruction. In the proposed framework, instead of the GRM, a new local structural sampling matrix is first developed, which is able to enhance the correlation between the measurements through a local perceptual sampling strategy. Besides, the designed local structural sampling matrix can be jointly optimized with the other functional modules during the training process. After sampling, measurements with high correlations are produced, which are then coded into final bitstreams by a third-party image codec. Finally, a Laplacian pyramid reconstruction network is proposed to efficiently recover the target image from the measurement domain to the image domain. Extensive experimental results demonstrate that the proposed scheme outperforms the existing state-of-the-art CS coding methods, while maintaining fast computational speed.
Submitted 29 February, 2024;
originally announced February 2024.
-
Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization
Authors:
Vispi Karkaria,
Anthony Goeckner,
Ru**g Zha,
Jie Chen,
Jian**g Zhang,
Qi Zhu,
Jian Cao,
Robert X. Gao,
Wei Chen
Abstract:
Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for heat management are common in DED research, few integrate real-time monitoring, physics-based modeling, and control in a unified framework. Our work presents a digital twin (DT) framework for real-time predictive control of DED process parameters to meet specific design objectives. We develop a surrogate model using Long Short-Term Memory (LSTM)-based machine learning with Bayesian Inference to predict temperatures in DED parts. This model predicts future temperature states in real time. We also introduce Bayesian Optimization (BO) for Time Series Process Optimization (BOTSPO), based on traditional BO but featuring a unique time series process profile generator with reduced dimensions. BOTSPO dynamically optimizes processes, identifying optimal laser power profiles to attain desired mechanical properties. The established process trajectory guides online optimizations, aiming to enhance performance. This paper outlines the digital twin framework's components, promoting its integration into a comprehensive system for AM.
Submitted 27 February, 2024;
originally announced February 2024.
-
MMW-Carry: Enhancing Carry Object Detection through Millimeter-Wave Radar-Camera Fusion
Authors:
Xiangyu Gao,
Youchen Luo,
Ali Alansari,
Ya** Sun
Abstract:
This paper introduces MMW-Carry, a system designed to predict the probability of individuals carrying various objects using millimeter-wave radar signals, complemented by camera input. The primary goal of MMW-Carry is to provide a rapid and cost-effective preliminary screening solution, specifically tailored for non-super-sensitive scenarios. Overall, MMW-Carry achieves significant advancements in two crucial aspects. Firstly, it addresses localization challenges in complex indoor environments caused by multi-path reflections, enhancing the system's overall robustness. This is accomplished by the integration of camera-based human detection, tracking, and the radar-camera plane transformation for obtaining subjects' spatial occupancy region, followed by a zooming-in operation on the radar images. Secondly, the system performance is elevated by leveraging long-term observation of a subject. This is realized through the intelligent fusion of neural network results from multiple different-view radar images of an in-track moving subject and their carried objects, facilitated by a proposed knowledge-transfer module. Our experimental results demonstrate that MMW-Carry detects objects with an average false positive rate of 25.22% and a missing rate of 21.71% for individuals moving randomly in a large indoor space and carrying common everyday objects, either openly or concealed. These findings affirm MMW-Carry's potential to extend its capabilities to detect a broader range of objects for diverse applications.
Submitted 24 February, 2024;
originally announced February 2024.
-
Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks
Authors:
Duo Ma,
Xianghu Yue,
Junyi Ao,
Xiaoxue Gao,
Haizhou Li
Abstract:
Human language can be expressed in either written or spoken form, i.e. text or speech. Humans can acquire knowledge from text to improve speaking and listening. However, the quest for speech pre-trained models to leverage unpaired text has just started. In this paper, we investigate a new way to pre-train such a joint speech-text model to learn enhanced speech representations and benefit various speech-related downstream tasks. Specifically, we propose a novel pre-training method, text-guided HuBERT, or T-HuBERT, which performs self-supervised learning over speech to derive phoneme-like discrete representations. These phoneme-like pseudo-label sequences are first derived from speech via generative adversarial networks (GANs) so that they are statistically similar to those from additional unpaired textual data. In this way, we build a bridge between unpaired speech and text in an unsupervised manner. Extensive experiments demonstrate the significant superiority of our proposed method over various strong baselines, with up to 15.3% relative Word Error Rate (WER) reduction on the LibriSpeech dataset.
Submitted 28 February, 2024; v1 submitted 24 February, 2024;
originally announced February 2024.
-
Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment
Authors:
Zhaoyang Wang,
Bo Hu,
Mingyang Zhang,
Jie Li,
Leida Li,
Maoguo Gong,
Xinbo Gao
Abstract:
Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still struggle to balance learning feature information at the pixel level of the image against capturing high-level feature information, and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative models, the diffusion model exhibits the capability to model intricate relationships, enabling a comprehensive understanding of images and better learning of both high-level and low-level visual features. In view of this, we pioneer the exploration of the diffusion model in the domain of NR-IQA. Firstly, we devise a new diffusion restoration network that leverages the produced enhanced image and noise-containing images, incorporating nonlinear features obtained during the denoising process of the diffusion model, as high-level visual information. Secondly, two visual evaluation branches are designed to comprehensively analyze the obtained high-level feature information. These include the visual compensation guidance branch, grounded in the transformer architecture and noise embedding strategy, and the visual difference analysis branch, built on the ResNet architecture and the residual transposed attention block. Extensive experiments are conducted on seven public NR-IQA datasets, and the results demonstrate that the proposed model outperforms SOTA methods for NR-IQA.
Submitted 22 February, 2024;
originally announced February 2024.
-
Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA
Authors:
Shaojie Tang,
Penpen Miao,
Xingyu Gao,
Yu Zhong,
Dantong Zhu,
Haixing Wen,
Zhihui Xu,
Qiuyue Wei,
Hong** Yao,
Xin Huang,
Rui Gao,
Chen Zhao,
Weihua Zhou
Abstract:
A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point clouds of the LV epicardial contours (LVECs). Secondly, according to the characteristics of cardiac anatomy, the special points of anterior and posterior interventricular grooves (APIGs) were manually marked in both SPECT and CTA image volumes. Thirdly, we developed an in-house program for coarsely registering the special points of APIGs to ensure a correct cardiac orientation alignment between SPECT and CTA images. Fourthly, we employed the ICP, SICP, or CPD algorithm to achieve a fine registration for the point clouds (together with the special points of APIGs) of the LV epicardial surfaces (LVERs) in SPECT and CTA images. Finally, the image fusion between SPECT and CTA was realized after the fine registration. The experimental results showed that the cardiac orientation was aligned well and the mean distance error of the optimal registration method (CPD with affine transform) was consistently less than 3 mm. The proposed method could effectively fuse the structures from cardiac CTA and SPECT functional images, and demonstrated potential for assisting in the accurate diagnosis of cardiac diseases by combining complementary advantages of the two imaging modalities.
Submitted 9 February, 2024;
originally announced February 2024.
-
DanceMeld: Unraveling Dance Phrases with Hierarchical Latent Codes for Music-to-Dance Synthesis
Authors:
Xin Gao,
Li Hu,
Peng Zhang,
Bang Zhang,
Liefeng Bo
Abstract:
In the realm of 3D digital human applications, music-to-dance presents a challenging task. Given the one-to-many relationship between music and dance, previous methods have been limited in their approach, relying solely on matching and generating corresponding dance movements based on music rhythm. In the professional field of choreography, a dance phrase consists of several dance poses and dance movements. Dance poses are composed of a series of basic meaningful body postures, while dance movements can reflect dynamic changes such as the rhythm, melody, and style of dance. Taking inspiration from these concepts, we introduce an innovative dance generation pipeline called DanceMeld, which comprises two stages, i.e., the dance decouple stage and the dance generation stage. In the decouple stage, a hierarchical VQ-VAE is used to disentangle dance poses and dance movements in different feature space levels, where the bottom code represents dance poses, and the top code represents dance movements. In the generation stage, we utilize a diffusion model as a prior to model the distribution and generate latent codes conditioned on music features. We have experimentally demonstrated the representational capabilities of the top code and bottom code, enabling the explicit decoupled expression of dance poses and dance movements. This disentanglement not only provides control over motion details, styles, and rhythm but also facilitates applications such as dance style transfer and dance unit editing. Our approach has undergone qualitative and quantitative experiments on the AIST++ dataset, demonstrating its superiority over other methods.
Submitted 30 November, 2023;
originally announced January 2024.
-
Electromagnetic Information Theory: Fundamentals and Applications for 6G Wireless Communication Systems
Authors:
Cheng-Xiang Wang,
Yue Yang,
Jie Huang,
Xiqi Gao,
Tie Jun Cui,
Lajos Hanzo
Abstract:
In wireless communications, electromagnetic theory and information theory constitute a pair of fundamental theories, bridged by antenna theory and wireless propagation channel modeling theory. Up to the fifth generation (5G) wireless communication networks, these four theories have been developing relatively independently. However, in sixth generation (6G) space-air-ground-sea wireless communication networks, seamless coverage is expected in the three-dimensional (3D) space, potentially necessitating the acquisition of channel state information (CSI) and channel capacity calculation anywhere and at any time. Additionally, key 6G technologies such as ultra-massive multiple-input multiple-output (MIMO) and holographic MIMO involve intricate interaction between the antennas and wireless propagation environments, which necessitates the joint modeling of antennas and wireless propagation channels. To address the challenges in 6G, the integration of the above four theories becomes inevitable, leading to the concept of the so-called electromagnetic information theory (EIT). In this article, a suite of 6G key technologies is highlighted. Then, the concepts and relationships of the four theories are unveiled. Finally, the necessity and benefits of integrating them into the EIT are revealed.
Submitted 16 January, 2024;
originally announced January 2024.
-
Fluid Antenna-Assisted MIMO Transmission Exploiting Statistical CSI
Authors:
Yuqi Ye,
Li You,
Jue Wang,
Hao Xu,
Kai-Kit Wong,
Xiqi Gao
Abstract:
In conventional multiple-input multiple-output (MIMO) communication systems, the positions of antennas are fixed. To take full advantage of spatial degrees of freedom, a new technology called fluid antenna (FA) has been proposed to obtain a higher achievable rate and diversity gain. Most existing works on FA exploit instantaneous channel state information (CSI). However, in FA-assisted systems, it is difficult to obtain instantaneous CSI since changes in the antenna position lead to channel variation. In this letter, we investigate an FA-assisted MIMO system using relatively slow-varying statistical CSI. Specifically, under the rate-maximization criterion, we propose an algorithmic framework for designing the transmit precoding and the transmit/receive FA positions with statistical CSI. Simulation results show that our proposed algorithm in FA-assisted systems significantly outperforms baselines in terms of rate performance.
Submitted 12 December, 2023;
originally announced December 2023.
-
Visual tracking brain computer interface
Authors:
Changxing Huang,
Nanlin Shi,
Yining Miao,
Xiaogang Chen,
Yijun Wang,
Xiaorong Gao
Abstract:
Brain-computer interfaces (BCIs) offer a way to interact with computers without relying on physical movements. Non-invasive electroencephalography (EEG)-based visual BCIs, known for efficient speed and calibration ease, face limitations in continuous tasks due to discrete stimulus design and decoding methods. To achieve continuous control, we implemented a novel spatial encoding stimulus paradigm and devised a corresponding projection method to enable continuous modulation of decoded velocity. Subsequently, we conducted experiments involving 17 participants and achieved a Fitts' ITR of 0.55 bps for the fixed tracking task and 0.37 bps for the random tracking task. The proposed BCI with a high Fitts' ITR was then integrated into two applications, painting and gaming. In conclusion, this study proposed a visual BCI-based control method that goes beyond discrete commands, allowing natural continuous control based on neural activity.
Submitted 21 November, 2023;
originally announced November 2023.
-
High-performance cVEP-BCI under minimal calibration
Authors:
Yining Miao,
Nanlin Shi,
Changxing Huang,
Yonghao Song,
Xiaogang Chen,
Yijun Wang,
Xiaorong Gao
Abstract:
The ultimate goal of brain-computer interfaces (BCIs) based on visual modulation paradigms is to achieve high-speed performance without the burden of extensive calibration. Code-modulated visual evoked potential-based BCIs (cVEP-BCIs) modulated by broadband white noise (WN) offer various advantages, including increased communication speed, expanded encoding target capabilities, and enhanced coding flexibility. However, the complexity of the spatial-temporal patterns under broadband stimuli necessitates extensive calibration for effective target identification in cVEP-BCIs. Consequently, the information transfer rate (ITR) of cVEP-BCI under limited calibration usually stays around 100 bits per minute (bpm), significantly lagging behind state-of-the-art steady-state visual evoked potential-based BCIs (SSVEP-BCIs), which achieve rates above 200 bpm. To enhance the performance of cVEP-BCIs with minimal calibration, we devised an efficient calibration stage involving a brief single-target flickering, lasting less than a minute, to extract generalizable spatial-temporal patterns. Leveraging the calibration data, we developed two complementary methods to construct cVEP temporal patterns: the linear modeling method based on the stimulus sequence and the transfer learning techniques using cross-subject data. As a result, we achieved the highest ITR of 250 bpm under a minute of calibration, which has been shown to be comparable to the state-of-the-art SSVEP paradigms. In summary, our work significantly improved the cVEP performance under few-shot learning, which is expected to expand the practicality and usability of cVEP-BCIs.
Submitted 20 November, 2023;
originally announced November 2023.
-
Few-Shot Recognition and Classification Framework for Jamming Signal: A CGAN-Based Fusion CNN Approach
Authors:
Xuhui Ding,
Yue Zhang,
Gaoyang Li,
Xiaozheng Gao,
Neng Ye,
Dusit Niyato,
Kai Yang
Abstract:
Subject to intricate environmental variables, the precise classification of jamming signals holds paramount significance in the effective implementation of anti-jamming strategies within communication systems. In light of this imperative, we propose an innovative fusion algorithm based on a conditional generative adversarial network (CGAN) and a convolutional neural network (CNN), which aims to deal with the difficulty in applying deep learning (DL) algorithms due to the instantaneous nature of jamming signals in practical communication systems. Compared with previous methods, our algorithm uses the CGAN model to embed jamming category labels that constrain the range of generated signals in the frequency domain. The model captures potential label information while learning the distribution of the signal data, and thus achieves an 8% improvement in accuracy even when working with a few-sample dataset. Real-world satellite communication scenarios are simulated on a hardware platform, and we validate our algorithm using the resulting time-domain waveform data. The experimental results indicate that our algorithm still performs extremely well, which demonstrates significant potential for practical application in real-world communication scenarios.
Submitted 26 June, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
Cooperative Dispatch of Microgrids Community Using Risk-Sensitive Reinforcement Learning with Monotonously Improved Performance
Authors:
Ziqing Zhu,
Xiang Gao,
Siqi Bu,
Ka Wing Chan,
Bin Zhou,
Shiwei Xia
Abstract:
The integration of individual microgrids (MGs) into Microgrid Clusters (MGCs) significantly improves the reliability and flexibility of energy supply through resource sharing and backup during outages. The dispatch of MGCs is the key challenge to be tackled to ensure their secure and economic operation. Currently, there is a lack of optimization methods that can achieve a trade-off among the top-priority requirements of MGC dispatch, including fast computation speed, optimality, multiple objectives, and risk mitigation against uncertainty. In this paper, a novel Multi-Objective, Risk-Sensitive, and Online Trust Region Policy Optimization (RS-TRPO) algorithm is proposed to tackle this problem. First, a dispatch paradigm for autonomous MGs in the MGC is proposed, enabling them to sequentially implement their self-dispatch to mitigate potential conflicts. This dispatch paradigm is then formulated as a Markov Game model, which is finally solved by the RS-TRPO algorithm. This online algorithm enables MGs to spontaneously search for the Pareto Frontier considering multiple objectives and risk mitigation. The outstanding computational performance of this algorithm is demonstrated in comparison with mathematical programming methods and heuristic algorithms in a modified IEEE 30-Bus Test System integrated with four autonomous MGs.
Submitted 17 October, 2023;
originally announced October 2023.
-
A Comprehensive Indoor Environment Dataset from Single-family Houses in the US
Authors:
Sheik Murad Hassan Anik,
Xinghua Gao,
Na Meng
Abstract:
The paper describes a dataset comprising indoor environmental factors such as temperature, humidity, air quality, and noise levels. The data was collected from 10 sensing devices installed in various locations within three single-family houses in Virginia, USA. The objective of the data collection was to study the indoor environmental conditions of the houses over time. The data were collected at a frequency of one record per minute for a year, combining over 2.5 million records. The paper provides actual floor plans with sensor placements to aid researchers and practitioners in creating reliable building performance models. The techniques used to collect and verify the data are also explained in the paper. The resulting dataset can be employed to enhance models for building energy consumption, occupant behavior, predictive maintenance, and other relevant purposes.
Submitted 4 October, 2023;
originally announced October 2023.
-
Indoor Exploration and Simultaneous Trolley Collection Through Task-Oriented Environment Partitioning
Authors:
Junjie Gao,
Peijia Xie,
Xuheng Gao,
Zhirui Sun,
Jiankun Wang,
Max Q. -H. Meng
Abstract:
In this paper, we present a simultaneous exploration and object search framework for the application of autonomous trolley collection. For environment representation, a task-oriented environment partitioning algorithm is presented to extract diverse information for each sub-task. First, LiDAR data is classified as potential objects, walls, and obstacles after outlier removal. Segmented point clouds are then transformed into a hybrid map with the following functional components: object proposals to avoid missing trolleys during exploration; room layouts for semantic space segmentation; and polygonal obstacles containing geometry information for efficient motion planning. For exploration and simultaneous trolley collection, we propose an efficient exploration-based object search method. First, a traveling salesman problem with precedence constraints (TSP-PC) is formulated by grouping frontiers and object proposals. The next target is selected by prioritizing object search while avoiding excessive robot backtracking. Then, feasible trajectories with adequate obstacle clearance are generated by topological graph search. We validate the proposed framework through simulations and demonstrate the system with real-world autonomous trolley collection tasks.
Submitted 20 September, 2023;
originally announced September 2023.
-
A Tutorial on Environment-Aware Communications via Channel Knowledge Map for 6G
Authors:
Yong Zeng,
Junting Chen,
Jie Xu,
Di Wu,
Xiaoli Xu,
Shi Jin,
Xiqi Gao,
David Gesbert,
Shuguang Cui,
Rui Zhang
Abstract:
Sixth-generation (6G) mobile communication networks are expected to have dense infrastructures, large antenna size, wide bandwidth, cost-effective hardware, diversified positioning methods, and enhanced intelligence. Such trends bring both new challenges and opportunities for the practical design of 6G. On one hand, acquiring channel state information (CSI) in real time for all wireless links becomes quite challenging in 6G. On the other hand, there would be numerous data sources in 6G containing high-quality location-tagged channel data, e.g., the estimated channels or beams between base station (BS) and user equipment (UE), making it possible to better learn the local wireless environment. By exploiting this new opportunity and for tackling the CSI acquisition challenge, there is a promising paradigm shift from the conventional environment-unaware communications to the new environment-aware communications based on the novel approach of channel knowledge map (CKM). This article aims to provide a comprehensive overview on environment-aware communications enabled by CKM to fully harness its benefits for 6G. First, the basic concept of CKM is presented, followed by the comparison of CKM with various existing channel inference techniques. Next, the main techniques for CKM construction are discussed, including both environment model-free and environment model-assisted approaches. Furthermore, a general framework is presented for the utilization of CKM to achieve environment-aware communications, followed by some typical CKM-aided communication scenarios. Finally, important open problems in CKM research are highlighted and potential solutions are discussed to inspire future work.
Submitted 6 February, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Decoding Natural Images from EEG for Object Recognition
Authors:
Yonghao Song,
Bingchuan Liu,
Xiang Li,
Nanlin Shi,
Yijun Wang,
Xiaorong Gao
Abstract:
Electroencephalography (EEG) signals, known for convenient non-invasive acquisition but low signal-to-noise ratio, have recently gained substantial attention due to the potential to decode natural images. This paper presents a self-supervised framework to demonstrate the feasibility of learning image representations from EEG signals, particularly for object recognition. The framework utilizes image and EEG encoders to extract features from paired image stimuli and EEG responses. Contrastive learning aligns these two modalities by constraining their similarity. With the framework, we attain significantly above-chance results on a comprehensive EEG-image dataset, achieving a top-1 accuracy of 15.6% and a top-5 accuracy of 42.8% in challenging 200-way zero-shot tasks. Moreover, we perform extensive experiments to explore the biological plausibility by resolving the temporal, spatial, spectral, and semantic aspects of EEG signals. Besides, we introduce attention modules to capture spatial correlations, providing implicit evidence of the brain activity perceived from EEG data. These findings yield valuable insights for neural decoding and brain-computer interfaces in real-world scenarios. The code will be released on https://github.com/eeyhsong/NICE-EEG.
Submitted 4 April, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Estimating and approaching maximum information rate of noninvasive visual brain-computer interface
Authors:
Nanlin Shi,
Yining Miao,
Changxing Huang,
Xiang Li,
Yonghao Song,
Xiaogang Chen,
Yijun Wang,
Xiaorong Gao
Abstract:
The mission of visual brain-computer interfaces (BCIs) is to enhance information transfer rate (ITR) to reach high speed towards real-life communication. Despite notable progress, noninvasive visual BCIs have encountered a plateau in ITRs, leaving it uncertain whether higher ITRs are achievable. In this study, we investigate the information rate limits of the primary visual channel to explore whether we can and how we should build visual BCI with higher information rate. Using information theory, we estimate a maximum achievable ITR of approximately 63 bits per second (bps) with a uniformly-distributed White Noise (WN) stimulus. Based on this discovery, we propose a broadband WN BCI approach that expands the utilization of stimulus bandwidth, in contrast to the current state-of-the-art visual BCI methods based on steady-state visual evoked potentials (SSVEPs). Through experimental validation, our broadband BCI outperforms the SSVEP BCI by an impressive margin of 7 bps, setting a new record of 50 bps. This achievement demonstrates the possibility of decoding 40 classes of noninvasive neural responses within a short duration of only 0.1 seconds. The information-theoretical framework introduced in this study provides valuable insights applicable to all sensory-evoked BCIs, making a significant step towards the development of next-generation human-machine interaction systems.
Submitted 25 August, 2023;
originally announced August 2023.
-
Multi-Objective Optimisation of URLLC-Based Metaverse Services
Authors:
Xinyu Gao,
Wenqiang Yi,
Yuanwei Liu,
Lajos Hanzo
Abstract:
Metaverse aims for building a fully immersive virtual shared space, where the users are able to engage in various activities. To successfully deploy the service for each user, the Metaverse service provider and network service provider generally localise the user first and then support the communication between the base station (BS) and the user. A reconfigurable intelligent surface (RIS) is capable of creating a reflected link between the BS and the user to enhance line-of-sight. Furthermore, the new key performance indicators (KPIs) in Metaverse, such as its energy-consumption-dependent total service cost and transmission latency, are often overlooked in ultra-reliable low latency communication (URLLC) designs, which have to be carefully considered in next-generation URLLC (xURLLC) regimes. In this paper, our design objective is to jointly optimise the transmit power, the RIS phase shifts, and the decoding error probability to simultaneously minimise the total service cost and transmission latency and approach the Pareto Front (PF). We conceive a twin-stage central controller, which aims for localising the users first and then supports the communication between the BS and users. In the first stage, we localise the Metaverse users, where the stochastic gradient descent (SGD) algorithm is invoked for accurate user localisation. In the second stage, a meta-learning-based position-dependent multi-objective soft actor and critic (MO-SAC) algorithm is proposed to approach the PF between the total service cost and transmission latency and to further optimise the latency-dependent reliability. Our numerical results demonstrate that ...
Submitted 25 July, 2023;
originally announced July 2023.
-
Sensing User's Activity, Channel, and Location with Near-Field Extra-Large-Scale MIMO
Authors:
Li Qiao,
Anwen Liao,
Zhuoran Li,
Hua Wang,
Zhen Gao,
Xiang Gao,
Yu Su,
Pei Xiao,
Li You,
Derrick Wing Kwan Ng
Abstract:
This paper proposes a grant-free massive access scheme based on the millimeter wave (mmWave) extra-large-scale multiple-input multiple-output (XL-MIMO) to support massive Internet-of-Things (IoT) devices with low latency, high data rate, and high localization accuracy in the upcoming sixth-generation (6G) networks. The XL-MIMO consists of multiple antenna subarrays that are widely spaced over the service area to ensure line-of-sight (LoS) transmissions. First, we establish the XL-MIMO-based massive access model considering the near-field spatial non-stationary (SNS) property. Then, by exploiting the block sparsity of subarrays and the SNS property, we propose a structured block orthogonal matching pursuit algorithm for efficient active user detection (AUD) and channel estimation (CE). Furthermore, different sensing matrices are applied in different pilot subcarriers for exploiting the diversity gains. Additionally, a multi-subarray collaborative localization algorithm is designed for localization. In particular, the angle of arrival (AoA) and time difference of arrival (TDoA) of the LoS links between active users and related subarrays are extracted from the estimated XL-MIMO channels, and then the coordinates of active users are acquired by jointly utilizing the AoAs and TDoAs. Simulation results show that the proposed algorithms outperform existing algorithms in terms of AUD and CE performance and can achieve centimeter-level localization accuracy.
Submitted 16 October, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
A Quick Guide for the Iterated Extended Kalman Filter on Manifolds
Authors:
Jianzhu Huai,
Xiang Gao
Abstract:
The extended Kalman filter (EKF) is a common state estimation method for discrete nonlinear systems. It recursively executes the propagation step as time goes by and the update step when a set of measurements arrives. In the update step, the EKF linearizes the measurement function only once. In contrast, the iterated EKF (IEKF) refines the state in the update step by iteratively solving a least squares problem. The IEKF has been extended to work with state variables on manifolds which have differentiable $\boxplus$ and $\boxminus$ operators, including Lie groups. However, existing descriptions are often long, deep, and even with errors. This note provides a quick reference for the IEKF on manifolds, using freshman-level matrix calculus. Besides the bare-bone equations, we highlight the key steps in deriving them.
Submitted 4 October, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Static Background Removal in Vehicular Radar: Filtering in Azimuth-Elevation-Doppler Domain
Authors:
Xiangyu Gao,
Sumit Roy,
Lyutianyang Zhang
Abstract:
Anti-collision assistance (as part of the current push towards increasing vehicular autonomy) critically depends on accurate detection/localization of moving targets in the vicinity. An effective solution pathway involves removing background or static objects from the scene, so as to enhance the detection/localization of moving targets as a key component for improving overall system performance. In this paper, we present an efficient algorithm for background removal for automotive scenarios, applicable to commodity frequency-modulated continuous wave (FMCW)-based radars. Our proposed algorithm follows a three-step approach: a) preprocessing of the back-scattered received radar signal for 4-dimensional (4D) point cloud generation, b) 3-dimensional (3D) radar ego-motion estimation, and c) notch filter-based background removal in the azimuth-elevation-Doppler domain. To begin, we model the received signal corresponding to multiple-input multiple-output (MIMO) FMCW transmissions and develop a signal processing framework for extracting 4D point clouds. Subsequently, we introduce a robust 3D ego-motion estimation algorithm that accurately estimates source radar velocity, accounting for measurement errors and Doppler ambiguity, by processing the point clouds. Additionally, our algorithm leverages the relationship between Doppler velocity, azimuth angle, elevation angle, and radar ego-motion velocity to identify the background clutter spectrum and employs notch filters for its removal. The performance of our algorithm is evaluated using both simulated data and experiments with real-world data. By offering a fast and computationally efficient solution, our approach contributes a potential pathway for addressing the challenges posed by non-homogeneous environments and real-time processing requirements.
Submitted 29 July, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Multi-View Attention Learning for Residual Disease Prediction of Ovarian Cancer
Authors:
Xiangneng Gao,
Shulan Ruan,
Jun Shi,
Guoqing Hu,
Wei Wei
Abstract:
In the treatment of ovarian cancer, precise residual disease prediction is significant for clinical and surgical decision-making. However, traditional methods are either invasive (e.g., laparoscopy) or time-consuming (e.g., manual analysis). Recently, deep learning methods have been widely applied to the automatic analysis of medical images. Despite the remarkable progress, most of them underestimate the importance of the 3D image information of the disease, which might limit performance in residual disease prediction, especially on small-scale datasets. To this end, in this paper, we propose a novel Multi-View Attention Learning (MuVAL) method for residual disease prediction, which focuses on the comprehensive learning of 3D Computed Tomography (CT) images in a multi-view manner. Specifically, we first obtain multiple views of the 3D CT images from the transverse, coronal, and sagittal planes. To better represent the image features in a multi-view manner, we further leverage an attention mechanism to help find the more relevant slices in each view. Extensive experiments on a dataset of 111 patients show that our method outperforms existing deep-learning methods.
Submitted 26 June, 2023;
originally announced June 2023.
-
DeSAM: Decoupling Segment Anything Model for Generalizable Medical Image Segmentation
Authors:
Yifan Gao,
Wei Xia,
Dingdu Hu,
Xin Gao
Abstract:
Deep learning based automatic medical image segmentation models often suffer from domain shift, where the models trained on a source domain do not generalize well to other unseen domains. As a vision foundation model with powerful generalization capabilities, Segment Anything Model (SAM) shows potential for improving the cross-domain robustness of medical image segmentation. However, SAM and its fine-tuned models performed significantly worse in fully automatic mode compared to when given manual prompts. Upon further investigation, we discovered that the degradation in performance was related to the coupling effect of poor prompts and mask segmentation. In fully automatic mode, the presence of inevitable poor prompts (such as points outside the mask or boxes significantly larger than the mask) can significantly mislead mask generation. To address the coupling effect, we propose the decoupling SAM (DeSAM). DeSAM modifies SAM's mask decoder to decouple mask generation and prompt embeddings while leveraging pre-trained weights. We conducted experiments on publicly available prostate cross-site datasets. The results show that DeSAM improves the Dice score by an average of 8.96% (from 70.06% to 79.02%) compared to the previous state-of-the-art domain generalization method. Moreover, DeSAM can be trained on personal devices with an entry-level GPU since our approach does not rely on tuning the heavyweight image encoder. The code is publicly available at https://github.com/yifangao112/DeSAM.
Submitted 1 June, 2023;
originally announced June 2023.
-
SAR-to-Optical Image Translation via Thermodynamics-inspired Network
Authors:
Mingjin Zhang,
Jiamin Xu,
Chengyu He,
Wenteng Shang,
Yunsong Li,
Xinbo Gao
Abstract:
Synthetic aperture radar (SAR) is prevalent in the remote sensing field but is difficult to interpret in human visual perception. Recently, SAR-to-optical (S2O) image conversion methods have provided a prospective solution for interpretation. However, since there is a huge domain difference between optical and SAR images, they suffer from low image quality and geometric distortion in the produced optical images. Motivated by the analogy between pixels during the S2O image translation and molecules in a heat field, Thermodynamics-inspired Network for SAR-to-Optical Image Translation (S2O-TDN) is proposed in this paper. Specifically, we design a Third-order Finite Difference (TFD) residual structure in light of the TFD equation of thermodynamics, which allows us to efficiently extract inter-domain invariant features and facilitate the learning of the nonlinear translation mapping. In addition, we exploit the first law of thermodynamics (FLT) to devise an FLT-guided branch that promotes the state transition of the feature values from the unstable diffusion state to the stable one, aiming to regularize the feature diffusion and preserve image structures during S2O image translation. S2O-TDN follows an explicit design principle derived from thermodynamic theory and enjoys the advantage of explainability. Experiments on the public SEN1-2 dataset show the advantages of the proposed S2O-TDN over the current methods with more delicate textures and higher quantitative results.
Submitted 23 May, 2023;
originally announced May 2023.
-
Informative Data Selection with Uncertainty for Multi-modal Object Detection
Authors:
Xinyu Zhang,
Zhiwei Li,
Zhenhong Zou,
Xin Gao,
Yijin Xiong,
Dafeng Jin,
Jun Li,
Huaping Liu
Abstract:
Noise has always been a non-negligible problem in object detection: it creates confusion in model reasoning and thereby reduces the informativeness of the data. It can lead to inaccurate recognition due to the shift in the observed pattern, which requires robust generalization of the models. To implement a general vision model, we need to develop deep learning models that can adaptively select valid information from multi-modal data. This is mainly based on two reasons: multi-modal learning can break through the inherent defects of single-modal data, and adaptive information selection can reduce chaos in multi-modal data. To tackle this problem, we propose a universal uncertainty-aware multi-modal fusion model. It adopts a multi-pipeline loosely coupled architecture to combine the features and results from point clouds and images. To quantify the correlation in multi-modal information, we model the uncertainty, as the inverse of data information, in different modalities and embed it in the bounding box generation. In this way, our model reduces the randomness in fusion and generates reliable output. Moreover, we conducted a complete investigation on the KITTI 2D object detection dataset and its derived dirty data. Our fusion model is shown to resist severe noise interference such as Gaussian noise, motion blur, and frost, with only slight degradation. The experimental results demonstrate the benefits of our adaptive fusion. Our analysis of the robustness of multi-modal fusion will provide further insights for future research.
△ Less
Submitted 23 April, 2023;
originally announced April 2023.
-
SAWU-Net: Spatial Attention Weighted Unmixing Network for Hyperspectral Images
Authors:
Lin Qi,
Xuewen Qin,
Feng Gao,
Junyu Dong,
Xinbo Gao
Abstract:
Hyperspectral unmixing is a critical yet challenging task in hyperspectral image interpretation. Recently, great efforts have been made to solve the hyperspectral unmixing task via deep autoencoders. However, existing networks mainly focus on extracting spectral features from mixed pixels, and the employment of spatial feature prior knowledge is still insufficient. To this end, we put forward a spatial attention weighted unmixing network, dubbed as SAWU-Net, which learns a spatial attention network and a weighted unmixing network in an end-to-end manner for better spatial feature exploitation. In particular, we design a spatial attention module, which consists of a pixel attention block and a window attention block to efficiently model pixel-based spectral information and patch-based spatial information, respectively. While in the weighted unmixing framework, the central pixel abundance is dynamically weighted by the coarse-grained abundances of surrounding pixels. In addition, SAWU-Net generates dynamically adaptive spatial weights through the spatial attention mechanism, so as to dynamically integrate surrounding pixels more effectively. Experimental results on real and synthetic datasets demonstrate the better accuracy and superiority of SAWU-Net, which reflects the effectiveness of the proposed spatial attention mechanism.
Submitted 22 April, 2023;
originally announced April 2023.
-
SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model
Authors:
Juexiao Zhou,
Xiaonan He,
Liyuan Sun,
Jiannan Xu,
Xiuying Chen,
Yuetan Chu,
Longxi Zhou,
Xingyu Liao,
Bin Zhang,
Xin Gao
Abstract:
Skin and subcutaneous diseases rank high among the leading contributors to the global burden of nonfatal diseases, impacting a considerable portion of the population. Nonetheless, the field of dermatology diagnosis faces three significant hurdles. Firstly, there is a shortage of dermatologists accessible to diagnose patients, particularly in rural regions. Secondly, accurately interpreting skin disease images poses a considerable challenge. Lastly, generating patient-friendly diagnostic reports is usually a time-consuming and labor-intensive task for dermatologists. To tackle these challenges, we present SkinGPT-4, which is the world's first interactive dermatology diagnostic system powered by an advanced visual large language model. SkinGPT-4 leverages a fine-tuned version of MiniGPT-4, trained on an extensive collection of skin disease images (comprising 52,929 publicly available and proprietary images) along with clinical concepts and doctors' notes. We designed a two-step training process to allow SkinGPT-4 to express medical features in skin disease images with natural language and make accurate diagnoses of the types of skin diseases. With SkinGPT-4, users can upload their own skin photos for diagnosis, and the system can autonomously evaluate the images, identify the characteristics and categories of the skin conditions, perform in-depth analysis, and provide interactive treatment recommendations. Meanwhile, SkinGPT-4's local deployment capability and commitment to user privacy also render it an appealing choice for patients in search of a dependable and precise diagnosis of their skin ailments. To demonstrate the robustness of SkinGPT-4, we conducted quantitative evaluations on 150 real-life cases, which were independently reviewed by certified dermatologists, and showed that SkinGPT-4 could provide accurate diagnoses of skin diseases.
Submitted 8 June, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge
Authors:
Gongning Luo,
Kuanquan Wang,
Jun Liu,
Shuo Li,
Xinjie Liang,
Xiangyu Li,
Shaowei Gan,
Wei Wang,
Suyu Dong,
Wenyi Wang,
Pengxin Yu,
Enyou Liu,
Hongrong Wei,
Na Wang,
Jia Guo,
Huiqi Li,
Zhao Zhang,
Ziwei Zhao,
Na Gao,
Nan An,
Ashkan Pakzad,
Bojidar Rangelov,
Jiaqi Dou,
Song Tian,
Zeyu Liu
, et al. (5 additional authors not shown)
Abstract:
Efficient automatic segmentation of multi-level (i.e. main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Besides, there is no public large-scale dataset focused on PA segmentation, which makes it highly challenging to compare the different methods. To benchmark multi-level PA segmentation algorithms, we organized the first Pulmonary ARtery SEgmentation (PARSE) challenge. On the one hand, we focus on both the main PA and the branch PA segmentation. On the other hand, for better clinical application, we assign the same score weight to segmentation efficiency (mainly running time and GPU memory consumption during inference) while ensuring PA segmentation accuracy. We present a summary of the top algorithms and offer some suggestions for efficient and accurate multi-level PA automatic segmentation. We provide the PARSE challenge as open-access for the community to benchmark future algorithm developments at https://parse2022.grand-challenge.org/Parse2022/.
Submitted 7 April, 2023;
originally announced April 2023.
-
Precoder Design for Massive MIMO Downlink with Matrix Manifold Optimization
Authors:
Rui Sun,
Chen Wang,
An-An Lu,
Xiqi Gao,
Xiang-Gen Xia
Abstract:
We investigate the weighted sum-rate (WSR) maximization linear precoder design for massive multiple-input multiple-output (MIMO) downlink. We consider a single-cell system with multiple users and propose a unified matrix manifold optimization framework applicable to total power constraint (TPC), per-user power constraint (PUPC) and per-antenna power constraint (PAPC). We prove that the precoders under TPC, PUPC and PAPC are on distinct Riemannian submanifolds, and transform the constrained problems in Euclidean space to unconstrained ones on manifolds. In accordance with this, we derive Riemannian ingredients, including orthogonal projection, Riemannian gradient, Riemannian Hessian, retraction and vector transport, which are needed for precoder design in the matrix manifold framework. Then, Riemannian design methods using Riemannian steepest descent, Riemannian conjugate gradient and Riemannian trust region are provided to design the WSR-maximization precoders under TPC, PUPC or PAPC. Riemannian methods do not involve the inverses of the large dimensional matrices during the iterations, reducing the computational complexities of the algorithms. Complexity analyses and performance simulations demonstrate the advantages of the proposed precoder design.
Submitted 10 April, 2024; v1 submitted 31 March, 2023;
originally announced April 2023.
-
On the Road to 6G: Visions, Requirements, Key Technologies and Testbeds
Authors:
Cheng-Xiang Wang,
Xiaohu You,
Xiqi Gao,
Xiuming Zhu,
Zixin Li,
Chuan Zhang,
Haiming Wang,
Yongming Huang,
Yunfei Chen,
Harald Haas,
John S. Thompson,
Erik G. Larsson,
Marco Di Renzo,
Wen Tong,
Peiying Zhu,
Xuemin Shen,
H. Vincent Poor,
Lajos Hanzo
Abstract:
Fifth generation (5G) mobile communication systems have entered the stage of commercial development, providing users with new services and improved user experiences as well as offering a host of novel opportunities to various industries. However, 5G still faces many challenges. To address these challenges, international industrial, academic, and standards organizations have commenced research on sixth generation (6G) wireless communication systems. A series of white papers and survey papers have been published, which aim to define 6G in terms of requirements, application scenarios, key technologies, etc. Although ITU-R has been working on the 6G vision and it is expected to reach a consensus on what 6G will be by mid-2023, the related global discussions are still wide open and the existing literature has identified numerous open issues. This paper first provides a comprehensive portrayal of the 6G vision, technical requirements, and application scenarios, covering the current common understanding of 6G. Then, a critical appraisal of the 6G network architecture and key technologies is presented. Furthermore, existing testbeds and advanced 6G verification platforms are detailed for the first time. In addition, future research directions and open challenges are identified for stimulating the on-going global debate. Finally, lessons learned to date concerning 6G networks are discussed.
Submitted 28 February, 2023;
originally announced February 2023.
-
Personalized and privacy-preserving federated heterogeneous medical image analysis with PPPML-HMI
Authors:
Juexiao Zhou,
Longxi Zhou,
Di Wang,
Xiaopeng Xu,
Haoyang Li,
Yuetan Chu,
Wenkai Han,
Xin Gao
Abstract:
Heterogeneous data is endemic due to the use of diverse models and settings of devices by hospitals in the field of medical imaging. However, there are few open-source frameworks for federated heterogeneous medical image analysis that provide personalization and privacy protection simultaneously without the need to modify the existing model structures or to share any private data. In this paper, we propose PPPML-HMI, an open-source learning paradigm for personalized and privacy-preserving federated heterogeneous medical image analysis. To the best of our knowledge, personalization and privacy protection were achieved simultaneously for the first time under the federated scenario by integrating the PerFedAvg algorithm and designing our novel cyclic secure aggregation with the homomorphic encryption algorithm. To show the utility of PPPML-HMI, we applied it to a simulated classification task, namely the classification of healthy people and patients from the RAD-ChestCT Dataset, and one real-world segmentation task, namely the segmentation of lung infections from COVID-19 CT scans. For the real-world task, PPPML-HMI achieved an approximately 5% higher Dice score on average compared to conventional FL under the heterogeneous scenario. Meanwhile, we applied the improved deep leakage from gradients to simulate adversarial attacks and showed the solid privacy-preserving capability of PPPML-HMI. By applying PPPML-HMI to both tasks with different neural networks, a varied number of users, and sample sizes, we further demonstrated the strong robustness of PPPML-HMI.
Submitted 20 February, 2023;
originally announced February 2023.
-
Reconfigurable Massive MIMO: Harnessing the Power of the Electromagnetic Domain for Enhanced Information Transfer
Authors:
Keke Ying,
Zhen Gao,
Sheng Chen,
Xinyu Gao,
Michail Matthaiou,
Rui Zhang,
Robert Schober
Abstract:
The capacity of commercial massive multiple-input multiple-output (mMIMO) systems is constrained by the limited array aperture at the base station, and cannot meet the ever-increasing traffic demands of wireless networks. Given the array aperture, holographic MIMO with infinitesimal antenna spacing can maximize the capacity, but is physically unrealizable. As a promising alternative, reconfigurable mMIMO is proposed to harness the unexploited power of the electromagnetic (EM) domain for enhanced information transfer. Specifically, the reconfigurable pixel antenna technology provides each antenna with an adjustable EM radiation (EMR) pattern, introducing extra degrees of freedom for information transfer in the EM domain. In this article, we present the concept and benefits of availing the EMR domain for mMIMO transmission. Moreover, we propose a viable architecture for reconfigurable mMIMO systems, and the associated system model and downlink precoding are also discussed. In particular, a three-level precoding scheme is proposed, and simulation results verify its considerable spectral and energy efficiency advantages compared to traditional mMIMO systems. Finally, we further discuss the challenges, insights, and prospects of deploying reconfigurable mMIMO, along with the associated hardware, algorithms, and fundamental theory.
Submitted 22 February, 2023;
originally announced February 2023.
-
Self-supervised phase unwrapping in fringe projection profilometry
Authors:
Xiaomin Gao,
Wanzhong Song,
Chunqian Tan,
Junzhe Lei
Abstract:
High-speed and high-accuracy three-dimensional (3D) shape measurement has long been the goal in fringe projection profilometry (FPP). The dual-frequency temporal phase unwrapping method (DF-TPU) is one of the prominent technologies to achieve this goal. However, the period number of the high-frequency pattern of existing DF-TPU approaches is usually limited by the inevitable phase errors, setting a limit to measurement accuracy. Deep-learning-based phase unwrapping methods for single-camera FPP usually require labeled data for training. In this letter, a novel self-supervised phase unwrapping method for single-camera FPP systems is proposed. The trained network can retrieve the absolute fringe order from one 64-period phase map and outperform DF-TPU approaches in terms of depth accuracy. Experimental results demonstrate the validity of the proposed method on real scenes with motion blur, isolated objects, low reflectivity, and phase discontinuity.
Submitted 30 May, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Characterization and Generation of 3D Realistic Geological Particles with Metaball Descriptor based on X-Ray Computed Tomography
Authors:
Yifeng Zhao,
Xiangbo Gao,
Pei Zhang,
Liang Lei,
S. A. Galindo-Torres,
Stan Z. Li
Abstract:
The morphology of geological particles is crucial in determining their granular characteristics and assembly responses. In this paper, Metaball-function-based solutions are proposed for morphological characterization and generation of three-dimensional realistic particles according to X-ray Computed Tomography (XRCT) images. For characterization, we develop a geometry-based Metaball-Imaging algorithm. This algorithm can capture the main contour of parental particles with a series of non-overlapping spheres and refine surface-texture details through gradient search. Four types of particles, with hundreds of samples, are used for evaluation. The results show good matches on key morphological indicators (i.e., volume, surface area, sphericity, circularity, Corey shape factor, nominal diameter and surface-equivalent-sphere diameter), confirming its characterization precision. For generation, we propose the Metaball Variational Autoencoder. Assisted by deep neural networks, this method can generate 3D particles in Metaball form, while retaining coessential morphological features with parental particles. Additionally, this method allows for control over the generated shapes through an arithmetic pattern, enabling the generation of particles with specific shapes. Two sets of XRCT images that differ in sample number and geometric features are chosen as parental data. On each training set, one thousand particles are generated for validation. The generation fidelity is demonstrated through comparisons of morphologies and shape-feature distributions between generated and parental particles. Examples are also provided to demonstrate controllability of the generated shapes. With the Metaball-based simulation frameworks previously proposed by the authors, these methods have the potential to provide valuable insights into the properties and behavior of actual geological particles.
Submitted 5 February, 2023;
originally announced February 2023.
-
Rate-Splitting Multiple Access for Uplink Massive MIMO With Electromagnetic Exposure Constraints
Authors:
Hanyu Jiang,
Li You,
Ahmed Elzanaty,
Jue Wang,
Wen** Wang,
Xiqi Gao,
Mohamed-Slim Alouini
Abstract:
Over the past few years, the prevalence of wireless devices has become one of the essential sources of electromagnetic (EM) radiation to the public. Faced with the swift development of wireless communications, people are skeptical about the risks of long-term exposure to EM radiation. As EM exposure is required to be restricted at user terminals, it is inefficient to blindly decrease the transmit power, which leads to limited spectral efficiency and energy efficiency (EE). Recently, rate-splitting multiple access (RSMA) has been proposed as an effective way to provide higher wireless transmission performance, which is a promising technology for future wireless communications. To this end, we propose using RSMA to increase the EE of massive MIMO uplink while limiting the EM exposure of users. In particular, we investigate the optimization of the transmit covariance matrices and decoding order using statistical channel state information (CSI). The problem is formulated as a non-convex mixed-integer program, which is in general difficult to handle. We first propose a modified water-filling scheme to obtain the transmit covariance matrices with fixed decoding order. Then, a greedy approach is proposed to obtain the decoding permutation. Numerical results verify the effectiveness of the proposed EM exposure-aware EE maximization scheme for uplink RSMA.
Submitted 13 December, 2022;
originally announced December 2022.
-
Self-Transcriber: Few-shot Lyrics Transcription with Self-training
Authors:
Xiaoxue Gao,
Xianghu Yue,
Haizhou Li
Abstract:
Current lyrics transcription approaches rely heavily on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive. How to benefit from unlabeled data and alleviate the limited-data problem has not been explored for lyrics transcription. We propose the first semi-supervised lyrics transcription paradigm, Self-Transcriber, by leveraging unlabeled data using self-training with noisy student augmentation. We attempt to demonstrate the possibility of lyrics transcription with a small amount of labeled data. Self-Transcriber generates pseudo-labels for the unlabeled singing using a teacher model, and adds the pseudo-labels to the labeled data for the student model update with both self-training and supervised training losses. This work closes the gap between supervised and semi-supervised learning and opens the door to few-shot learning of lyrics transcription. Our experiments show that our approach using only 12.7 hours of labeled data achieves competitive performance compared with the supervised approaches trained on 149.1 hours of labeled data for lyrics transcription.
Submitted 2 March, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
Prediction of Geometric Transformation on Cardiac MRI via Convolutional Neural Network
Authors:
Xin Gao
Abstract:
In the field of medical imaging, deep convolutional neural networks (ConvNets) have achieved great success in classification, segmentation, and registration tasks thanks to their unparalleled capacity to learn image features. However, these tasks often require large amounts of manually annotated data and are labor-intensive. Therefore, it is of significant importance to study unsupervised semantic feature learning tasks. In our work, we propose to learn features in medical images by training ConvNets to recognize the geometric transformation applied to images and present a simple self-supervised task that can easily predict the geometric transformation. We precisely define a set of geometric transformations in mathematical terms and generalize this model to 3D, taking into account the distinction between spatial and time dimensions. We evaluated our self-supervised method on CMR images of different modalities (bSSFP, T2, LGE) and achieved accuracies of 96.4%, 97.5%, and 96.4%, respectively. The code and models of our paper will be published on: https://github.com/gaoxin492/Geometric_Transformation_CMR
Submitted 12 November, 2022;
originally announced November 2022.
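The self-supervised pretext task described in the abstract above (train a network to recognize which geometric transformation was applied to an image) can be illustrated with a minimal data-generation sketch. The abstract does not list the transformation set, so the four 90-degree rotations used here are purely an assumed stand-in, and the function name `make_rotation_batch` is hypothetical rather than taken from the paper's repository.

```python
import numpy as np


def make_rotation_batch(image, rng):
    """Build one training batch for a 4-way rotation pretext task.

    Assumption: the transformation set is {0, 90, 180, 270} degree
    rotations of a square 2D image; the paper defines its own set.
    Returns (images, labels) where labels[i] is the number of
    90-degree turns applied to produce images[i].
    """
    if image.ndim != 2 or image.shape[0] != image.shape[1]:
        raise ValueError("expected a square 2D image")
    labels = rng.permutation(4)  # each rotation appears exactly once
    images = np.stack([np.rot90(image, k=int(k)) for k in labels])
    return images, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.arange(16, dtype=np.float32).reshape(4, 4)
    batch, targets = make_rotation_batch(frame, rng)
    print(batch.shape, targets)
```

A classifier trained to predict `labels` from `images` needs no manual annotation, which is the point of the pretext task; the learned features would then be reused for a downstream task.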
-
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Authors:
Xianghu Yue,
Junyi Ao,
Xiaoxue Gao,
Haizhou Li
Abstract:
Self-supervised pre-training has been successful in both text and speech processing. Speech and text offer different but complementary information. The question is whether we are able to perform a speech-text joint pre-training on unpaired speech and text. In this paper, we take the idea of self-supervised pre-training one step further and propose token2vec, a novel joint pre-training framework for unpaired speech and text based on discrete representations of speech. First, because speech is continuous while text is discrete, we discretize speech into a sequence of discrete speech tokens to solve the modality mismatch problem. Second, to solve the length mismatch problem, where the speech sequence is usually much longer than the text sequence, we convert the words of text into phoneme sequences and randomly repeat each phoneme in the sequences. Finally, we feed the discrete speech and text tokens into a modality-agnostic Transformer encoder and pre-train with token-level masking language modeling (tMLM). Experiments show that token2vec is significantly superior to various speech-only pre-training baselines, with up to 17.7% relative WER reduction. The token2vec model is also validated on a non-ASR task, i.e., spoken intent classification, and shows good transferability.
Submitted 30 October, 2022;
originally announced October 2022.
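The length-mismatch step in the abstract above (convert text to phonemes, then randomly repeat each phoneme so the text sequence approaches the length of the speech token sequence) can be sketched as follows. This is only an illustration under assumptions: the abstract does not give the repetition distribution, so a uniform draw between 1 and a hypothetical `max_repeat` is used, and the function name `repeat_phonemes` is invented for this example.

```python
import random


def repeat_phonemes(phonemes, max_repeat=3, seed=None):
    """Randomly repeat each phoneme between 1 and max_repeat times.

    Assumption: repetition counts are drawn uniformly; the paper may
    use a different distribution. Order of phonemes is preserved.
    """
    if max_repeat < 1:
        raise ValueError("max_repeat must be at least 1")
    rng = random.Random(seed)
    upsampled = []
    for phoneme in phonemes:
        upsampled.extend([phoneme] * rng.randint(1, max_repeat))
    return upsampled


if __name__ == "__main__":
    # ARPAbet-style phonemes for "hello", used as illustrative input.
    print(repeat_phonemes(["HH", "AH", "L", "OW"], max_repeat=3, seed=0))
```

Collapsing consecutive duplicates in the output recovers the original phoneme sequence whenever adjacent input phonemes differ, so the upsampling lengthens the text side without changing its content.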
-
Scale-free and Task-agnostic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator
Authors:
Xiangbo Gao,
Cheng Luo,
Qinliang Lin,
Weicheng Xie,
Minmin Liu,
Linlin Shen,
Keerthy Kusumam,
Siyang Song
Abstract:
Traditional L_p norm-restricted image attack algorithms suffer from poor transferability to black box scenarios and poor robustness to defense algorithms. Recent CNN generator-based attack approaches can synthesize unrestricted and semantically meaningful entities to the image, which is shown to be transferable and robust. However, such methods attack images by either synthesizing local adversarial entities, which are only suitable for attacking specific contents, or performing global attacks, which are only applicable to a specific image scale. In this paper, we propose a novel Patch Quilting Generative Adversarial Networks (PQ-GAN) to learn the first scale-free CNN generator that can be applied to attack images with arbitrary scales for various computer vision tasks. The principal investigation on transferability of the generated adversarial examples, robustness to defense frameworks, and visual quality assessment show that the proposed PQG-based attack framework outperforms the other nine state-of-the-art adversarial attack approaches when attacking the neural networks trained on two standard evaluation datasets (i.e., ImageNet and CityScapes).
Submitted 19 November, 2022; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Channel Estimation for LEO Satellite Massive MIMO OFDM Communications
Authors:
Ke-Xin Li,
Xiqi Gao,
Xiang-Gen Xia
Abstract:
In this paper, we investigate the massive multiple-input multiple-output orthogonal frequency division multiplexing channel estimation for low-earth-orbit satellite communication systems. First, we use the angle-delay domain channel to characterize the space-frequency domain channel. Then, we show that the asymptotic minimum mean square error (MMSE) of the channel estimation can be minimized if the array response vectors of the user terminals (UTs) that use the same pilot are orthogonal. Inspired by this, we design an efficient graph-based pilot allocation strategy to enhance the channel estimation performance. In addition, we devise a novel two-stage channel estimation (TSCE) approach, in which the received signals at the satellite are manipulated with per-subcarrier space domain processing followed by per-user frequency domain processing. Moreover, the space domain processing of each UT is shown to be identical for all the subcarriers, and an asymptotically optimal vector for the per-subcarrier space domain linear processing is derived. The frequency domain processing can be efficiently implemented by means of the fast Toeplitz system solver. Simulation results show that the proposed TSCE approach can achieve performance close to that of MMSE estimation with much lower complexity.
Submitted 12 March, 2023; v1 submitted 25 July, 2022;
originally announced July 2022.
-
PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music
Authors:
Xiaoxue Gao,
Chitralekha Gupta,
Haizhou Li
Abstract:
Lyrics transcription of polyphonic music is challenging as the background music affects lyrics intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e. a singing vocal extraction front end, followed by a lyrics transcriber back end, where the front end and back end are trained separately. Such a two-step pipeline suffers from both imperfect vocal extraction and mismatch between front end and back end. In this work, we propose a novel end-to-end integrated fine-tuning framework, that we call PoLyScriber, to globally optimize the vocal extractor front end and lyrics transcriber back end for lyrics transcription in polyphonic music. The experimental results show that our proposed PoLyScriber achieves substantial improvements over the existing approaches on publicly available test datasets.
Submitted 5 May, 2023; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Massive MIMO Hybrid Precoding for LEO Satellite Communications With Twin-Resolution Phase Shifters and Nonlinear Power Amplifiers
Authors:
Li You,
Xiaoyu Qiang,
Ke-Xin Li,
Christos G. Tsinos,
Wenjin Wang,
Xiqi Gao,
Björn Ottersten
Abstract:
The massive multiple-input multiple-output (MIMO) transmission technology has recently attracted much attention in the non-geostationary, e.g., low earth orbit (LEO) satellite communication (SATCOM) systems since it can significantly improve the energy efficiency (EE) and spectral efficiency. In this work, we develop a hybrid analog/digital precoding technique in the massive MIMO LEO SATCOM downlink, which reduces the onboard hardware complexity and power consumption. In the proposed scheme, the analog precoder is implemented via a more practical twin-resolution phase shifting (TRPS) network to make a meticulous tradeoff between the power consumption and array gain. In addition, we consider and study the impact of the distortion effect of the nonlinear power amplifiers (NPAs) in the system design. By jointly considering all the above factors, we propose an efficient algorithmic approach for the TRPS-based hybrid precoding problem with NPAs. Numerical results show the EE gains considering the nonlinear distortion and the performance superiority of the proposed TRPS-based hybrid precoding scheme over the baselines.
Submitted 8 June, 2022;
originally announced June 2022.
-
Pervasive wireless channel modeling theory and applications to 6G GBSMs for all frequency bands and all scenarios
Authors:
Cheng-Xiang Wang,
Zhen Lv,
Xiqi Gao,
Xiaohu You,
Yang Hao,
Harald Haas
Abstract:
In this paper, a pervasive wireless channel modeling theory is first proposed, which uses a unified channel modeling method and a unified equation of channel impulse response (CIR), and can integrate important channel characteristics at different frequency bands and scenarios. Then, we apply the proposed theory to a three dimensional (3D) space-time-frequency (STF) non-stationary geometry-based stochastic model (GBSM) for the sixth generation (6G) wireless communication systems. The proposed 6G pervasive channel model (6GPCM) can characterize statistical properties of channels at all frequency bands from sub-6 GHz to visible light communication (VLC) bands and all scenarios such as unmanned aerial vehicle (UAV), maritime, (ultra-)massive multiple-input multiple-output (MIMO), reconfigurable intelligent surface (RIS), and industry Internet of things (IIoT) scenarios. By adjusting channel model parameters, the 6GPCM can be reduced to various simplified channel models for specific frequency bands and scenarios. Also, it includes standard fifth generation (5G) channel models as special cases. In addition, key statistical properties of the proposed 6GPCM are derived, simulated, and verified by various channel measurement results, which clearly demonstrates its accuracy, pervasiveness, and applicability.
Submitted 6 June, 2022;
originally announced June 2022.