Search | arXiv e-print repository

Advancements in Feature Extraction Recognition of Medical Imaging Systems Through Deep Learning Technique

Authors: Qishi Zhan, Dan Sun, Erdi Gao, Yuhan Ma, Yaxin Liang, Haowei Yang

Abstract: This study introduces a novel unsupervised medical image feature extraction method that employs spatial stratification techniques. A weight-based objective function is proposed to achieve fast image recognition. The algorithm divides the pixels of the image into multiple subdomains and uses a quadtree to access the image. A threshold optimization technique based on the simplex algorithm is presented. To handle the nonlinear characteristics of hyperspectral images, a generalized discriminant analysis algorithm based on a kernel function is proposed. In this project, a hyperspectral remote sensing image is taken as the object of study, and we investigate its mathematical modeling, solution methods, and feature extraction techniques. It is found that different types of objects are independent of each other and compact in image processing. Compared with the traditional linear discriminant method, the proposed method yields better image segmentation results. It not only overcomes the traditional method's susceptibility to lighting conditions, but also extracts object features quickly and accurately. It offers an important reference for clinical diagnosis.

Submitted 23 May, 2024; originally announced June 2024.

Comments: conference
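The abstract says the algorithm divides the image pixels into subdomains and uses a quadtree to access the image, but it does not state the splitting rule. The sketch below assumes a common homogeneity criterion (keep splitting a block while its max-min intensity range exceeds a threshold) purely to illustrate quadtree decomposition; `quadtree_split`, `block_range`, and `threshold` are hypothetical names, not the paper's code.

```python
from typing import List, Tuple

Image = List[List[int]]
Block = Tuple[int, int, int]  # (row, col, size) of a square leaf block


def block_range(img: Image, r: int, c: int, size: int) -> int:
    """Return max - min intensity inside the square block at (r, c)."""
    values = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    return max(values) - min(values)


def quadtree_split(img: Image, threshold: int, r: int = 0, c: int = 0,
                   size: int = -1) -> List[Block]:
    """Recursively split a square, power-of-two image into homogeneous blocks.

    Assumed criterion (not from the paper): a block becomes a leaf when its
    intensity range is <= threshold or it has shrunk to a single pixel.
    """
    if size < 0:
        size = len(img)
    if size == 1 or block_range(img, r, c, size) <= threshold:
        return [(r, c, size)]
    half = size // 2
    leaves: List[Block] = []
    # Visit the four children in NW, NE, SW, SE order.
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        leaves.extend(quadtree_split(img, threshold, r + dr, c + dc, half))
    return leaves


if __name__ == "__main__":
    # 4x4 toy image: uniform except for the bottom-right quadrant.
    image = [
        [10, 10, 10, 10],
        [10, 10, 10, 10],
        [10, 10, 200, 50],
        [10, 10, 90, 120],
    ]
    print(quadtree_split(image, threshold=5))
```

With `threshold=5`, the three uniform quadrants stay whole and only the bottom-right quadrant is split down to single pixels, giving seven leaf blocks.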

arXiv:2406.16381 [pdf, other]

Polar-Coded Tensor-Based Unsourced Random Access with Soft Decoding

Authors: Jiaqi Fang, Yan Liang, Gangle Sun, Hongwei Hou, Yafei Wang, Li You, Wen** Wang

Abstract: Unsourced random access (URA) has emerged as a viable scheme for supporting massive machine-type communications (mMTC) in sixth-generation (6G) wireless networks. Notably, tensor-based URA (TURA), with its inherent tensor structure, stands out by simultaneously enhancing performance and reducing the computational complexity of multi-user separation, especially in mMTC networks with a large number of active devices. However, the current TURA scheme lacks a soft decoder, which precludes the incorporation of existing advanced coding techniques. To fully explore the potential of TURA, this paper investigates the polar-coded TURA (PTURA) scheme and develops the corresponding iterative Bayesian receiver with feedback (IBR-FB). Specifically, in the IBR-FB, we propose the Grassmannian modulation-aided Bayesian tensor decomposition (GM-BTD) algorithm under the variational Bayesian learning (VBL) framework, which leverages the properties of Grassmannian modulation to facilitate the convergence of the VBL process and can generate the required soft information without knowledge of the number of active devices. Furthermore, based on the soft information produced by the GM-BTD, we design the soft Grassmannian demodulator in the IBR-FB. Extensive simulation results demonstrate that the proposed PTURA in conjunction with the IBR-FB surpasses the existing state-of-the-art unsourced random access scheme in terms of accuracy and computational complexity.

Submitted 24 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
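As background on the polar code in the title: the standard polar encoder (Arıkan's construction) multiplies the input bit vector u (information bits plus frozen zeros) by the n-fold Kronecker power of the kernel F = [[1, 0], [1, 1]] over GF(2). The sketch below computes that product with in-place butterflies. It assumes no bit-reversal permutation and no frozen-set selection, and it is not the PTURA encoder, whose details the abstract does not give; `polar_encode` is a hypothetical name.

```python
from typing import List


def polar_encode(u: List[int]) -> List[int]:
    """Return x = u * F^(kron n) over GF(2), with kernel F = [[1, 0], [1, 1]].

    Sketch only: no bit-reversal permutation and no frozen-bit selection.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    if any(bit not in (0, 1) for bit in u):
        raise ValueError("inputs must be bits")
    x = list(u)
    step = 1
    while step < n:
        # Butterfly stage: each pair (i, i + step) is multiplied by F,
        # i.e. (a, b) -> (a XOR b, b).
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x


if __name__ == "__main__":
    print(polar_encode([0, 0, 0, 1]))  # [1, 1, 1, 1]
    print(polar_encode([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

The butterfly form needs O(N log N) XORs instead of the O(N^2) cost of multiplying by the explicit generator matrix.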

arXiv:2406.13358 [pdf, other]

Multi-scale Restoration of Missing Data in Optical Time-series Images with Masked Spatial-Temporal Attention Network

Authors: Zaiyan Zhang, **ing Yan, Yuanqi Liang, Jiaxin Feng, Haixu He, Wei Han

Abstract: Due to factors such as thick cloud cover and sensor limitations, remote sensing images often suffer from significant missing data, resulting in incomplete time-series information. Existing methods for imputing missing values in remote sensing images do not fully exploit spatio-temporal auxiliary information, which limits restoration accuracy. Therefore, this paper proposes a novel deep learning-based approach called MS2TAN (Multi-scale Masked Spatial-Temporal Attention Network) for reconstructing time-series remote sensing images. First, we introduce an efficient spatio-temporal feature extractor based on Masked Spatial-Temporal Attention (MSTA) to obtain high-quality representations of the spatio-temporal neighborhood features in the missing regions. Second, a Multi-scale Restoration Network consisting of MSTA-based Feature Extractors is employed to progressively refine the missing values by exploring spatio-temporal neighborhood features at different scales. Third, we propose a "Pixel-Structure-Perception" Multi-Objective Joint Optimization method to enhance the visual quality of the reconstruction results from multiple perspectives and preserve more texture structures. Furthermore, the proposed method reconstructs missing values in all input temporal phases in parallel (i.e., Multi-In Multi-Out), achieving higher processing efficiency.
Finally, experimental evaluations on two typical missing data restoration tasks across multiple research areas demonstrate that the proposed method outperforms state-of-the-art methods with an improvement of 0.40 dB/1.17 dB in mean peak signal-to-noise ratio (mPSNR) and 3.77/9.41 thousandths in mean structural similarity (mSSIM), while exhibiting stronger texture and structural consistency.

Submitted 19 June, 2024; originally announced June 2024.
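The gains above are reported in mean PSNR (mPSNR). For reference, PSNR in dB for an image with peak value MAX is 10 * log10(MAX^2 / MSE). The sketch below assumes that mPSNR is the per-image PSNR averaged over images (for example, over temporal phases). That is a common convention, but the abstract does not spell it out, and `psnr`/`mean_psnr` are hypothetical helpers.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_val: float = 1.0) -> float:
    """PSNR in dB, 10 * log10(max_val**2 / MSE), over flattened pixel values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_psnr(references: Sequence[Sequence[float]],
              estimates: Sequence[Sequence[float]], max_val: float = 1.0) -> float:
    """Assumed mPSNR definition: the average of per-image PSNR values."""
    if len(references) != len(estimates) or not references:
        raise ValueError("need matching, non-empty lists of images")
    scores = [psnr(r, e, max_val) for r, e in zip(references, estimates)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ref = [0.0, 0.0, 0.0, 0.0]
    print(psnr(ref, [0.1] * 4))                               # about 20 dB
    print(mean_psnr([ref, ref], [[0.1] * 4, [0.01] * 4]))     # about 30 dB
```

Because PSNR is logarithmic, averaging it over images (mPSNR) is not the same as computing PSNR from the pooled MSE.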

arXiv:2406.11163 [pdf, other]

Explainable Bayesian Recurrent Neural Smoother to Capture Global State Evolutionary Correlations

Authors: Shi Yan, Yan Liang, Huayu Zhang, Le Zheng, Difan Zou, Binglu Wang

Abstract: By integrating the evolutionary correlations across global states in the bidirectional recursion, an explainable Bayesian recurrent neural smoother (EBRNS) is proposed for offline data-assisted fixed-interval state smoothing. First, the proposed model, containing global states in the evolutionary interval, is transformed into an equivalent model with bidirectional memory. This transformation incorporates crucial global state information with support for bidirectional recursive computation. For the transformed model, the joint state-memory-trend Bayesian filtering and smoothing frameworks are derived by introducing the bidirectional memory iteration mechanism and offline data into Bayesian estimation theory. The derived frameworks are implemented using the Gaussian approximation to ensure analytical properties and computational efficiency. Finally, the neural network modules within EBRNS and its two-stage training scheme are designed. Unlike most existing approaches that artificially combine deep learning and model-based estimation, the bidirectional recursion and internal gated structures of EBRNS are naturally derived from Bayesian estimation theory, explainably integrating prior model knowledge, online measurement, and offline data. Experiments on representative real-world datasets demonstrate that the high smoothing accuracy of EBRNS is accompanied by data efficiency and a lightweight parameter scale.

Submitted 16 June, 2024; originally announced June 2024.
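As background only: for a linear-Gaussian model, the classical model-based fixed-interval smoother is a Kalman filter forward pass followed by the Rauch-Tung-Striebel (RTS) backward pass. The scalar sketch below shows that forward/backward structure. It contains none of EBRNS's memory, trend, or neural components, and the model (`a`, `q`, `r`, prior `m0`, `p0`) and the name `kalman_rts` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def kalman_rts(ys: Sequence[float], a: float, q: float, r: float,
               m0: float, p0: float
               ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Scalar Kalman filter plus Rauch-Tung-Striebel smoother.

    Assumed model (illustrative, not EBRNS):
        x_k = a * x_{k-1} + w_k,  w_k ~ N(0, q)
        y_k = x_k + v_k,          v_k ~ N(0, r)
    (m0, p0) is the Gaussian prior on the state before the first measurement.
    Returns filtered means, filtered variances, smoothed means, smoothed variances.
    """
    m_pred: List[float] = []
    p_pred: List[float] = []
    m_filt: List[float] = []
    p_filt: List[float] = []
    m, p = m0, p0
    for y in ys:  # forward (filtering) pass
        mp, pp = a * m, a * a * p + q   # predict
        gain = pp / (pp + r)            # Kalman gain
        m = mp + gain * (y - mp)        # measurement update
        p = (1.0 - gain) * pp
        m_pred.append(mp)
        p_pred.append(pp)
        m_filt.append(m)
        p_filt.append(p)

    m_s, p_s = list(m_filt), list(p_filt)
    for k in range(len(ys) - 2, -1, -1):  # backward (smoothing) pass
        g = p_filt[k] * a / p_pred[k + 1]  # smoother gain
        m_s[k] = m_filt[k] + g * (m_s[k + 1] - m_pred[k + 1])
        p_s[k] = p_filt[k] + g * g * (p_s[k + 1] - p_pred[k + 1])
    return m_filt, p_filt, m_s, p_s
```

The smoothed variance never exceeds the filtered one, because each smoothed estimate also uses the measurements that come after it. In the special case `a=1, q=0` (a constant state), every smoothed estimate equals the final filtered estimate.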

arXiv:2406.08837 [pdf]

Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

Authors: Houze Liu, Iris Li, Yaxin Liang, Dan Sun, Yining Yang, Haowei Yang

Abstract: Neural networks with relatively shallow layers and simple structures may have limited ability to accurately identify pneumonia. In addition, deep neural networks demand substantial computing resources, which may prevent convolutional neural networks from being deployed on terminal devices. Therefore, this paper carries out optimal classification with convolutional neural networks. First, according to the characteristics of pneumonia images, AlexNet and InceptionV3 were selected to obtain better image recognition results. Combining the features of medical images, a feed-forward neural network with a deeper and more complex structure is trained. Finally, knowledge extraction technology is used to extract the obtained data into the AlexNet model, improving computing efficiency and reducing computing costs. The results showed that the prediction accuracy, specificity, and sensitivity of the trained AlexNet model increased by 4.25, 7.85, and 2.32 percentage points, respectively. GPU usage decreased by 51% compared to the InceptionV3 model.

Submitted 13 June, 2024; originally announced June 2024.
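The three reported metrics follow the standard confusion-matrix definitions for a binary task such as pneumonia versus normal: accuracy = (TP + TN) / total, sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The reported gains are in percentage points, meaning absolute differences between such ratios, whereas the 51% GPU-usage reduction is a relative change. The counts below are made up for illustration and do not come from the paper. `ConfusionCounts` and `diagnostic_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = pneumonia, assumed)."""
    tp: int
    fp: int
    tn: int
    fn: int


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate (recall)
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
    }


def percentage_point_change(before: float, after: float) -> float:
    """Absolute change between two ratios, expressed in percentage points."""
    return 100.0 * (after - before)


if __name__ == "__main__":
    m = diagnostic_metrics(ConfusionCounts(tp=90, fp=20, tn=80, fn=10))
    print(m)  # {'accuracy': 0.85, 'sensitivity': 0.9, 'specificity': 0.8}
```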

arXiv:2404.17175 [pdf, ps, other]

Over-the-Air Modulation for RIS-assisted Symbiotic Radios: Design, Analysis, and Optimization

Authors: Hu Zhou, Ying-Chang Liang, Chau Yuen

Abstract: In reconfigurable intelligent surface (RIS)-assisted symbiotic radio (SR), an RIS is exploited to assist the primary system and to simultaneously operate as a secondary transmitter by modulating its own information over the incident primary signal from the air. Such an operation is called over-the-air modulation. Existing modulation schemes such as on-off keying and binary phase-shift keying suffer from two problems in the joint detection of the primary and secondary signals in RIS-assisted SR: the detection ambiguity problem when the direct link is blocked, and the bit error rate (BER) error-floor problem when the direct link is weak. To address these two problems, we propose a novel modulation scheme that divides the phase-shift matrix into two parts: the assistance beamforming matrix for assisting the primary system and the transmission beamforming matrix for delivering the secondary signal. To optimize the assistance and transmission beamforming matrices, we first introduce an assistance factor that describes the performance requirement of the primary system and then formulate a problem to minimize the BER of the secondary system, while guaranteeing the BER requirement of the primary system controlled by the assistance factor. To solve this non-convex problem, we resort to the successive convex approximation technique to obtain a suboptimal solution.
Furthermore, to gain more insight, we propose a low-complexity assistance-transmission beamforming structure by borrowing the idea from the classical maximum ratio transmission and zero forcing techniques. Finally, simulation results reveal an interesting tradeoff between the BER performance of the primary and secondary systems when adjusting the assistance factor.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 13 pages, 9 figures
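The low-complexity structure borrows from maximum ratio transmission (MRT) and zero forcing (ZF). As a reminder of those two textbook precoders, and not the paper's RIS assistance-transmission design: if a single-antenna receiver sees y = sum_n h[n] w[n] s, MRT uses w = conj(h) / ||h||. For K users whose channels form the rows of H, ZF uses W = H^H (H H^H)^(-1), so that H W = I. The two-user sketch below uses plain complex arithmetic. The channel values and all function names are assumptions for illustration.

```python
import math
from typing import List

Vec = List[complex]
Mat = List[List[complex]]


def mrt(h: Vec) -> Vec:
    """MRT precoder for y = sum_n h[n] * w[n] * s: w = conj(h) / ||h||.

    The effective gain sum_n h[n] * w[n] is then real and equals ||h||.
    """
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    return [x.conjugate() / norm for x in h]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hermitian(a: Mat) -> Mat:
    """Conjugate transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def inv2(m: Mat) -> Mat:
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular 2x2 matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def zero_forcing(h_mat: Mat) -> Mat:
    """ZF precoder W = H^H (H H^H)^(-1) for a 2-user channel H (2 x N), so H W = I."""
    return matmul(hermitian(h_mat), inv2(matmul(h_mat, hermitian(h_mat))))


if __name__ == "__main__":
    H = [[1 + 1j, 0.5, 0.2j], [0.3, 1 - 0.5j, 0.8]]  # made-up 2-user, 3-antenna channel
    print(matmul(H, zero_forcing(H)))  # approximately the 2x2 identity
```

MRT maximizes the signal gain toward one receiver but ignores interference, while ZF removes inter-user interference at the cost of some gain. This is the kind of trade-off a combined MRT/ZF-inspired structure has to balance.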

arXiv:2404.16611 [pdf, ps, other]

Towards Symbiotic SAGIN Through Inter-operator Resource and Service Sharing: Joint Orchestration of User Association and Radio Resources

Authors: Shizhao He, Jungang Ge, Ying-Chang Liang, Dusit Niyato

Abstract: The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on both the overall system performance and the individual benefits of operators. Motivated by emerging symbiotic communication, which facilitates mutual benefits across different radio systems, we investigate resource and service sharing in SAGIN from a symbiotic communication perspective in this paper. In particular, we consider a SAGIN consisting of a ground network operator (GNO) and a satellite network operator (SNO). Specifically, we aim to maximize the weighted sum rate (WSR) of the whole SAGIN by jointly optimizing the user association, resource allocation, and beamforming. In addition, we introduce a sharing coefficient to characterize the revenue of operators. Operators may suffer revenue loss when focusing only on maximizing the WSR. In pursuit of mutual benefits, we propose a mutual benefit constraint (MBC) to ensure that each operator obtains revenue gains. Then, we develop a centralized algorithm based on the successive convex approximation (SCA) method. Considering that the centralized algorithm is difficult to implement, we propose a distributed algorithm based on Lagrangian dual decomposition and the consensus alternating direction method of multipliers (ADMM).
Finally, extensive numerical simulations demonstrate the effectiveness of the two proposed algorithms and show that the distributed algorithm can approach the performance of the centralized one.

Submitted 25 April, 2024; originally announced April 2024.
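The weighted sum rate (WSR) objective is usually written as sum_k w_k * log2(1 + SINR_k), in bit/s/Hz. The helpers below evaluate that expression from per-user weights and SINRs. The SAGIN-specific SINR model, sharing coefficient, and MBC from the paper are not reproduced, and the numbers are illustrative assumptions. `sinr` and `weighted_sum_rate` are hypothetical names.

```python
import math
from typing import Sequence


def sinr(signal_power: float, interference_powers: Sequence[float],
         noise_power: float) -> float:
    """Received SINR = signal / (sum of interference + noise), all in linear units."""
    return signal_power / (sum(interference_powers) + noise_power)


def weighted_sum_rate(weights: Sequence[float], sinrs: Sequence[float]) -> float:
    """WSR in bit/s/Hz: sum_k w_k * log2(1 + SINR_k)."""
    if len(weights) != len(sinrs):
        raise ValueError("one weight per user is required")
    return sum(w * math.log2(1.0 + s) for w, s in zip(weights, sinrs))


if __name__ == "__main__":
    user_sinrs = [sinr(3.0, [0.5], 0.5), sinr(15.0, [], 1.0)]  # 3.0 and 15.0
    print(weighted_sum_rate([1.0, 0.5], user_sinrs))  # 4.0
```

Raising one user's weight steers a WSR maximizer toward that user. An operator-level constraint such as the MBC exists because maximizing the WSR alone may not give every operator a revenue gain.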

arXiv:2404.06393 [pdf, other]

MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan , et al. (4 additional authors not shown)

Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2403.20058 [pdf, other]

Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks

Authors: Luoyu Wang, Yitian Tao, Qing Yang, Yan Liang, Siwei Liu, Hongcheng Shi, Dinggang Shen, Han Zhang

Abstract: Simultaneous functional PET/MR (sf-PET/MR) is a cutting-edge multimodal neuroimaging technique. It provides an unprecedented opportunity for concurrently monitoring and integrating multifaceted brain networks built by spatiotemporally covaried metabolic activity, neural activity, and cerebral blood flow (perfusion). Despite its high scientific and clinical value, the limited accessibility of PET/MR hardware hinders its application, let alone the use of modern AI-based PET/MR fusion models. Our objective is to develop a clinically feasible AI-based disease diagnosis model trained on comprehensive sf-PET/MR data that, at inference time, accepts single-modality input (e.g., PET only) while retaining multimodal-level accuracy. To this end, we propose MX-ARM, a multimodal MiXture-of-experts Alignment and Reconstruction Model. It is modality detachable and exchangeable, dynamically allocating different multi-layer perceptrons ("mixture of experts") through learnable weights to learn respective representations from different modalities. This design does not sacrifice model performance in the uni-modal setting. To fully exploit the inherently complex and nonlinear relations among modalities while producing fine-grained representations for uni-modal inference, we add a modal alignment module that aligns a dominant modality (e.g., PET) with representations of auxiliary modalities (MR). We further adopt multimodal reconstruction to improve the quality of learned features.
Experiments on valuable multimodal sf-PET/MR data for Mild Cognitive Impairment diagnosis demonstrate the efficacy of our model toward clinically feasible precision medicine.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 11 pages
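MX-ARM is described as allocating multi-layer perceptrons dynamically through learnable weights. In a generic soft mixture-of-experts layer, the gate weights are a softmax over per-expert gating scores, and the layer outputs the gate-weighted sum of the expert outputs. The sketch below shows only that generic mechanism with tiny hand-set "experts". It is an assumption about the general idea, not MX-ARM's architecture, and `moe_forward` and `gate_weights` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                gate_weights: Sequence[Sequence[float]]
                ) -> Tuple[List[float], List[float]]:
    """Soft mixture of experts (generic sketch, not MX-ARM).

    gate_weights[i] is a hypothetical learnable vector: expert i's gate score
    is its dot product with x, and the gates are the softmax of those scores.
    Returns the gate-weighted sum of expert outputs and the gate values.
    """
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for w in gate_weights]
    gates = softmax(scores)
    outputs = [expert(x) for expert in experts]
    mixed = [sum(g * out[d] for g, out in zip(gates, outputs))
             for d in range(len(outputs[0]))]
    return mixed, gates


if __name__ == "__main__":
    experts = [lambda v: [2.0 * a for a in v], lambda v: [0.0 for _ in v]]
    print(moe_forward([1.0, 3.0], experts, [[0.0, 0.0], [0.0, 0.0]]))
    # equal gates [0.5, 0.5], so the output is the average [1.0, 3.0]
```

Because the gates are recomputed for every input, a trained gate can route inputs from different modalities toward different experts.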

arXiv:2403.16472 [pdf, ps, other]

Power-Aware Sparse Reflect Beamforming in Active RIS-aided Interference Channels

Authors: Ruizhe Long, Hu Zhou, Ying-Chang Liang

Abstract: Active reconfigurable intelligent surface (RIS) has attracted significant attention in wireless communications, due to its reflecting elements (REs) capable of reflecting incident signals with not only phase shifts but also amplitude amplifications. In this paper, we are interested in active RIS-aided interference channels in which $K$ user pairs share the same time and frequency resources with the aid of active RIS. Thanks to the promising amplitude amplification capability, activating a moderate number of REs, rather than all of them, is sufficient for the active RIS to mitigate cross-channel interferences. Motivated by this, we propose a power-aware sparse reflect beamforming design for the active RIS-aided interference channels, which allows the active RIS to flexibly adjust the number of activated REs for the sake of reducing hardware and power costs. Specifically, we establish the power consumption model in which only those activated REs consume the biasing and operation power that supports the amplitude amplification, yielding an $\ell_0$-norm power consumption function. Based on the proposed model, we investigate a sum-rate maximization problem and an active RIS power minimization problem by carefully designing the sparse reflect beamforming vector. To solve these problems, we first replace the nonconvex $\ell_0$-norm function with an iterative reweighted $\ell_1$-norm function.
Then, fractional programming is used to solve the sum-rate maximization, while semidefinite programming together with the difference-of-convex algorithm (DCA) is used to solve the active RIS power minimization. Numerical results show that the proposed sparse designs can notably increase the sum rate of user pairs and decrease the power consumption of active RIS in interference channels.

Submitted 29 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.
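The iterative reweighted $\ell_1$ surrogate replaces $\|x\|_0$ with $\sum_i w_i |x_i|$, where $w_i = 1/(|x_i^{(t)}| + \epsilon)$ is recomputed from the previous iterate. The sketch below first evaluates that surrogate to show that it approximately counts the nonzero entries (here, active REs) when $\epsilon$ is small. It then runs the reweighting loop on a toy scalar denoising problem, where each weighted $\ell_1$ step has a closed-form soft-thresholding solution. The toy problem, $\epsilon$, $\lambda$, and all function names are assumptions. This does not solve the paper's sum-rate or power-minimization problems.

```python
import math
from typing import List, Sequence


def reweight(x_prev: Sequence[complex], eps: float) -> List[float]:
    """Reweighting rule w_i = 1 / (|x_i| + eps), computed from the previous iterate."""
    return [1.0 / (abs(v) + eps) for v in x_prev]


def weighted_l1(x: Sequence[complex], w: Sequence[float]) -> float:
    """Surrogate sum_i w_i |x_i|; approximates ||x||_0 when w comes from x and eps is small."""
    return sum(wi * abs(v) for wi, v in zip(w, x))


def reweighted_l1_denoise(y: Sequence[float], lam: float, eps: float,
                          iters: int) -> List[float]:
    """Toy problem (assumption, not the paper's): minimize
    0.5 * ||x - y||^2 + lam * sum_i w_i |x_i|.

    Each step is solved exactly by weighted soft thresholding, and the
    weights are then refreshed from the new iterate.
    """
    x = list(y)
    for _ in range(iters):
        w = reweight(x, eps)
        x = [math.copysign(max(abs(v) - lam * wi, 0.0), v) for v, wi in zip(y, w)]
    return x


if __name__ == "__main__":
    coeffs = [0.8 + 0.6j, 0.0, 0.5j, 0.0]  # two "active" complex coefficients
    print(weighted_l1(coeffs, reweight(coeffs, 1e-6)))  # close to 2
    print(reweighted_l1_denoise([3.0, 0.2, -2.0, 0.1], lam=0.1, eps=0.01, iters=5))
```

Small entries get large weights and are driven exactly to zero, while large entries get small weights and are barely shrunk. This is why reweighting approximates the $\ell_0$ count better than a plain $\ell_1$ penalty.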

arXiv:2403.15139 [pdf, other]

Deep Generative Model based Rate-Distortion for Image Downscaling Assessment

Authors: Yuanbang Liang, Bhavesh Garg, Paul L Rosin, Yipeng Qin

Abstract: In this paper, we propose Image Downscaling Assessment by Rate-Distortion (IDA-RD), a novel measure to quantitatively evaluate image downscaling algorithms. In contrast to image-based methods that measure the quality of downscaled images, ours is process-based, drawing on rate-distortion theory to measure the distortion incurred during downscaling. Our main idea is that downscaling and super-resolution (SR) can be viewed as the encoding and decoding processes in the rate-distortion model, respectively, and that a downscaling algorithm that preserves more details in the resulting low-resolution (LR) images should lead to less distorted high-resolution (HR) images in SR. In other words, the distortion should increase as the downscaling algorithm deteriorates. However, measuring this distortion is non-trivial because it requires the SR algorithm to be blind and stochastic. Our key insight is that such requirements can be met by recent SR algorithms based on deep generative models that can find all matching HR images for a given LR image on their learned image manifolds. Extensive experimental results show the effectiveness of our IDA-RD measure.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.06700 [pdf, other]

Enhancing Adversarial Training with Prior Knowledge Distillation for Robust Image Compression

Authors: Zhi Cao, Youneng Bao, Fanyang Meng, Chao Li, Wen Tan, Genhong Wang, Yongsheng Liang

Abstract: Deep neural network-based image compression (NIC) has achieved excellent performance, but NIC models have been shown to be susceptible to backdoor attacks. Adversarial training has been validated as a common method for enhancing the robustness of image compression models. However, the robustness improvement that adversarial training provides is limited. In this paper, we propose a prior knowledge-guided adversarial training framework for image compression models. Specifically, we first propose a gradient regularization constraint for training robust teacher models. Subsequently, we design a knowledge distillation-based strategy that transfers prior knowledge from the teacher model to the student model to guide adversarial training. Experimental results show that our method improves reconstruction quality by about 9 dB when the Kodak dataset is selected as the backdoor attack object for the PSNR attack. Compared with Ma2023, our method achieves a 5 dB higher PSNR at high bitrate points.

Submitted 15 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.04116 [pdf, other]

Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis

Authors: Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille

Abstract: X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel-view X-ray projections, existing methods, mainly based on NeRF, suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. First, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Second, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while requiring less than 15% of the training time and achieving over 73x faster inference. The application to sparse-view CT reconstruction also reveals the practical value of our method. Code and models will be publicly available at https://github.com/caiyuanhao1998/X-Gaussian . A video demo of the training process visualization is at https://www.youtube.com/watch?v=gDVf_Ngeghg .

Submitted 6 March, 2024; originally announced March 2024.

Comments: The first 3D Gaussian Splatting-based method for X-ray 3D reconstruction
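The second step of ACUI uniformly samples initial point positions inside a cuboid that encloses the scanned object. Below is a minimal sketch of only that sampling step. The cuboid bounds, point count, and the name `sample_points_in_cuboid` are assumptions, and the computation of camera parameters from the scanner geometry is not reproduced.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def sample_points_in_cuboid(lower: Point, upper: Point, n: int,
                            seed: Optional[int] = None) -> List[Point]:
    """Draw n points uniformly inside the axis-aligned cuboid [lower, upper].

    Hypothetical helper: in ACUI the bounds would come from the scanner geometry.
    """
    if n < 0 or any(hi <= lo for lo, hi in zip(lower, upper)):
        raise ValueError("need n >= 0 and upper > lower on every axis")
    rng = random.Random(seed)  # private generator for reproducible sampling
    return [
        (rng.uniform(lower[0], upper[0]),
         rng.uniform(lower[1], upper[1]),
         rng.uniform(lower[2], upper[2]))
        for _ in range(n)
    ]


if __name__ == "__main__":
    pts = sample_points_in_cuboid((-1.0, -1.0, -0.5), (1.0, 1.0, 0.5), 5, seed=0)
    print(pts)
```

Sampling each axis independently and uniformly gives a uniform distribution over the cuboid's volume, so no region of the enclosing box starts with a denser initialization than another.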

arXiv:2402.16153 [pdf, other]

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, **gcheng Wu, Chenghua Lin, Qifeng Liu , et al. (10 additional authors not shown)

Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub.

Submitted 25 February, 2024; originally announced February 2024.

Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

arXiv:2402.13901 [pdf, other]

Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate

Authors: Yuchen Liang, Peizhong Ju, Yingbin Liang, Ness Shroff

Abstract: The denoising diffusion model has recently emerged as a powerful generative technique that converts noise into data. While there are many studies providing theoretical guarantees for diffusion processes based on discretized stochastic differential equations (D-SDE), many generative samplers in real applications directly employ a discrete-time (DT) diffusion process. However, there are very few studies analyzing these DT processes; e.g., convergence for DT diffusion processes has been obtained only for distributions with bounded support. In this paper, we establish the convergence guarantee for substantially larger classes of distributions under DT diffusion processes and further improve the convergence rate for distributions with bounded support. In particular, we first establish the convergence rates for both smooth and general (possibly non-smooth) distributions having a finite second moment. We then specialize our results to a number of interesting classes of distributions with explicit parameter dependencies, including distributions with Lipschitz scores, Gaussian mixture distributions, and any distribution with early-stopping. We further propose a novel accelerated sampler and show that it improves the convergence rates of the corresponding regular sampler by orders of magnitude with respect to all system parameters. Our study features a novel analytical technique that constructs a tilting factor representation of the convergence error and exploits Tweedie's formula for handling Taylor expansion power terms.

Submitted 30 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13776 [pdf, other]

Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always faces a serious missing-data problem due to participant dropout and failed scans, making longitudinal infant brain atlas construction and developmental trajectory delineation quite challenging. Thanks to the development of AI-based generative models, neuroimage completion has become a powerful technique to retain as much available data as possible. However, current image completion methods usually suffer from inconsistency within each individual subject in the time dimension, compromising the overall quality. To solve this problem, this paper proposes a two-stage cascaded diffusion model, Cas-DiffCom, for dense and longitudinal 3D infant brain MRI completion and super-resolution. We applied our proposed method to the Baby Connectome Project (BCP) dataset. The experimental results validate that Cas-DiffCom achieves both individual consistency and high fidelity in longitudinal infant brain image completion. We further applied the generated infant brain images to two downstream tasks, brain tissue segmentation and developmental trajectory delineation, to demonstrate its task-oriented potential in the neuroscience field.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.04865 [pdf, other]

Collaborative Computing in Non-Terrestrial Networks: A Multi-Time-Scale Deep Reinforcement Learning Approach

Authors: Yang Cao, Shao-Yu Lien, Ying-Chang Liang, Dusit Niyato, Xuemin Shen

Abstract: Constructing earth-fixed cells with low-earth orbit (LEO) satellites in non-terrestrial networks (NTNs) has been the most promising paradigm to enable global coverage. The limited computing capabilities on LEO satellites, however, render tackling resource optimization within a short duration a critical challenge. Although the sufficient computing capabilities of the ground infrastructures can be utilized to assist the LEO satellite, different time-scale control cycles and coupling decisions between the space and ground segments still obstruct the joint optimization design for computing agents at different segments. To address the above challenges, in this paper, a multi-time-scale deep reinforcement learning (DRL) scheme is developed for achieving the radio resource optimization in NTNs, in which the LEO satellite and user equipment (UE) collaborate with each other to perform individual decision-making tasks with different control cycles. Specifically, the UE updates its policy toward improving the value functions of both the satellite and UE, while the LEO satellite only performs finite-step rollout for decision-making based on the reference decision trajectory provided by the UE. Most importantly, a rigorous analysis guaranteeing the performance convergence of the proposed scheme is provided. Comprehensive simulations are conducted to justify the effectiveness of the proposed scheme in balancing the transmission performance and computational complexity.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.04584 [pdf, other]

Troublemaker Learning for Low-Light Image Enhancement

Authors: Yinghao Song, Zhiyuan Cao, Wanhong Xiang, Sifan Long, Bo Yang, Hongwei Ge, Yanchun Liang, Chunguo Wu

Abstract: Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for training. TML is simple: we first dim the input and then increase its brightness. TML is based on two core components. First, the troublemaker model (TM) constructs pseudo low-light images from normal images to relieve the cost of pairwise data. Second, the predicting model (PM) enhances the brightness of pseudo low-light images. Additionally, we incorporate an enhancing model (EM) to further improve the visual performance of PM outputs. Moreover, in LLIE tasks, characterizing global element correlations is important because more information on the same object can be captured. CNNs cannot achieve this well, and self-attention has high time complexity. Accordingly, we propose Global Dynamic Convolution (GDC) with O(n) time complexity, which essentially imitates the partial calculation process of self-attention to formulate elementwise correlations. Based on the GDC module, we build the UGDC model. Extensive quantitative and qualitative experiments demonstrate that UGDC trained with TML can achieve competitive performance against state-of-the-art approaches on public datasets. The code is available at https://github.com/Rainbowman0/TML_LLIE.

Submitted 2 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.04056 [pdf, other]

Collaborative Deep Reinforcement Learning for Resource Optimization in Non-Terrestrial Networks

Authors: Yang Cao, Shao-Yu Lien, Ying-Chang Liang, Dusit Niyato, Xuemin Shen

Abstract: Non-terrestrial networks (NTNs) with low-earth orbit (LEO) satellites have been regarded as promising remedies to support global ubiquitous wireless services. Due to the rapid mobility of LEO satellites, inter-beam/satellite handovers happen frequently for a specific user equipment (UE). To tackle this issue, earth-fixed cell scenarios have been studied, in which the LEO satellite adjusts its beam direction towards a fixed area within its dwell duration to maintain stable transmission performance for the UE. Therefore, the LEO satellite is required to perform real-time resource allocation, which, however, is unaffordable for a LEO satellite with limited computing capability. To address this issue, in this paper, we propose a two-time-scale collaborative deep reinforcement learning (DRL) scheme for beam management and resource allocation in NTNs, in which the LEO satellite and UE, with different control cycles, update their decision-making policies in a sequential manner. Specifically, the UE updates its policy subject to improving the value functions of both agents. Furthermore, the LEO satellite only makes decisions through finite-step rollouts with a reference decision trajectory received from the UE. Simulation results show that the proposed scheme can effectively balance throughput performance and computational complexity compared with traditional greedy-searching schemes.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03302 [pdf, other]

Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

Authors: Jiarun Liu, Hao Yang, Hong-Yu Zhou, Yan Xi, Lequan Yu, Yizhou Yu, Yong Liang, Guangming Shi, Shaoting Zhang, Hairong Zheng, Shanshan Wang

Abstract: Accurate medical image segmentation demands the integration of multi-scale information, spanning from local features to global dependencies. However, it is challenging for existing methods to model long-range global information, where convolutional neural networks (CNNs) are constrained by their local receptive fields, and vision transformers (ViTs) suffer from the high quadratic complexity of their attention mechanism. Recently, Mamba-based models have gained great attention for their impressive ability in long sequence modeling. Several studies have demonstrated that these models can outperform popular vision models in various tasks, offering higher accuracy, lower memory consumption, and less computational burden. However, existing Mamba-based models are mostly trained from scratch and do not explore the power of pretraining, which has been proven to be quite effective for data-efficient medical image analysis. This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks, leveraging the advantages of ImageNet-based pretraining. Our experimental results reveal the vital role of ImageNet-based training in enhancing the performance of Mamba-based models. Swin-UMamba outperforms CNNs, ViTs, and the latest Mamba-based models by a large margin. Notably, on the AbdomenMRI, Endoscopy, and Microscopy datasets, Swin-UMamba outperforms its closest counterpart U-Mamba_Enc by an average score of 2.72%.

Submitted 6 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Code and models of Swin-UMamba are publicly available at: https://github.com/JiarunLiu/Swin-UMamba

arXiv:2402.02775 [pdf]

Instant square lattice structured illumination microscopy: an optimal strategy towards photon-saving and real-time super-resolution observation

Authors: Tianyu Zhao, Zhaojun Wang, Manming Shu, **gxiang Zhang, Yansheng Liang, Shaowei Wang, Ming Lei

Abstract: Over the past decade, structured illumination microscopy (SIM) has found its niche in super-resolution (SR) microscopy due to its fast imaging speed and low excitation intensity. However, due to the significantly higher light dose compared to wide-field microscopy and the time-consuming post-processing procedures, long-term, real-time, super-resolution observation of living cells is still out of reach for most SIM setups, which inevitably limits its routine use by cell biologists. Here, we describe square lattice SIM (SL-SIM) for long-duration live cell imaging by using the square lattice optical field as illumination, which allows continuous super-resolved observation over long periods of time. In addition, by extending the previous joint spatial-frequency reconstruction concept to SL-SIM, a high-speed reconstruction strategy is validated in the GPU environment, whose reconstruction time is even shorter than image acquisition time, thus enabling real-time observation. We have demonstrated the potential of SL-SIM on various biological applications, ranging from microtubule cytoskeleton dynamics to the interactions of mitochondrial cristae and DNAs in COS7 cells. The inherent lower light dose and user-friendly workflow of the SL-SIM could help make long-duration, real-time and super-resolved observations accessible to biological laboratories.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.10544 [pdf, other]

AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks

Authors: Yun Liang, Hai Lin, Shaojian Qiu, Yihang Zhang

Abstract: Recently, Transformers have been introduced into the field of acoustics recognition. They are pre-trained on large-scale datasets using methods such as supervised learning and semi-supervised learning, demonstrating robust generality: they fine-tune easily to downstream tasks and show more robust performance. However, the predominant fine-tuning method currently used is still full fine-tuning, which involves updating all parameters during training. This not only incurs significant memory usage and time costs but also compromises the model's generality. Other fine-tuning methods either struggle to address this issue or fail to achieve matching performance. Therefore, we conducted a comprehensive analysis of existing fine-tuning methods and proposed an efficient fine-tuning approach based on Adapter tuning, namely AAT. The core idea is to freeze the audio Transformer model and insert extra learnable Adapters, efficiently acquiring downstream task knowledge without compromising the model's original generality. Extensive experiments have shown that our method achieves performance comparable or even superior to full fine-tuning while optimizing only 7.118% of the parameters. It also demonstrates superiority over other fine-tuning methods.
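The core idea (freeze the backbone, train only inserted adapters) can be illustrated with a generic bottleneck adapter. This is a minimal pure-Python sketch under assumptions not taken from the paper: a residual down-project, ReLU, up-project adapter applied to one feature vector, with a zero-initialised up-projection so the frozen model's output is unchanged before tuning. `BottleneckAdapter` and `trainable_fraction` are hypothetical names, and nothing here reproduces the paper's 7.118% figure.

```python
import random


def matvec(w, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up @ relu(W_down @ h + b_down) + b_up."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Zero-initialised up-projection: the adapter starts as an identity map,
        # so the frozen backbone behaves exactly as before any tuning.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, h):
        z = [max(0.0, v + b) for v, b in zip(matvec(self.w_down, h), self.b_down)]
        delta = [v + b for v, b in zip(matvec(self.w_up, z), self.b_up)]
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self):
        d, r = len(self.w_up), len(self.w_down)
        return 2 * d * r + d + r  # two projection matrices plus their biases


def trainable_fraction(backbone_params, adapters):
    """Share of all parameters that are updated when only the adapters are trained."""
    tuned = sum(a.num_params() for a in adapters)
    return tuned / (backbone_params + tuned)


adapter = BottleneckAdapter(dim=8, bottleneck=2)
h = [float(i) for i in range(8)]
print(adapter(h) == h)  # True: the zero up-projection leaves h unchanged
```

The bottleneck width is the knob in this sketch: it trades adapter capacity against the share of weights that must be stored and updated per downstream task.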

Submitted 19 January, 2024; originally announced January 2024.

Comments: Preprint version for ICASSP 2024, Korea

arXiv:2401.03497 [pdf, other]

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Authors: Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Abstract: Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress. However, the extensive computational demands during pre-training pose a significant barrier to the potential application and optimization of audio SSL models. In this paper, inspired by the success of data2vec 2.0 in the image modality and Audio-MAE in the audio modality, we introduce Efficient Audio Transformer (EAT) to further improve the effectiveness and efficiency of audio SSL. The proposed EAT adapts the bootstrap self-supervised training paradigm to the audio domain. A novel Utterance-Frame Objective (UFO) is designed to enhance the modeling capability of acoustic events. Furthermore, we reveal that the masking strategy is critical in audio SSL pre-training, and superior audio representations can be obtained with large inverse block masks. Experimental results demonstrate that EAT achieves state-of-the-art (SOTA) performance on a range of audio-related tasks, including AudioSet (AS-2M, AS-20K), ESC-50, and SPC-2, along with a significant pre-training speedup of up to ~15x compared to existing audio SSL models.
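As we understand the term from the data2vec 2.0 line of work that EAT builds on, inverse block masking picks contiguous blocks of patches to keep visible and masks everything else, so the masked regions are large and hard to fill in by local interpolation. A minimal sketch on a time-by-frequency patch grid, assuming square blocks and a fixed mask ratio; the function name and parameters are illustrative, not EAT's code:

```python
import random


def inverse_block_mask(rows, cols, mask_ratio, block=2, seed=0):
    """Return a rows x cols grid where True marks a masked patch.

    Instead of masking blocks, grow *visible* blocks at random positions
    until (1 - mask_ratio) of the patches are kept, then mask the rest.
    """
    if block > rows or block > cols:
        raise ValueError("block must fit inside the patch grid")
    rng = random.Random(seed)
    target_visible = round(rows * cols * (1.0 - mask_ratio))
    visible = set()
    while len(visible) < target_visible:
        r0 = rng.randrange(rows - block + 1)
        c0 = rng.randrange(cols - block + 1)
        for r in range(r0, r0 + block):
            for c in range(c0, c0 + block):
                if len(visible) < target_visible:  # stop exactly at the target
                    visible.add((r, c))
    return [[(r, c) not in visible for c in range(cols)] for r in range(rows)]


mask = inverse_block_mask(8, 16, mask_ratio=0.8, block=2)
print(sum(map(sum, mask)))  # number of masked patches
```

Larger `block` values yield the "large inverse block masks" the abstract refers to: the kept patches clump together, leaving wide contiguous masked regions.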

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.00475 [pdf, other]

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

Authors: Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie

Abstract: This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emotional speech. To address this, we introduce the Emotional chat Model (E-chat), a novel spoken dialogue system capable of comprehending and responding to emotions conveyed from speech. This model leverages an emotion embedding extracted by a speech encoder, combined with LLMs, enabling it to respond according to different emotional contexts. Additionally, we introduce the E-chat200 dataset, designed explicitly for emotion-sensitive spoken dialogue. In various evaluation metrics, E-chat consistently outperforms baseline LLMs, demonstrating its potential in emotional comprehension and human-machine interaction.

Submitted 6 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

Comments: 6 pages, 3 figures

arXiv:2312.08097 [pdf, ps, other]

Hierarchical Cognitive Spectrum Sharing in Space-Air-Ground Integrated Networks

Authors: Zizhen Zhou, Qianqian Zhang, Jungang Ge, Ying-Chang Liang

Abstract: In space-air-ground integrated networks (SAGINs), cognitive spectrum sharing has been regarded as a promising solution to improve spectrum efficiency by enabling a secondary network to access the spectrum of a primary network. However, different networks in SAGIN may have different quality of service (QoS) requirements, which cannot be well satisfied with the traditional cognitive spectrum sharing architecture. For example, the aerial network typically has high QoS requirements, which, however, may not be met when it acts as a secondary network. To address this issue, in this paper, we propose a hierarchical cognitive spectrum sharing architecture (HCSSA) for SAGINs, where the secondary networks are divided into a preferential one and an ordinary one. Specifically, the aerial and terrestrial networks can access the spectrum of the satellite network under the condition that the interference caused to the satellite terminal is below a certain threshold. Besides, considering that the aerial network has a higher priority than the terrestrial network, we use a rate constraint to ensure the performance of the aerial network. Subject to these two constraints, we consider a sum-rate maximization for the terrestrial network by jointly optimizing the transmit beamforming vectors of the aerial and terrestrial base stations. To solve this non-convex problem, we propose a penalty-based iterative beamforming (PIBF) scheme that uses the penalty method and the successive convex approximation technique. Moreover, we also develop three low-complexity schemes by optimizing the normalized beamforming vectors and power control. Finally, we provide extensive numerical simulations to compare the performance of the proposed PIBF scheme and the low-complexity schemes. The results also demonstrate the advantages of the proposed HCSSA compared with the traditional cognitive spectrum sharing architecture.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.16568 [pdf, ps, other]

Active Reconfigurable Intelligent Surface Enhanced Spectrum Sensing for Cognitive Radio Networks

Authors: Jungang Ge, Ying-Chang Liang, Sumei Sun, Yonghong Zeng, Zhidong Bai

Abstract: In opportunistic cognitive radio networks, when the primary signal is very weak compared to the background noise, the secondary user requires a long sensing time to achieve reliable spectrum sensing performance, leaving little remaining time for the secondary transmission. To tackle this issue, we propose an active reconfigurable intelligent surface (RIS) assisted spectrum sensing system, where the received signal strength from the primary user of interest can be enhanced and the underlying interference within the background noise can be mitigated as well. In comparison with the passive RIS, the active RIS can not only adapt the phase shift of each reflecting element but also amplify the incident signals. Notably, we study the reflecting coefficient matrix (RCM) optimization problem to improve the detection probability given a maximum tolerable false alarm probability and limited sensing time. Then, we show that the formulated problem can be equivalently transformed into a weighted mean square error minimization problem using the principle of the well-known weighted minimum mean square error (WMMSE) algorithm, and an iterative optimization approach is proposed to obtain the optimal RCM. In addition, to fairly compare passive RIS and active RIS, we study the required power budget of the RIS to achieve a target detection probability under a special case where the direct links are neglected and the RIS-related channels are line-of-sight. Via extensive simulations, the effectiveness of the WMMSE-based RCM optimization approach is demonstrated. Furthermore, the results reveal that the active RIS can outperform the passive RIS when the underlying interference within the background noise is relatively weak, whereas the passive RIS performs better in strong interference scenarios because the same power budget can support a vast number of passive reflecting elements for interference mitigation.
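The sensing-time problem in the first sentence can be made concrete with the classical energy detector, used here as a stand-in and not as the paper's RIS-assisted detector. Under the usual central-limit approximation, assuming complex Gaussian signal and noise, a known noise power and N samples, the sketch computes the detection probability at a fixed false-alarm probability and the N needed to reach a target detection probability. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def q_func(x):
    """Gaussian tail function Q(x) = P(Z > x)."""
    return 1.0 - _STD.cdf(x)


def q_inv(p):
    """Inverse tail function Q^{-1}(p)."""
    return _STD.inv_cdf(1.0 - p)


def detection_probability(snr, n_samples, p_fa):
    """Energy-detector Pd with the threshold set for false-alarm probability p_fa."""
    arg = (q_inv(p_fa) - snr * math.sqrt(n_samples)) / (1.0 + snr)
    return q_func(arg)


def required_samples(snr, p_fa, p_d):
    """Smallest N that reaches p_d at false-alarm probability p_fa (same approximation)."""
    root = (q_inv(p_fa) - q_inv(p_d) * (1.0 + snr)) / snr
    return math.ceil(root ** 2)


for snr_db in (-10, -20):
    snr = 10 ** (snr_db / 10)
    print(snr_db, required_samples(snr, p_fa=0.1, p_d=0.9))
```

At low SNR the required N grows roughly as 1/SNR², so a 10 dB drop in received SNR costs roughly a hundredfold longer sensing time; any SNR gain an RIS provides enters this picture through `snr`.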

Submitted 26 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.15128 [pdf, other]

Quickest Change Detection with Post-Change Density Estimation

Authors: Yuchen Liang, Venugopal V. Veeravalli

Abstract: The problem of quickest change detection in a sequence of independent observations is considered. The pre-change distribution is assumed to be known, while the post-change distribution is unknown. Two tests based on post-change density estimation are developed for this problem: the window-limited non-parametric generalized likelihood ratio (NGLR) CuSum test and the non-parametric window-limited adaptive (NWLA) CuSum test. Neither test assumes any knowledge of the post-change distribution, except that the post-change density satisfies certain smoothness conditions that allow for efficient non-parametric estimation. Also, they do not require any pre-collected post-change training samples. Under certain convergence conditions on the density estimator, it is shown that both tests are first-order asymptotically optimal as the false alarm rate goes to zero. The analysis is validated through numerical results, where both tests are compared with baseline tests that have distributional knowledge.
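Both proposed tests are CuSum variants in which the unknown post-change density is replaced by an estimate. As a hedged sketch of the known-distribution baseline that the abstract compares against (not the NGLR or NWLA tests themselves), here is Page's CuSum recursion, assuming Gaussian pre- and post-change densities that differ only in their mean:

```python
import math
import random


def gaussian_llr(x, mu0, mu1, sigma):
    """log f1(x) / f0(x) for a Gaussian mean shift with common variance sigma**2."""
    return (mu1 - mu0) / sigma ** 2 * (x - 0.5 * (mu0 + mu1))


def cusum_stopping_time(samples, llr, threshold):
    """Page's CuSum: W_n = max(0, W_{n-1} + llr(x_n)); alarm at the first W_n >= threshold."""
    w = 0.0
    for n, x in enumerate(samples, start=1):
        w = max(0.0, w + llr(x))
        if w >= threshold:
            return n
    return None  # no alarm within the observed samples


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(1.0, 1.0) for _ in range(200)]
tau = cusum_stopping_time(data, lambda x: gaussian_llr(x, 0.0, 1.0, 1.0), threshold=math.log(1000))
print(tau)  # alarm index; the change occurs after sample 200
```

Raising `threshold` lowers the false-alarm rate at the cost of a longer detection delay; the non-parametric tests keep this recursion but plug an estimated post-change density into the log-likelihood ratio.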

Submitted 25 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2211.00223

arXiv:2311.04791 [pdf, other]

Integrated Distributed Semantic Communication and Over-the-air Computation for Cooperative Spectrum Sensing

Authors: Peng Yi, Yang Cao, Xin Kang, Ying-Chang Liang

Abstract: Cooperative spectrum sensing (CSS) is a promising approach to improve the detection of primary users (PUs) using multiple sensors. However, existing combination methods face several challenges, namely, performance degradation and the ceiling effect for hard-decision fusion (HDF), as well as significant uploading latency and non-robustness to noise in the reporting channel for soft-data fusion (SDF). To address these issues, an integrated communication and computation (ICC) framework is proposed in this paper. Specifically, distributed semantic communication (DSC) jointly optimizes multiple sensors and the fusion center to minimize the transmitted data without degrading detection performance. Moreover, over-the-air computation (AirComp) is utilized to further reduce spectrum occupation in the reporting channel, taking advantage of the characteristics of the wireless channel to enable data aggregation. Under the ICC framework, a particular system, namely ICC-CSS, is designed and implemented, which is theoretically proved to be equivalent to the optimal estimator-correlator (E-C) detector with equal gain SDF when the PU signal samples are independent and identically distributed. Extensive simulations verify the superiority of ICC-CSS compared with various conventional CSS schemes in terms of detection performance, robustness to SNR variations in both sensing and reporting channels, as well as scalability with respect to the number of samples and sensors.
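The AirComp aggregation step can be pictured with the textbook channel-inversion scheme, which is an assumption here and not the ICC-CSS design (in particular, the DSC encoder is omitted). Each sensor pre-scales its local soft statistic by the inverse of its channel gain, assumed real and bounded away from zero, so the fusion center receives the sum of all statistics in a single channel use, which is the equal-gain SDF statistic. Function names and numbers are hypothetical.

```python
import random


def aircomp_sum(local_stats, channel_gains, noise_std=0.0, rng=None):
    """One channel use of over-the-air computation with channel-inversion precoding.

    Sensor k transmits s_k / h_k; the channel multiplies it by h_k, and the
    fusion center receives the superposition sum_k s_k plus receiver noise.
    """
    rng = rng or random.Random(0)
    received = sum(h * (s / h) for s, h in zip(local_stats, channel_gains))
    return received + rng.gauss(0.0, noise_std)


def equal_gain_decision(fused_statistic, threshold):
    """Equal-gain soft-data fusion: declare the PU present if the sum reaches the threshold."""
    return fused_statistic >= threshold


stats = [1.2, 0.8, 1.5]  # hypothetical local energy statistics
gains = [0.9, 0.4, 1.3]  # hypothetical real-valued channel gains
fused = aircomp_sum(stats, gains, noise_std=0.05)
print(equal_gain_decision(fused, threshold=3.0))
```

The design point is that the reporting cost no longer scales with the number of sensors: all of them transmit at once and the channel performs the addition.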

Submitted 25 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 13 pages, 10 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.02935 [pdf, ps, other]

doi 10.1109/LWC.2023.3297231

Channel Estimation and Training Design for Active RIS Aided Wireless Communications

Authors: Hao Chen, Nanxi Li, Ruizhe Long, Ying-Chang Liang

Abstract: Active reconfigurable intelligent surface (ARIS) is a newly emerging RIS technique that leverages radio frequency (RF) reflection amplifiers to empower phase-configurable reflection elements (REs) in amplifying the incident signal. Thereby, ARIS can enhance wireless communications with the strengthened ARIS-aided links. In this letter, we propose exploiting the signal amplification capability of ARIS for channel estimation, aiming to improve the estimation precision. Nevertheless, the signal amplification inevitably introduces thermal noise at the ARIS, which can hinder the acquisition of accurate channel state information (CSI) with conventional channel estimation methods based on passive RIS (PRIS). To address this issue, we further investigate this ARIS-specific channel estimation problem and propose a least-squares (LS) based channel estimator, whose performance can be further improved with the design of ARIS reflection patterns in the channel training phase. Based on the proposed LS channel estimator, we optimize the training reflection patterns to minimize the channel estimation error variance. Extensive simulation results show that our proposed design can achieve accurate channel estimation in the presence of the ARIS noises.
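A generic LS estimator with training reflection patterns can be sketched as follows. The model is an assumption, and the ARIS-specific thermal noise that the letter addresses is deliberately left out: after pilot removal and with the direct link cancelled, training slot t yields y_t = Σ_k φ_{t,k}·g_k + n_t, where g_k is the cascaded channel through element k and φ_{t,k} ∈ {+1, -1} is that element's training coefficient. A standard result for white noise and unit-modulus patterns is that orthogonal patterns (ΦᵀΦ = T·I) minimize the LS error variance; Hadamard rows are one such choice. Function names are hypothetical.

```python
def hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n a power of two), entries +/-1."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def ls_estimate_orthogonal(patterns, received):
    """LS estimate g_hat = (Phi^T Phi)^{-1} Phi^T y, specialised to Phi^T Phi = T * I."""
    t = len(patterns)
    n = len(patterns[0])
    # With white noise of variance sigma^2, each entry of g_hat has error variance sigma^2 / T.
    return [sum(patterns[s][k] * received[s] for s in range(t)) / t for k in range(n)]


g_true = [0.5, -1.0, 2.0, 0.25]  # hypothetical cascaded channel coefficients
phi = hadamard(4)                # one training pattern per row
y = [sum(p * g for p, g in zip(row, g_true)) for row in phi]  # noiseless observations
print(ls_estimate_orthogonal(phi, y))  # recovers g_true in the noiseless case
```

With an active RIS, the amplification also scales a noise term injected at the surface, so the error variance depends on the patterns in a more involved way; that is the problem the letter's pattern optimization targets.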

Submitted 6 November, 2023; originally announced November 2023.

Comments: This paper has been accepted for publication in IEEE Wireless Communications Letters

Journal ref: IEEE Wireless Communications Letters, early access, 2023

arXiv:2311.02928 [pdf, ps, other]

doi 10.1109/TWC.2023.3286395

Pilot Design and Signal Detection for Symbiotic Radio over OFDM Carriers

Authors: Hao Chen, Qianqian Zhang, Ruizhe Long, Yiyang Pei, Ying-Chang Liang

Abstract: Symbiotic radio (SR) is a promising solution to achieve high spectrum- and energy-efficiency due to its spectrum sharing and low-power consumption properties, in which the secondary system achieves data transmissions by backscattering the signal originating from the primary system. In this paper, we are interested in the pilot design and signal detection when the primary transmission adopts orthogonal frequency division multiplexing (OFDM). In particular, to preserve the channel orthogonality among the OFDM sub-carriers, each secondary symbol is designed to span an entire OFDM symbol. The comb-type pilot structure is employed by the primary transmission, while the preamble pilot structure is used by the secondary transmission. With the designed pilot structures, the primary signal can be detected via the conventional methods by treating the secondary signal as a part of the composite channel, i.e., the effective channel of the primary transmission. Furthermore, the secondary signal can be extracted from the estimated composite channel with the help of the detected primary signal. The bit error rate (BER) performance with both perfect and estimated CSI, the diversity orders of the primary and secondary transmissions, and the sensitivity to symbol synchronization error are analyzed. Simulation results show that the performance of the primary transmission is enhanced thanks to the backscatter link established by the secondary transmission.
More importantly, even without the direct link, the primary and secondary transmissions can be supported via only the backscatter link.
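The extraction of the secondary signal from the estimated composite channel can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: a noise-free per-sub-carrier model y[k] = (h_d[k] + x·h_b[k])·s[k], a BPSK secondary symbol x held constant across the whole OFDM symbol, and direct (h_d) and backscatter (h_b) channels already known, for example from the preamble. The function names are hypothetical, and this is not the authors' detector.

```python
def estimate_secondary_symbol(y, s_hat, h_d, h_b):
    """Least-squares estimate of one secondary symbol spanning an OFDM symbol.

    Assumed (illustrative) model per sub-carrier k:
        y[k] = (h_d[k] + x * h_b[k]) * s[k] + noise,
    where x is the secondary symbol, constant over all sub-carriers.
    """
    # Step 1: composite-channel estimate per sub-carrier, using detected primary symbols
    h_c = [yk / sk for yk, sk in zip(y, s_hat)]
    # Step 2: LS fit of x to the residual (h_c - h_d) along the backscatter channel h_b
    num = sum(hb.conjugate() * (hc - hd) for hc, hd, hb in zip(h_c, h_d, h_b))
    den = sum(abs(hb) ** 2 for hb in h_b)
    return num / den


def bpsk_decide(x_hat):
    """Hard BPSK decision on the real part of the estimate."""
    return 1 if x_hat.real >= 0 else -1
```

With noise present, the same least-squares fit averages the noise over all sub-carriers of the OFDM symbol, which is one reason to let a secondary symbol span a full OFDM symbol.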

Submitted 6 November, 2023; originally announced November 2023.

Comments: This paper has been accepted for publication in IEEE Transactions on Wireless Communications

Journal ref: IEEE Transactions on Wireless Communications, early access, 2023

arXiv:2311.02837 [pdf, ps, other]

Multi-User Multi-IoT-Device Symbiotic Radio: A Novel Massive Access Scheme for Cellular IoT

Authors: Jun Wang, Ying-Chang Liang, Sumei Sun

Abstract: Symbiotic radio (SR) is a promising technique to support cellular Internet-of-Things (IoT) by forming a mutualistic relationship between IoT and cellular transmissions. In this paper, we propose a novel multi-user multi-IoT-device SR system to enable massive access in cellular IoT. In the considered system, the base station (BS) transmits information to multiple cellular users, and a number of IoT devices simultaneously backscatter their information to these users via the cellular signal. The cellular users jointly decode the information from the BS and IoT devices. Noting that the reflective links from the IoT devices can be regarded as the channel uncertainty of the direct links, we apply the robust design method to design the beamforming vectors at the BS. Specifically, the transmit power is minimized under the cellular transmission outage probability constraints and IoT transmission sum rate constraints. An algorithm based on semi-definite programming and difference-of-convex programming is proposed to solve the power minimization problem. Moreover, we consider a special case where each cellular user is associated with several adjacent IoT devices and propose a direction of arrival (DoA)-based transmit beamforming design approach. The DoA-based approach requires only the DoA and angular spread (AS) of the direct links instead of the instantaneous channel state information (CSI) of the reflective link channels, leading to a significant reduction in the channel feedback overhead.
Simulation results validate the multi-user multi-IoT-device SR system and the effectiveness of the proposed beamforming approaches. It is shown that the DoA-based beamforming approach achieves performance comparable to that of the CSI-based approach in the special case when the ASs are small.

Submitted 5 November, 2023; originally announced November 2023.

Comments: 13 pages, 12 figures, Conference J. Wang and Y.-C. Liang, Transmit beamforming design for multiuser multi-IoT-device symbiotic radios, in Proc. IEEE ICC, Rome, Italy, May 2023, pp. 1-6

arXiv:2311.01167 [pdf, ps, other]

Modulation Design and Optimization for RIS-Assisted Symbiotic Radios

Authors: Hu Zhou, Bowen Cai, Qianqian Zhang, Ruizhe Long, Yiyang Pei, Ying-Chang Liang

Abstract: In reconfigurable intelligent surface (RIS)-assisted symbiotic radio (SR), the RIS acts as a secondary transmitter by modulating its information bits over the incident primary signal while simultaneously assisting the primary transmission, and a cooperative receiver jointly decodes the primary and secondary signals. Most existing works on SR focus on using RIS to enhance the reflecting link while ignoring the ambiguity problem in joint detection caused by the multiplicative relationship between the primary and secondary signals. In particular, when the direct link is blocked, joint detection suffers severe performance loss due to this ambiguity if the conventional on-off keying and binary phase shift keying modulation schemes are used for the RIS. To address this issue, we propose a novel modulation scheme for RIS-assisted SR that divides the phase-shift matrix into two components, the symbol-invariant and symbol-varying components, which are used to assist the primary transmission and carry the secondary signal, respectively. To design these two components, we focus on the detection of the composite signal formed by the primary and secondary signals, and formulate a problem of minimizing the bit error rate (BER) of the composite signal to improve the BER performance of both the primary and secondary signals. By solving the problem, we derive closed-form expressions for the optimal symbol-invariant and symbol-varying components, which depend on the channel strength ratio of the direct link to the reflecting link.
Moreover, the theoretical BER performance is analyzed. Finally, simulation results show the superiority of the proposed modulation scheme over its conventional counterpart.

Submitted 26 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 16 pages, 16 figures

arXiv:2310.17187 [pdf, other]

Explainable Gated Bayesian Recurrent Neural Network for Non-Markov State Estimation

Authors: Shi Yan, Yan Liang, Le Zheng, Mingyang Fan, Xiaoxu Wang, Binglu Wang

Abstract: The optimality of Bayesian filtering relies on the completeness of prior models, while deep learning holds a distinct advantage in learning models from offline data. Nevertheless, the current fusion of these two methodologies remains largely ad hoc, lacking a theoretical foundation. This paper presents a novel solution, namely an explainable gated Bayesian recurrent neural network specifically designed for state estimation under model mismatches. First, we transform the non-Markov state-space model into an equivalent first-order Markov model with memory. This generalized transformation overcomes the limitations of the first-order Markov property and enables recursive filtering. Second, by deriving a data-assisted joint state-memory-mismatch Bayesian filter, we design a Bayesian gated framework that includes a memory update gate for capturing the temporal regularities in state evolution, a state prediction gate with evolution mismatch compensation, and a state update gate with observation mismatch compensation. For computational efficiency, a Gaussian approximation implementation of the filtering process within the gated framework is derived. Finally, the corresponding internal neural network structures and end-to-end training methods are designed. The Bayesian filtering theory enhances the interpretability of the proposed gated network, enabling the effective integration of offline data and prior models within functionally explicit gated units.
In comprehensive experiments, including simulations and real-world datasets, the proposed gated network demonstrates superior estimation performance compared to benchmark filters and state-of-the-art deep learning filtering methods.
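As a point of reference for the Gaussian approximation mentioned above, the classic scalar Kalman filter is the simplest Gaussian Bayesian filter: a prediction step followed by an update step. The sketch below assumes a linear scalar model with hypothetical coefficients `a`, `h` and noise variances `q`, `r`; it contains none of the gated, mismatch-compensating units of the proposed network.

```python
def kalman_step(m, P, z, a=1.0, q=0.01, h=1.0, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed model: x_k = a * x_{k-1} + w_k,  w_k ~ N(0, q)
                   z_k = h * x_k + v_k,      v_k ~ N(0, r)
    (m, P) is the Gaussian posterior mean and variance from the previous step.
    """
    # Prediction (prior) step
    m_pred = a * m
    P_pred = a * P * a + q
    # Update step with measurement z
    S = h * P_pred * h + r          # innovation variance
    K = P_pred * h / S              # Kalman gain
    m_new = m_pred + K * (z - h * m_pred)
    P_new = (1.0 - K * h) * P_pred
    return m_new, P_new
```

In the paper's terms, a model mismatch would appear as wrong values of `a`, `q`, `h`, or `r`; the gated network is described as learning compensations for such mismatches from data.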

Submitted 7 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.04863 [pdf, other]

SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

Authors: Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie

Abstract: Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising results in speaker-attributed automatic speech recognition (SA-ASR). Although they obtain state-of-the-art (SOTA) performance, most of these studies are based on an autoregressive (AR) decoder, which generates tokens one by one and results in a large real-time factor (RTF). To speed up inference, we introduce Paraformer, a recently proposed non-autoregressive model, as the acoustic model in the SA-ASR model. Paraformer uses a single-step decoder to enable parallel generation, obtaining performance comparable to SOTA AR transformer models. In addition, we propose a speaker-filling strategy to reduce speaker identification errors and adopt an inter-CTC strategy to enhance the encoder's ability in acoustic modeling. Experiments on the AliMeeting corpus show that our model outperforms the cascaded SA-ASR model with a 6.1% relative speaker-dependent character error rate (SD-CER) reduction on the test set. Moreover, our model achieves a comparable SD-CER of 34.8% with only 1/10 of the RTF of the SOTA joint AR SA-ASR model.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.13573 [pdf, other]

The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0), held at ASRU2023, aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what and when" in typical meeting scenarios. We established two sub-tracks: the fixed training condition sub-track, in which the training data is constrained to predetermined datasets but participants can use any open-source pre-trained model, and the open training condition sub-track, which allows the use of all available data and models without limitation. In addition, we release a new 10-hour test set for challenge ranking. This paper provides an overview of the dataset, track settings, results, and analysis of submitted systems, as a benchmark showing the current state of speaker-attributed ASR.

Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: 8 pages, Accepted by ASRU2023

arXiv:2309.02855 [pdf, other]

Bandwidth-efficient Inference for Neural Image Compression

Authors: Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

Abstract: With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable bandwidth-efficient neural inference method in which activations are compressed by a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Optimized with existing model quantization methods, the low-level task of image compression can achieve up to 19x bandwidth reduction with 6.21x energy saving.
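The abstract does not define "symmetric exponential Golomb coding". As a minimal sketch, assuming it resembles the signed order-0 exp-Golomb code used for se(v) syntax elements in H.264, where the signed values 0, 1, -1, 2, -2, ... are first mapped to 0, 1, 2, 3, 4, ... and then coded as unsigned integers, an encoder and decoder could look like this. All function names are hypothetical.

```python
def signed_to_unsigned(v: int) -> int:
    """Symmetric mapping 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v


def unsigned_to_signed(k: int) -> int:
    """Inverse of signed_to_unsigned."""
    return (k + 1) // 2 if k % 2 == 1 else -(k // 2)


def exp_golomb_encode(v: int) -> str:
    """Order-0 exp-Golomb codeword for a signed integer, as a bit string."""
    bits = bin(signed_to_unsigned(v) + 1)[2:]   # binary of (k + 1), without '0b'
    return "0" * (len(bits) - 1) + bits          # prefix of (len - 1) zeros


def exp_golomb_decode(stream: str) -> list[int]:
    """Decode a concatenation of order-0 signed exp-Golomb codewords."""
    values, i = [], 0
    while i < len(stream):
        zeros = 0
        while stream[i] == "0":                  # count the leading-zero prefix
            zeros += 1
            i += 1
        k = int(stream[i:i + zeros + 1], 2) - 1  # read (zeros + 1) info bits
        i += zeros + 1
        values.append(unsigned_to_signed(k))
    return values
```

Small-magnitude values get short codewords (0 costs a single bit), which suits quantized activations concentrated around zero.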

Submitted 6 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: 9 pages, 6 figures, submitted to ICASSP 2024

MSC Class: 68U10 (primary); 94A08, 68T07 (secondary)

ACM Class: I.2.6; I.4.2

arXiv:2308.15957 [pdf, other]

doi 10.1364/OL.465316

Structure-Aware Parametric Representations for Time-Resolved Light Transport

Authors: Diego Royo, Zesheng Huang, Yun Liang, Boyan Song, Adolfo Muñoz, Diego Gutierrez, Julio Marco

Abstract: Time-resolved illumination provides rich spatio-temporal information for applications such as accurate depth sensing or hidden geometry reconstruction, making it a useful asset for prototyping and as input for data-driven approaches. However, time-resolved illumination measurements are high-dimensional and have a low signal-to-noise ratio, hampering their applicability in real scenarios. We propose a novel method to compactly represent time-resolved illumination using mixtures of exponentially-modified Gaussians that are robust to noise and preserve structural information. Our method yields representations two orders of magnitude smaller than discretized data, providing consistent results in applications such as hidden scene reconstruction and depth estimation, and quantitative improvements over previous approaches.
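For reference, an exponentially-modified Gaussian (EMG) is the distribution of a Gaussian N(mu, sigma^2) variable plus an independent exponential variable with rate lam. The sketch below evaluates the standard EMG density and a weighted mixture of EMG lobes; the (weight, mu, sigma, lam) tuple layout and function names are assumptions, and no fitting procedure from the paper is reproduced.

```python
import math


def emg_pdf(x: float, mu: float, sigma: float, lam: float) -> float:
    """Density of an exponentially-modified Gaussian at x."""
    arg = (lam / 2.0) * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)
    tail = math.erfc((mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma))
    # Note: for large lam * sigma, math.exp(arg) can overflow while erfc underflows;
    # a robust version would switch to a scaled-erfc formulation in that regime.
    return 0.5 * lam * math.exp(arg) * tail


def emg_mixture(x: float, components) -> float:
    """Weighted sum of EMG lobes; components is a list of (weight, mu, sigma, lam)."""
    return sum(w * emg_pdf(x, mu, s, lam) for w, mu, s, lam in components)
```

Each lobe is described by four numbers, so a mixture of a few lobes can stand in for a long discretized transient; how many lobes are needed depends on the scene.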

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.12723 [pdf, other]

doi 10.1109/TSIPN.2022.3176113

Distributed Extended Object Tracking Using Coupled Velocity Model from WLS Perspective

Authors: Zhifei Li, Yan Liang, Linfeng Xu

Abstract: This study proposes a coupled velocity model (CVM) that establishes the relation between orientation and velocity through their correlation, rather than treating them as two independent quantities as existing extended object tracking (EOT) models do. As a result, CVM detects the mismatch between the prior dynamic model and the actual motion pattern to correct the filtering gain, and simultaneously becomes a nonlinear, state-coupled model with multiplicative noise. The study uses CVM to design a feasible distributed weighted least squares (WLS) filter. The WLS criterion requires a linear state-space model containing only additive noise about the estimated state. To meet this requirement, we derive two such separate pseudo-linearized models using the first-order Taylor series expansion. The separation is merely in form: the estimates of the states of interest are embedded as parameters into each other's model, so their interdependency is still preserved in the iterative operation of the two linear filters. With the two models, we first propose a centralized WLS filter by converting the measurements from all nodes into a summation form. Then, a distributed consensus scheme, which directly performs an inner iteration on the priors across different nodes, is proposed to incorporate the cross-covariances between nodes. Under the consensus scheme, a distributed WLS filter over a realistic network with "naive" nodes is developed by proper weighting of the priors and measurements.
Finally, the performance of the proposed filters in terms of accuracy, robustness, and consistency is verified under different prior situations.
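Converting the measurements from all nodes into a summation form is easiest to see in information form, where each node contributes an additive term. As a minimal sketch, assuming a scalar state and linear node measurements z_i = h_i·x + v_i with noise variance r_i (not the pseudo-linearized CVM models), the centralized WLS estimate with a Gaussian prior is computed as follows; the function name is hypothetical.

```python
def centralized_wls(x_prior, p_prior, measurements):
    """Centralized WLS fusion of a scalar state in information form.

    measurements: list of (z_i, h_i, r_i), one per node, assuming
    z_i = h_i * x + v_i with var(v_i) = r_i.
    Returns the posterior estimate and its variance.
    """
    info = 1.0 / p_prior            # prior information (inverse variance)
    info_vec = x_prior / p_prior    # prior information vector
    for z, h, r in measurements:    # each node adds its own summand
        info += h * h / r
        info_vec += h * z / r
    p_post = 1.0 / info
    return info_vec * p_post, p_post
```

Because the node terms simply add, a distributed version can approximate the sums through consensus iterations; handling the cross-covariances between node priors, as the abstract describes, is what this sketch leaves out.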

Submitted 24 August, 2023; originally announced August 2023.

Comments: Corrected Version

Journal ref: Published by IEEE Transactions on Signal and Information Processing over Networks, 2022

arXiv:2308.04743 [pdf]

Missile guidance law design based on free-time convergent error dynamics

Authors: Yuanhe Liu, Nianhao Xie, Kebo Li, Yangang Liang

Abstract: Guidance law design can be regarded as a finite-time error-tracking problem. This paper proposes a unified free-time convergent guidance law design approach based on error dynamics and the free-time convergence method. First, the desired free-time convergent error dynamics are proposed; their convergence time can be set freely and is independent of the initial states and the guidance parameters. Then, illustrative guidance laws considering the leading angle constraint, impact angle constraint, and impact time constraint are each derived from the proposed free-time convergent error dynamics. The connections and distinctions between the proposed and existing guidance laws are analyzed theoretically. Finally, the performance of the proposed guidance laws is verified through simulation comparisons.
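One well-known way to obtain a convergence time that is chosen freely and does not depend on the initial error is a time-varying gain that grows as the deadline approaches. As a sketch under that assumption (the paper's own error dynamics may differ), take de/dt = -k/(T - t)·e, whose solution e(t) = e0·(1 - t/T)^k reaches zero exactly at the chosen time T for any e0 and any k > 0. The function names are hypothetical.

```python
def free_time_error(e0: float, T: float, k: float, t: float) -> float:
    """Closed-form solution of de/dt = -k / (T - t) * e, i.e. e0 * (1 - t/T)**k."""
    if t >= T:
        return 0.0
    return e0 * (1.0 - t / T) ** k


def simulate(e0: float, T: float, k: float, dt: float = 1e-4):
    """Forward-Euler integration of the same dynamics up to just before t = T,
    where the gain k / (T - t) becomes singular."""
    e, t = e0, 0.0
    while t + dt < T:
        e += -k / (T - t) * e * dt
        t += dt
    return e, t
```

The designer picks T directly, while k only shapes how the error decays along the way; the unbounded gain near T is why practical guidance laws typically saturate or switch the gain before the deadline.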

Submitted 9 August, 2023; originally announced August 2023.

Comments: 13 pages, 6 figures, accepted by Journal of Systems Engineering and Electronics

arXiv:2307.12255 [pdf, other]

ResWCAE: Biometric Pattern Image Denoising Using Residual Wavelet-Conditioned Autoencoder

Authors: Youzhi Liang, Wen Liang

Abstract: Biometric authentication with pattern images is increasingly popular in compact Internet of Things (IoT) devices. However, the reliability of such systems can be compromised by image quality issues, particularly in the presence of high levels of noise. While state-of-the-art deep learning algorithms designed for generic image denoising have shown promise, their large number of parameters and lack of optimization for unique biometric pattern retrieval make them unsuitable for these devices and scenarios. In response to these challenges, this paper proposes a lightweight and robust deep learning architecture, the Residual Wavelet-Conditioned Convolutional Autoencoder (Res-WCAE) with Kullback-Leibler divergence (KLD) regularization, designed specifically for fingerprint image denoising. Res-WCAE comprises two encoders (an image encoder and a wavelet encoder) and one decoder. Residual connections between the image encoder and decoder preserve fine-grained spatial features, and the bottleneck layer is conditioned on the compressed representation of features obtained by the wavelet encoder from the approximation and detail subimages in the wavelet-transform domain. The effectiveness of Res-WCAE is evaluated against several state-of-the-art denoising methods, and the experimental results demonstrate that Res-WCAE outperforms these methods, particularly for heavily degraded fingerprint images with high levels of noise.
Overall, Res-WCAE shows promise as a solution to the challenges faced by biometric authentication systems in compact IoT devices.
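The approximation and detail subimages mentioned above come from a wavelet decomposition of the input image. As a minimal illustration, assuming a single-level 2D Haar wavelet (the abstract does not name the wavelet), each 2x2 pixel block yields one approximation coefficient and three detail coefficients. The function name is hypothetical, and sub-band naming (LH vs. HL) differs between libraries.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an image with even height and width.

    Returns (LL, LH, HL, HH): the approximation subimage and three detail
    subimages, using the orthonormal scaling (each 2x2 block is divided by 2).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # approximation (local average)
            rows[1].append((a - b + c - d) / 2)  # differences across columns
            rows[2].append((a + b - c - d) / 2)  # differences across rows
            rows[3].append((a - b - c + d) / 2)  # diagonal differences
        for band, row in zip((LL, LH, HL, HH), rows):
            band.append(row)
    return LL, LH, HL, HH
```

Smooth regions end up almost entirely in LL, while ridge edges and much of the noise land in the detail bands, which is what makes these subimages a useful conditioning signal for a denoiser.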

Submitted 23 July, 2023; originally announced July 2023.

Comments: 8 pages, 2 figures

arXiv:2306.15433 [pdf, other]

Recursive LMMSE-Based Iterative Soft Interference Cancellation for MIMO Systems to Save Computations and Memories

Authors: Hufei Zhu, Fuqin Deng, Yikui Zhai, Jiaming Zhong, Yanyang Liang

Abstract: First, a reordered description is given for the linear minimum mean square error (LMMSE)-based iterative soft interference cancellation (ISIC) detection process for multiple-input multiple-output (MIMO) wireless communication systems, based on the equivalent channel matrix. Then this reordered description is used to compare the detection process of LMMSE-ISIC with that of the hard decision (HD)-based ordered successive interference cancellation (OSIC) scheme, leading to the conclusion that the former is an extension of the latter. Finally, the recursive scheme for HD-OSIC with reduced complexity and memory saving is extended to propose a recursive scheme for LMMSE-ISIC, where the required computations and memories are reduced by computing the filtering bias and the estimate from the Hermitian inverse matrix and the symbol estimate vector, and updating the Hermitian inverse matrix and the symbol estimate vector efficiently. Assume N transmitters and M (M ≥ N) receivers in the MIMO system. Compared to the existing low-complexity LMMSE-ISIC scheme, the proposed recursive LMMSE-ISIC scheme requires no more than 1/6 of the computations and no more than 1/5 of the memory units.
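Recursive detection schemes of this kind avoid recomputing matrix inverses from scratch. As a generic illustration only, not the specific recursion of the paper, the Sherman-Morrison identity updates the inverse of a Hermitian matrix after a rank-one change A + alpha·u·u^H in O(n^2) operations instead of the O(n^3) of a fresh inversion. The function name and the list-of-lists matrix representation are assumptions.

```python
def sherman_morrison_hermitian(A_inv, u, alpha=1.0):
    """Return (A + alpha * u u^H)^{-1} given A^{-1}, for Hermitian A and real alpha.

    A_inv: n x n list of lists (Hermitian), u: length-n list (real or complex).
    Uses (A + a u u^H)^{-1} = A^{-1} - a (A^{-1} u)(A^{-1} u)^H / (1 + a u^H A^{-1} u).
    """
    n = len(u)
    # v = A^{-1} u
    v = [sum(A_inv[i][j] * u[j] for j in range(n)) for i in range(n)]
    # Scalar denominator 1 + alpha * u^H A^{-1} u
    denom = 1 + alpha * sum(u[i].conjugate() * v[i] for i in range(n))
    # Since A^{-1} is Hermitian, u^H A^{-1} = v^H, so the correction is v v^H
    return [[A_inv[i][j] - alpha * v[i] * v[j].conjugate() / denom
             for j in range(n)] for i in range(n)]
```

Setting `alpha=-1.0` gives a rank-one downdate, the kind of step needed when a detected symbol's contribution is removed from a Gram matrix.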

Submitted 5 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.10125 [pdf, other]

Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects

Authors: Kexin Zhang, Qingsong Wen, Chaoli Zhang, Rongyao Cai, Ming Jin, Yong Liu, James Zhang, Yuxuan Liang, Guansong Pang, Dongjin Song, Shirui Pan

Abstract: Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data: with a pre-training and fine-tuning strategy, high performance can be achieved even with a small amount of labeled data. While many surveys of self-supervised learning have been published for computer vision and natural language processing, a comprehensive survey of time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article. To this end, we first comprehensively review existing surveys related to SSL and time series, and then provide a new taxonomy of existing time series SSL methods by summarizing them from three perspectives: generative-based, contrastive-based, and adversarial-based. These methods are further divided into ten subcategories with detailed reviews and discussions of their key intuitions, main frameworks, advantages, and disadvantages. To facilitate the experiments and validation of time series SSL methods, we also summarize datasets commonly used in time series forecasting, classification, anomaly detection, and clustering tasks. Finally, we present future directions of SSL for time series analysis.

Submitted 8 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI); 26 pages, 200+ references; the first work to comprehensively and systematically summarize self-supervised learning for time series analysis (SSL4TS). The GitHub repository is https://github.com/qingsongedu/Awesome-SSL4TS

arXiv:2306.07505 [pdf]

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus constructed models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLRP was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperformed those of LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over the canonical risk indicators LSM and SSM.
Additionally, the 2D-SWE images of SSM provided more information than those of LSM for accurately predicting HRV.
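The AUC values reported above can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. A minimal sketch of that rank-based (Mann-Whitney) computation, with ties counted as one half, is shown below; it is generic, uses a hypothetical function name, and does not reproduce the study's statistics or confidence intervals.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney), O(P * N).

    scores: predicted risk scores; labels: 1 for positive cases, 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # a tie counts as half a correctly ordered pair
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, so values such as 0.91 and 0.88 indicate that most positive-negative pairs are ranked correctly.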

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2306.02682 [pdf, other]

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

Authors: Yukang Liang, Kaitao Song, Shaoguang Mao, Huiqiang Jiang, Luna Qiu, Yuqing Yang, Dongsheng Li, Linli Xu, Lili Qiu

Abstract: Pronunciation assessment is a major challenge in computer-aided pronunciation training systems, especially at the word (phoneme) level. To obtain word (phoneme)-level scores, current methods usually rely on aligning components to obtain acoustic features of each word (phoneme), which limits the performance of assessment to the accuracy of alignments. To address this problem, we propose a simple yet effective method, namely Masked pre-training for Pronunciation Assessment (MPA). Specifically, by incorporating a mask-predict strategy, our MPA supports end-to-end training without leveraging any aligning components and can largely solve misalignment issues during prediction. Furthermore, we design two evaluation strategies to enable our model to conduct assessments in both unsupervised and supervised settings. Experimental results on the SpeechOcean762 dataset demonstrate that MPA achieves better performance than previous methods, without any explicit alignment. Nevertheless, MPA still has some limitations, such as requiring more inference time and reference text, which are expected to be addressed in future work.

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted by InterSpeech 2023

arXiv:2305.13716 [pdf, other]

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

Authors: Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

Abstract: The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token. However, frequent speaker changes can make speaker change prediction difficult. To address this, we propose boundary-aware serialized output training (BA-SOT), which explicitly incorporates boundary knowledge into the decoder via a speaker change detection task and boundary constraint loss. We also introduce a two-stage connectionist temporal classification (CTC) strategy that incorporates token-level SOT CTC to restore temporal context information. Besides typical character error rate (CER), we introduce utterance-dependent character error rate (UD-CER) to further measure the precision of speaker change prediction. Compared to original SOT, BA-SOT reduces CER/UD-CER by 5.1%/14.0%, and leveraging a pre-trained ASR model for BA-SOT model initialization further reduces CER/UD-CER by 8.4%/19.9%.
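The character error rate (CER) used above is the character-level edit distance between a hypothesis and its reference, divided by the reference length. A minimal single-speaker sketch follows; UD-CER and SD-CER add utterance and speaker handling that the abstract does not specify, so they are not modeled here, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance; substitutions, deletions, and insertions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference character r
                         cur[j - 1] + 1,          # insert hypothesis character h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

The relative reductions quoted in the abstract (for example 5.1% on CER) compare two such error rates, (old - new) / old, rather than subtracting percentage points.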

Submitted 5 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by INTERSPEECH 2023

arXiv:2304.02398 [pdf, ps, other]

Robust Secure Transmission for Active RIS Enabled Symbiotic Radio Multicast Communications

Authors: Bin Lyu, Chao Zhou, Shimin Gong, Dinh Thai Hoang, Ying-Chang Liang

Abstract: In this paper, we propose a robust secure transmission scheme for an active reconfigurable intelligent surface (RIS) enabled symbiotic radio (SR) system in the presence of multiple eavesdroppers (Eves). In the considered system, the active RIS is adopted to enable the secure transmission of primary signals from the primary transmitter to multiple primary users in a multicasting manner, and simultaneously achieve its own information delivery to the secondary user by riding over the primary signals. Taking into account the imperfect channel state information (CSI) related to the Eves, we formulate the system power consumption minimization problem by optimizing the transmit beamforming and reflection beamforming for the bounded and statistical CSI error models, taking the worst-case SNR constraints and the SNR outage probability constraints at the Eves into consideration, respectively. Specifically, the S-Procedure and the Bernstein-Type Inequality are applied to approximately transform the worst-case SNR and the SNR outage probability constraints into tractable forms, respectively. After that, the formulated problems can be solved by the proposed alternating optimization (AO) algorithm with the semi-definite relaxation and sequential rank-one constraint relaxation techniques. Numerical results show that the proposed active RIS scheme can reduce system power consumption by up to 27.0% compared to the passive RIS.

Submitted 13 April, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: 32 Pages, 12 figures, accepted to IEEE Transactions on Wireless Communications

arXiv:2303.15299 [pdf, other]

Resilient Output Consensus Control of Heterogeneous Multi-agent Systems against Byzantine Attacks: A Twin Layer Approach

Authors: Xin Gong, Yiwen Liang, Yukang Cui, Shi Liang, Tingwen Huang

Abstract: This paper studies the problem of cooperative control of heterogeneous multi-agent systems (MASs) against Byzantine attacks. An agent affected by Byzantine attacks sends different wrong values to all neighbors while applying wrong input signals to itself, which is aggressive and difficult to defend against. Inspired by the concept of Digital Twin, a new hierarchical protocol equipped with a virtual twin layer (TL) is proposed, which decouples the above problem into a defense scheme against Byzantine edge attacks on the TL and a defense scheme against Byzantine node attacks on the cyber-physical layer (CPL). On the TL, we propose a resilient topology reconfiguration strategy that adds a minimum number of key edges to improve network resilience. It is rigorously proved that the control strategy is sufficient to achieve asymptotic consensus in finite time when the topology on the TL satisfies strong $(2f+1)$-robustness. On the CPL, decentralized chattering-free controllers are proposed to guarantee resilient output consensus for the heterogeneous MASs against Byzantine node attacks. Moreover, the obtained controller exhibits exponential convergence. The effectiveness and practicality of the theoretical results are verified by numerical examples.
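The consensus guarantee above hinges on a graph-robustness condition. A brute-force sketch of the usual r-robustness test (a graph is r-robust if, for every pair of disjoint nonempty node subsets, at least one subset contains a node with at least r in-neighbors outside that subset) is shown below; the helper names are hypothetical, the O(3^n) search only suits small graphs, and the paper's *strong* robustness notion is a stricter variant that this check does not cover. With f = 1 Byzantine agent, the target would be r = 2f + 1 = 3.

```python
from itertools import product

def in_neighbor_sets(n, edges):
    """edges are directed pairs (src, dst): dst receives values from src."""
    nbrs = [set() for _ in range(n)]
    for src, dst in edges:
        nbrs[dst].add(src)
    return nbrs

def undirected(edges):
    """Turn undirected edges into both directed orientations."""
    return [(a, b) for a, b in edges] + [(b, a) for a, b in edges]

def is_r_reachable(subset, nbrs, r):
    # some node in the subset has at least r in-neighbors outside the subset
    return any(len(nbrs[i] - subset) >= r for i in subset)

def is_r_robust(n, edges, r):
    """Check every pair of disjoint nonempty subsets S1, S2 (exponential in n)."""
    nbrs = in_neighbor_sets(n, edges)
    for labels in product((0, 1, 2), repeat=n):   # 0: in neither, 1: in S1, 2: in S2
        s1 = {i for i, lab in enumerate(labels) if lab == 1}
        s2 = {i for i, lab in enumerate(labels) if lab == 2}
        if s1 and s2 and not (is_r_reachable(s1, nbrs, r) or is_r_reachable(s2, nbrs, r)):
            return False
    return True
```

As a toy version of "adding key edges", the 3-node path 0-1-2 is 1-robust but not 2-robust; adding the single edge (0, 2) turns it into a triangle, which is 2-robust.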

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.14082 [pdf, ps, other]

Deep Reinforcement Learning for Distributed Dynamic Coordinated Beamforming in Massive MIMO Cellular Networks

Authors: Jungang Ge, Ying-Chang Liang, Liao Zhang, Ruizhe Long, Sumei Sun

Abstract: To accommodate explosive growth in wireless traffic, massive multiple-input multiple-output (MIMO) is regarded as one of the key enabling technologies for next-generation communication systems. In massive MIMO cellular networks, coordinated beamforming (CBF), which jointly designs the beamformers of multiple base stations (BSs), is an efficient method to enhance network performance. In this paper, we investigate the sum rate maximization problem in a massive MIMO mobile cellular network, where in each cell a multi-antenna BS serves multiple mobile users simultaneously via downlink beamforming. Although existing optimization-based CBF algorithms can provide near-optimal solutions, they require real-time and global channel state information (CSI) and incur high computational complexity, which makes them almost impossible to apply in practical wireless networks, especially highly dynamic mobile cellular networks. Motivated by this, we propose a deep reinforcement learning (DRL) based distributed dynamic coordinated beamforming (DDCBF) framework, which enables each BS to determine its beamformers with only local CSI and some historical information from other BSs. Besides, the beamformers can be calculated with considerably lower computational complexity by exploiting neural networks and expert knowledge, i.e., a solution structure observed from the iterative procedure of the weighted minimum mean square error (WMMSE) algorithm. Moreover, we provide extensive numerical simulations to validate the effectiveness of the proposed DRL-based approach. The results show that, with lower computational complexity and less required information, the proposed approach can achieve performance comparable to that of centralized iterative optimization algorithms.
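For reference, the WMMSE iteration that the DDCBF framework draws its solution structure from can be sketched for the simplest case: one multi-antenna BS serving K single-antenna users under a total power budget P. This is the classic single-cell WMMSE (MMSE receiver, MSE weights, regularized precoder with a bisection on the power multiplier), not the paper's multi-cell or DRL-based method; the function names and the random initialization are assumptions for illustration.

```python
import numpy as np

def sum_rate(H, V, sigma2):
    """Sum rate in bit/s/Hz. H is K x M (row k = h_k^H), V is M x K (column k = v_k)."""
    G = np.abs(H @ V) ** 2                      # G[k, j] = |h_k^H v_j|^2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + sigma2))))

def _precoders(A, B, mu):
    # v_k = (A + mu I)^{-1} b_k for all users at once
    return np.linalg.solve(A + mu * np.eye(A.shape[0]), B)

def wmmse(H, P, sigma2, iters=50, seed=0):
    """Classic WMMSE for single-cell MU-MISO downlink sum-rate maximization."""
    K, M = H.shape
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
    V *= np.sqrt(P) / np.linalg.norm(V)         # random start using the full budget P
    for _ in range(iters):
        HV = H @ V
        d = np.diag(HV)                         # h_k^H v_k
        total = np.sum(np.abs(HV) ** 2, axis=1) + sigma2
        u = d / total                           # MMSE receive coefficients
        w = 1.0 / (1.0 - np.abs(d) ** 2 / total)  # weights = 1 / MMSE
        A = H.conj().T @ ((w * np.abs(u) ** 2)[:, None] * H)
        B = H.conj().T * (u * w)[None, :]       # column k = w_k u_k h_k
        # choose the multiplier mu >= 0 so the total transmit power meets the budget
        mu_lo, mu_hi = 1e-12, 1.0
        if np.linalg.norm(_precoders(A, B, mu_lo)) ** 2 <= P:
            V = _precoders(A, B, mu_lo)
            continue
        while np.linalg.norm(_precoders(A, B, mu_hi)) ** 2 > P:
            mu_hi *= 2.0
        for _step in range(100):
            mid = 0.5 * (mu_lo + mu_hi)
            if np.linalg.norm(_precoders(A, B, mid)) ** 2 > P:
                mu_lo = mid
            else:
                mu_hi = mid
        V = _precoders(A, B, mu_hi)             # the mu_hi side always meets the budget
    return V
```

Each step needs global CSI (all of H) and a matrix solve inside a bisection loop, which illustrates the information and complexity burden the abstract attributes to centralized iterative algorithms.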

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.13760 [pdf, ps, other]

Multiple Access Design for Symbiotic Radios: Facilitating Massive IoT Connections with Cellular Networks

Authors: Jun Wang, Xiangyu Ding, Qianqian Zhang, Ying-Chang Liang

Abstract: Symbiotic radio (SR) has emerged as a spectrum- and energy-efficient paradigm to support massive Internet of Things (IoT) connections. In this paper, two multiple access schemes based on the SR technique are proposed to facilitate massive IoT connections over the cellular network, namely, the simultaneous access (SA) scheme and the selection diversity access (SDA) scheme. In the SA scheme, the base station (BS) transmits information to the receiver while multiple IoT devices simultaneously transmit their information by passively backscattering the BS signal to the receiver; in the SDA scheme, only the IoT device with the strongest backscatter link transmits information to the receiver. In both schemes, the receiver jointly decodes the information from the BS and the IoT devices. To evaluate the two schemes, we derive closed-form expressions for the ergodic rates and the outage probabilities of the cellular and IoT transmissions. Finally, numerical results are provided to verify the theoretical analysis and compare the two proposed multiple access schemes. When the number of IoT devices is small, the SDA scheme is more appealing, since it significantly reduces computational complexity while achieving performance equivalent to that of the SA scheme. When the number of IoT devices is large, the SA scheme is preferable, since it guarantees significantly better rate performance and a lower outage probability.
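The selection rule in the SDA scheme (only the device with the strongest backscatter link transmits) behaves like classic selection diversity. A minimal sketch, assuming i.i.d. Rayleigh fading so each backscatter power gain is exponential with unit mean; this is a deliberate simplification of the cascaded BS-device-receiver channel the paper analyzes, so the closed form below is the textbook selection-diversity outage, not the paper's derived expression. All function names are hypothetical.

```python
import numpy as np

def sda_select(backscatter_gains):
    """Index of the device with the strongest backscatter link, per row (trial)."""
    return np.argmax(backscatter_gains, axis=-1)

def sda_outage_mc(n_devices, snr_mean, snr_threshold, trials=200_000, seed=0):
    """Monte Carlo outage of the selected link under i.i.d. Rayleigh fading (assumption)."""
    rng = np.random.default_rng(seed)
    gains = rng.exponential(1.0, size=(trials, n_devices))   # |g|^2 with unit mean
    best = gains[np.arange(trials), sda_select(gains)]
    return float(np.mean(snr_mean * best < snr_threshold))

def sda_outage_closed_form(n_devices, snr_mean, snr_threshold):
    """Pr{max of N i.i.d. Exp(1) gains < threshold / mean SNR} = (1 - e^{-x})^N."""
    return (1.0 - np.exp(-snr_threshold / snr_mean)) ** n_devices
```

Under these assumptions, the outage of the selected link falls geometrically with the number of devices, while the receiver only has to decode one backscatter stream, which is consistent with the complexity advantage the abstract attributes to SDA.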

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.11413 [pdf, other]

doi 10.54364/AAIML.2023.1165

Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN

Authors: Youzhi Liang, Wen Liang, Jianguo Jia

Abstract: Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where they can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in bioengineering. Activity-induced structural vibrations, particularly footstep-induced signals, are useful for analyzing the movement of biological systems such as the human body and animals, providing valuable information about an individual's gait, body mass, and posture, which makes them an attractive tool for health monitoring, security, and human-computer interaction. However, various types of noise can compromise the accuracy of footstep-induced signal analysis. In this paper, we propose a novel ensemble model that leverages both an ensemble of multiple signals and an ensemble of recurrent and convolutional neural network predictions. The proposed model consists of three stages: preprocessing, hybrid modeling, and ensemble. In the preprocessing stage, features are extracted using the Fast Fourier Transform (FFT) and the wavelet transform to capture the underlying physics-governed dynamics of the system and to extract spatial and temporal features. In the hybrid modeling stage, a bi-directional LSTM is used to denoise the noisy signal concatenated with the FFT results, and a CNN is used to obtain a condensed feature representation of the signal. In the ensemble stage, three layers of a fully-connected neural network are used to produce the final denoised signal. The proposed model addresses the challenges associated with structural vibration signals and outperforms prevailing algorithms over a wide range of noise levels, as evaluated using PSNR, SNR, and WMAPE.
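The three evaluation metrics named above can be computed as follows. This is a sketch using common textbook definitions (SNR as clean-signal energy over residual-error energy, PSNR with the peak taken as the maximum absolute clean amplitude, and WMAPE as total absolute error over total absolute signal); the paper's exact conventions, such as its choice of peak value, may differ.

```python
import numpy as np

def snr_db(clean, denoised):
    """Clean-signal energy over residual-error energy, in dB."""
    err = clean - denoised
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

def psnr_db(clean, denoised):
    """Peak SNR in dB; the peak is the max absolute clean amplitude (one common convention)."""
    mse = np.mean((clean - denoised) ** 2)
    return 10 * np.log10(np.max(np.abs(clean)) ** 2 / mse)

def wmape(clean, denoised):
    """Weighted mean absolute percentage error: absolute error weighted by signal magnitude."""
    return np.sum(np.abs(clean - denoised)) / np.sum(np.abs(clean))
```

For example, with a clean signal of [1, -1, 1, -1] and a denoised estimate that is off by 0.1 at every sample, SNR and PSNR both come out at about 20 dB and WMAPE at about 0.1. Higher SNR and PSNR and lower WMAPE indicate better denoising.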

Submitted 22 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: 10 pages, 4 figures

Showing 1–50 of 130 results for author: Liang, Y