-
MMR-Mamba: Multi-Contrast MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion
Authors:
**g Zou,
Lanqing Liu,
Qi Chen,
Shujun Wang,
Xiaohan Xing,
**g Qin
Abstract:
Multi-contrast MRI acceleration has become prevalent in MR imaging, enabling the reconstruction of high-quality MR images from under-sampled k-space data of the target modality, using guidance from a fully-sampled auxiliary modality. The main crux lies in efficiently and comprehensively integrating complementary information from the auxiliary modality. Existing methods either suffer from quadratic…
▽ More
Multi-contrast MRI acceleration has become prevalent in MR imaging, enabling the reconstruction of high-quality MR images from under-sampled k-space data of the target modality, using guidance from a fully-sampled auxiliary modality. The main crux lies in efficiently and comprehensively integrating complementary information from the auxiliary modality. Existing methods either suffer from quadratic computational complexity or fail to capture long-range correlated features comprehensively. In this work, we propose MMR-Mamba, a novel framework that achieves comprehensive integration of multi-contrast features through Mamba and spatial-frequency information fusion. Firstly, we design the \textit{Target modality-guided Cross Mamba} (TCM) module in the spatial domain, which maximally restores the target modality information by selectively absorbing useful information from the auxiliary modality. Secondly, leveraging global properties of the Fourier domain, we introduce the \textit{Selective Frequency Fusion} (SFF) module to efficiently integrate global information in the frequency domain and recover high-frequency signals for the reconstruction of structure details. Additionally, we present the \textit{Adaptive Spatial-Frequency Fusion} (ASFF) module, which enhances fused features by supplementing less informative features from one domain with corresponding features from the other domain. These innovative strategies ensure efficient feature fusion across spatial and frequency domains, avoiding the introduction of redundant information and facilitating the reconstruction of high-quality target images. Extensive experiments on the BraTS and fastMRI knee datasets demonstrate the superiority of the proposed MMR-Mamba over state-of-the-art MRI reconstruction methods.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Using a Convolutional Neural Network and Explainable AI to Diagnose Dementia Based on MRI Scans
Authors:
Tyler Morris,
Ziming Liu,
Longjian Liu,
Xiaopeng Zhao
Abstract:
As the number of dementia patients rises, the need for accurate diagnostic procedures rises as well. Current methods, like using an MRI scan, rely on human input, which can be inaccurate. However, the decision logic behind machine learning algorithms and their outputs cannot be explained, as most operate in black-box models. Therefore, to increase the accuracy of diagnosing dementia through MRIs,…
▽ More
As the number of dementia patients rises, the need for accurate diagnostic procedures rises as well. Current methods, like using an MRI scan, rely on human input, which can be inaccurate. However, the decision logic behind machine learning algorithms and their outputs cannot be explained, as most operate in black-box models. Therefore, to increase the accuracy of diagnosing dementia through MRIs, a convolution neural network has been developed and trained using an open-source database of 6400 MRI scans divided into 4 dementia classes. The model, which attained a 98 percent validation accuracy, was shown to be well fit and able to generalize to new data. Furthermore, to aid in the visualization of the model output, an explainable AI algorithm was developed by visualizing the outputs of individual filters in each convolution layer, which highlighted regions of interest in the scan. These outputs do a great job of identifying the image features that contribute most to the model classification, thus allowing users to visualize and understand the results. Altogether, this combination of the convolution neural network and explainable AI algorithm creates a system that can be used in the medical field to not only aid in the proper classification of dementia but also allow everyone involved to visualize and understand the results.
△ Less
Submitted 25 May, 2024;
originally announced June 2024.
-
Neural Network-based Two-Dimensional Filtering for OTFS Symbol Detection
Authors:
Jiarui Xu,
Karim Said,
Lizhong Zheng,
Lingjia Liu
Abstract:
Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only the limited over-the-air (OTA) pilot symbols are utilized for training. However, the previous RC-based approach does not design…
▽ More
Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only the limited over-the-air (OTA) pilot symbols are utilized for training. However, the previous RC-based approach does not design the RC architecture based on the properties of the OTFS system to fully unlock the potential of RC. This paper introduces a novel two-dimensional RC (2D-RC) approach for online symbol detection on a subframe basis in the OTFS system. The 2D-RC is designed to have a two-dimensional (2D) filtering structure to equalize the 2D circular channel effect in the delay-Doppler (DD) domain of the OTFS system. With the introduced architecture, the 2D-RC can operate in the DD domain with only a single neural network, unlike our previous work which requires multiple RCs to track channel variations in the time domain. Experimental results demonstrate the advantages of the 2D-RC approach over the previous RC-based approach and the compared model-based methods across different modulation orders.
△ Less
Submitted 8 March, 2024;
originally announced June 2024.
-
Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes
Authors:
Larkin Liu,
Shiqi Liu,
Matej Jusup
Abstract:
In the world of stochastic control, especially in economics and engineering, Markov Decision Processes (MDPs) can effectively model various stochastic decision processes, from asset management to transportation optimization. These underlying MDPs, upon closer examination, often reveal a specifically constrained causal structure concerning the transition and reward dynamics. By exploiting this stru…
▽ More
In the world of stochastic control, especially in economics and engineering, Markov Decision Processes (MDPs) can effectively model various stochastic decision processes, from asset management to transportation optimization. These underlying MDPs, upon closer examination, often reveal a specifically constrained causal structure concerning the transition and reward dynamics. By exploiting this structure, we can obtain a reduction in the causal representation of the problem setting, allowing us to solve of the optimal value function more efficiently. This work defines an MDP framework, the \texttt{SD-MDP}, where we disentangle the causal structure of MDPs' transition and reward dynamics, providing distinct partitions on the temporal causal graph. With this stochastic reduction, the \texttt{SD-MDP} reflects a general class of resource allocation problems. This disentanglement further enables us to derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling. Subsequently, by integrating this estimator into well-known Monte Carlo planning algorithms, such as Monte Carlo Tree Search (MCTS), we derive bounds on the simple regret of the algorithm. Finally, we quantify the policy improvement of MCTS under the \texttt{SD-MDP} framework by demonstrating that the MCTS planning algorithm achieves higher expected reward (lower costs) under a constant simulation budget, on a tangible economic example based on maritime refuelling.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation
Authors:
Tingwei Liu,
Miao Zhang,
Leiye Liu,
Jialong Zhong,
Shuyao Wang,
Yongri Piao,
Huchuan Lu
Abstract:
Recently, the Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the images features and the diffusion…
▽ More
Recently, the Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the images features and the diffusion model features poses a great challenge to prostate segmentation. In this paper, we proposed CriDiff, a two-stage feature injecting framework with a Crisscross Injection Strategy (CIS) and a Generative Pre-train (GP) approach for prostate segmentation. The CIS maximizes the use of multi-level features by efficiently harnessing the complementarity of high and low-level features. To effectively learn multi-level of edge features and non-edge features, we proposed two parallel conditioners in the CIS: the Boundary Enhance Conditioner (BEC) and the Core Enhance Conditioner (CEC), which discriminatively model the image edge regions and non-edge regions, respectively. Moreover, the GP approach eases the inconsistency between the images features and the diffusion model without adding additional parameters. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method and achieve state-of-the-art performance on four evaluation metrics.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Authors:
Di Wang,
Meiqi Hu,
Yao **,
Yuchun Miao,
Jiaqi Yang,
Yichu Xu,
Xiaolei Qin,
Jiaqi Ma,
Lingyu Sun,
Chenxing Li,
Chuan Fu,
Hongruixuan Chen,
Chengxi Han,
Naoto Yokoya,
**g Zhang,
Minqiang Xu,
Lin Liu,
Lefei Zhang,
Chen Wu,
Bo Du,
Dacheng Tao,
Liangpei Zhang
Abstract:
Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…
▽ More
Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Authors:
Philip Anastassiou,
Jiawei Chen,
Jitong Chen,
Yuanzhe Chen,
Zhuo Chen,
Ziyi Chen,
Jian Cong,
Lelai Deng,
Chuang Ding,
Lu Gao,
Mingqing Gong,
Peisong Huang,
Qingqing Huang,
Zhiying Huang,
Yuanyuan Huo,
Dongya Jia,
Chumin Li,
Feiya Li,
Hui Li,
Jiaxin Li,
Xiaoyang Li,
Xingxing Li,
Lin Liu,
Shouda Liu,
Sichao Liu
, et al. (21 additional authors not shown)
Abstract:
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…
▽ More
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at \url{https://bytedancespeech.github.io/seedtts_tech_report}.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor
Authors:
Lei Liu,
Zhihao Hu,
Zhenghao Chen
Abstract:
Point cloud compression has garnered significant interest in computer vision. However, existing algorithms primarily cater to human vision, while most point cloud data is utilized for machine vision tasks. To address this, we propose a point cloud compression framework that simultaneously handles both human and machine vision tasks. Our framework learns a scalable bit-stream, using only subsets fo…
▽ More
Point cloud compression has garnered significant interest in computer vision. However, existing algorithms primarily cater to human vision, while most point cloud data is utilized for machine vision tasks. To address this, we propose a point cloud compression framework that simultaneously handles both human and machine vision tasks. Our framework learns a scalable bit-stream, using only subsets for different machine vision tasks to save bit-rate, while employing the entire bit-stream for human vision tasks. Building on mainstream octree-based frameworks like VoxelContext-Net, OctAttention, and G-PCC, we introduce a new octree depth-level predictor. This predictor adaptively determines the optimal depth level for each octree constructed from a point cloud, controlling the bit-rate for machine vision tasks. For simpler tasks (\textit{e.g.}, classification) or objects/scenarios, we use fewer depth levels with fewer bits, saving bit-rate. Conversely, for more complex tasks (\textit{e.g}., segmentation) or objects/scenarios, we use deeper depth levels with more bits to enhance performance. Experimental results on various datasets (\textit{e.g}., ModelNet10, ModelNet40, ShapeNet, ScanNet, and KITTI) show that our point cloud compression approach improves performance for machine vision tasks without compromising human vision quality.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Exploring Channel Estimation and Signal Detection for ODDM-based ISAC Systems
Authors:
Dezhi Wang,
Chongwen Huang,
Lei Liu,
Xiaoming Chen,
Wei Wang,
Zhaoyang Zhang,
Chau Yuen,
Mérouane Debbah
Abstract:
Inspired by providing reliable communications for high-mobility scenarios, in this letter, we investigate the channel estimation and signal detection in integrated sensing and communication~(ISAC) systems based on the orthogonal delay-Doppler multiplexing~(ODDM) modulation, which consists of a pulse-train that can achieve the orthogonality with respect to the resolution of the delay-Doppler~(DD) p…
▽ More
Inspired by providing reliable communications for high-mobility scenarios, in this letter, we investigate the channel estimation and signal detection in integrated sensing and communication~(ISAC) systems based on the orthogonal delay-Doppler multiplexing~(ODDM) modulation, which consists of a pulse-train that can achieve the orthogonality with respect to the resolution of the delay-Doppler~(DD) plane. To enhance the communication performance in the ODDM-based ISAC systems, we first propose a low-complexity approximation algorithm for channel estimation, which addresses the challenge of the high complexity from high resolution in the ODDM modulation, and achieves performance close to that of the maximum likelihood estimator scheme. Then, we employ the orthogonal approximate message-passing scheme to detect the symbols in the communication process based on the estimated channel information. Finally, simulation results show that the detection performance of ODDM is better than other multi-carrier modulation schemes. Specifically, the ODDM outperforms the orthogonal time frequency space scheme by 2.3 dB when the bit error ratio is $10^{-6}$.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Authors:
Chang Li,
Ruoyu Wang,
Lijuan Liu,
Jun Du,
Yixuan Sun,
Zilu Guo,
Zhenrong Zhang,
Yuan Jiang
Abstract:
In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mi…
▽ More
In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mislabeling, weak labeling, unlabeled data, and low-quality music waveform significantly hampers the development of music generation models. To overcome these challenges, we introduce a novel quality-aware masked diffusion transformer (QA-MDT) approach that enables generative models to discern the quality of input music waveform during training. Building on the unique properties of musical signals, we have adapted and implemented a MDT model for TTM task, while further unveiling its distinct capacity for quality control. Moreover, we address the issue of low-quality captions with a caption refinement data processing approach. Our demo page is shown in https://qa-mdt.github.io/. Code on https://github.com/ivcylc/qa-mdt
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis
Authors:
Zeyi Zhang,
Tenglong Ao,
Yuyao Zhang,
Qingzhe Gao,
Chuan Lin,
Baoquan Chen,
Libin Liu
Abstract:
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging fo…
▽ More
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples. User studies confirm the quality and human-likeness of our results, and show that our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.
△ Less
Submitted 16 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Interleave Frequency Division Multiplexing
Authors:
Yuhao Chi,
Lei Liu,
Yao Ge,
Xuehui Chen,
Ying Li,
Zhaoyang Zhang
Abstract:
In this letter, we study interleave frequency division multiplexing (IFDM) for multicarrier modulation in static multipath and mobile time-varying channels, which outperforms orthogonal frequency division multiplexing (OFDM), orthogonal time frequency space (OTFS), and affine frequency division multiplexing (AFDM) by considering practical advanced detectors. The fundamental principle underlying ex…
▽ More
In this letter, we study interleave frequency division multiplexing (IFDM) for multicarrier modulation in static multipath and mobile time-varying channels, which outperforms orthogonal frequency division multiplexing (OFDM), orthogonal time frequency space (OTFS), and affine frequency division multiplexing (AFDM) by considering practical advanced detectors. The fundamental principle underlying existing modulation techniques is to establish sparse equivalent channel matrices in order to facilitate the design of low-complexity detection algorithms for signal recovery, making a trade-off between performance and implementation complexity. In contrast, the proposed IFDM establishes an equivalent fully dense and right-unitarily invariant channel matrix with the goal of achieving channel capacity, ensuring that the signals undergo sufficient statistical channel fading. Meanwhile, a low-complexity and replica maximum a posteriori (MAP)-optimal cross-domain memory approximate message passing (CD-MAMP) detector is proposed for IFDM by exploiting the sparsity of the time-domain channel and the unitary invariance in interleave-frequency-domain channel. Numerical results show that IFDM with extremely low-complexity CD-MAMP outperforms OFDM, OTFS, and AFDM with state-of-the-art orthogonal approximate message passing detectors, particularly at low velocities.
△ Less
Submitted 4 May, 2024;
originally announced May 2024.
-
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans
Authors:
Lixing Tan,
Shuang Song,
Kangneng Zhou,
Chengbo Duan,
Lanying Wang,
Huayang Ren,
Linlin Liu,
Wei Zhang,
Ruoxiu Xiao
Abstract:
X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume d…
▽ More
X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data. Existing methods are mainly realized by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN to synthesize the X-ray images in an end-to-end manner using the content and style disentanglement from three different image domains. Our method decouples the anatomical structure information from CT scans and style information from unpaired real X-ray images/ digital reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized X-ray images and real X-ray images. Meanwhile, we also impose a supervised process by computing the similarity of computed real DRR and synthesized DRR images. We further develop a pose attention module to fully strengthen the comprehensive information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset and achieved 97.8350, 0.0842 and 3.0938 in terms of FID, KID and defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods ($Ï€$-GAN, EG3D), CT2X-GAN is superior in improving the synthesis quality and realistic to the real X-ray images.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
UAV Trajectory Optimization for Sensing Exploiting Target Location Distribution Map
Authors:
Xiangming Du,
Shuowen Zhang,
Liang Liu
Abstract:
In this paper, we study the trajectory optimization of a cellular-connected unmanned aerial vehicle (UAV) which aims to sense the location of a target while maintaining satisfactory communication quality with the ground base stations (GBSs). In contrast to most existing works which assumed the target's location is known, we focus on a more challenging scenario where the exact location of the targe…
▽ More
In this paper, we study the trajectory optimization of a cellular-connected unmanned aerial vehicle (UAV) which aims to sense the location of a target while maintaining satisfactory communication quality with the ground base stations (GBSs). In contrast to most existing works which assumed the target's location is known, we focus on a more challenging scenario where the exact location of the target to be sensed is unknown and random, while its distribution is known a priori and stored in a novel target location distribution map. Based on this map, the probability for the UAV to successfully sense the target can be expressed as a function of the UAV's trajectory. We aim to optimize the UAV's trajectory between two pre-determined locations to maximize the overall sensing probability during its flight, subject to a GBS-UAV communication quality constraint at each time instant and a maximum mission completion time constraint. Despite the non-convexity and NP-hardness of this problem, we devise three high-quality suboptimal solutions tailored for it with polynomial complexity. Numerical results show that our proposed designs outperform various benchmark schemes.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
Voice Attribute Editing with Text Prompt
Authors:
Zhengyan Sheng,
Yang Ai,
Li-Juan Liu,
Jia Pan,
Zhen-Hua Ling
Abstract:
Despite recent advancements in speech generation with text prompt providing control over speech style, voice attributes in synthesized speech remain elusive and challenging to control. This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt. To solve this t…
▽ More
Despite recent advancements in speech generation with text prompt providing control over speech style, voice attributes in synthesized speech remain elusive and challenging to control. This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt. To solve this task, VoxEditor, an end-to-end generative model, is proposed. In VoxEditor, addressing the insufficiency of text prompt, a Residual Memory (ResMem) block is designed, that efficiently maps voice attributes and these descriptors into the shared feature space. Additionally, the ResMem block is enhanced with a voice attribute degree prediction (VADP) block to align voice attributes with corresponding descriptors, addressing the imprecision of text prompt caused by non-quantitative descriptions of voice attributes. We also establish the open-source VCTK-RVA dataset, which leads the way in manual annotations detailing voice characteristic differences among different speakers. Extensive experiments demonstrate the effectiveness and generalizability of our proposed method in terms of both objective and subjective metrics. The dataset and audio samples are available on the website.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Authors:
Shujie Hu,
Long Zhou,
Shujie Liu,
Sanyuan Chen,
Hongkun Hao,
**g Pan,
Xunying Liu,
**yu Li,
Sunit Sivasankaran,
Linquan Liu,
Furu Wei
Abstract:
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In th…
▽ More
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach. Leveraging dual encoders, we decouple different types of speech information, utilizing a Whisper encoder to process the semantic content of speech, and a WavLM encoder to capture the unique characteristics of the speaker's identity. Within the curriculum learning framework, WavLLM first builds its foundational capabilities by optimizing on mixed elementary single tasks, followed by advanced multi-task training on more complex tasks such as combinations of the elementary tasks. To enhance the flexibility and adherence to different tasks and instructions, a prompt-aware LoRA weight adapter is introduced in the second advanced multi-task training stage. We validate the proposed model on universal speech benchmarks including tasks such as ASR, ST, SV, ER, and also apply it to specialized datasets like Gaokao English listening comprehension set for SQA, and speech Chain-of-Thought (CoT) evaluation set. Experiments demonstrate that the proposed model achieves state-of-the-art performance across a range of speech tasks on the same model size, exhibiting robust generalization capabilities in executing complex tasks using CoT approach. Furthermore, our model successfully completes Gaokao tasks without specialized training. The codes, models, audio, and Gaokao evaluation set can be accessed at \url{aka.ms/wavllm}.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Leveraging A Variety of Anchors in Cellular Network for Ubiquitous Sensing
Authors:
Liang Liu,
Shuowen Zhang,
Shuguang Cui
Abstract:
Integrated sensing and communication (ISAC) has recently attracted tremendous attention from both academia and industry, being envisioned as a key part of the standards for the sixth-generation (6G) cellular network. A key challenge of 6G-oriented ISAC lies in how to perform ubiquitous sensing based on the communication signals and devices. Previous works have made great progresses on studying the…
▽ More
Integrated sensing and communication (ISAC) has recently attracted tremendous attention from both academia and industry, being envisioned as a key part of the standards for the sixth-generation (6G) cellular network. A key challenge of 6G-oriented ISAC lies in how to perform ubiquitous sensing based on the communication signals and devices. Previous works have made great progresses on studying the signal waveform design that leads to optimal communication-sensing performance tradeoff. In this article, we aim to focus on issues arising from the exploitation of the communication devices for sensing in 6G network. Particularly, we will discuss about how to leverage various nodes available in the cellular network as anchors to perform ubiquitous sensing. On one hand, the base stations (BSs) will be the most important anchors in the future 6G ISAC network, since they can generate/process radio signals with high range/angle resolutions, and their positions are precisely known. Correspondingly, we will first study the BS-based sensing technique. On the other hand, the BSs alone may not enable ubiquitous sensing, since they cannot cover all the places with strong line-of-sight (LOS) links. This motivates us to investigate the possibility of using other nodes that are with higher density in the network to act as the anchors. Along this line, we are interested in two types of new anchors - user equipments (UEs) and reconfigurable intelligent surfaces (RISs). This paper will shed light on the opportunities and challenges brought by UE-assisted sensing and RIS-assisted sensing. Our goal is to devise a novel 6G-oriented sensing architecture where BSs, UEs, and RISs can work together to provide ubiquitous sensing services.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
UAV Deployment Optimization in UAV-assisted Wireless Communications
Authors:
Xueqi Zhang,
Aimin Wang,
Geng Sun,
Lingling Liu,
**g Zhang
Abstract:
Due to the fact that the locations of base stations (BSs) cannot be changed after they are installed, it is very difficult to communicate directly with remote user equipment (UE), which will directly affect the lifespan of the system. Unmanned aerial vehicles (UAVs) offer a hopeful solution as mobile relays for fifth-generation wireless communications due to the flexible and cost-effective deploym…
▽ More
Due to the fact that the locations of base stations (BSs) cannot be changed after they are installed, it is very difficult to communicate directly with remote user equipment (UE), which will directly affect the lifespan of the system. Unmanned aerial vehicles (UAVs) offer a hopeful solution as mobile relays for fifth-generation wireless communications due to the flexible and cost-effective deployment. However, with the limited onboard energy of UAV and slow progress in energy storage technology, it is a key challenge to achieve the energy-efficient communication. Therefore, in this article, we study a wireless communication network using a UAV as a high-altitude relay, and formulate a UAV relay deployment optimization problem (URDOP) to minimize the energy consumption of system by optimizing the deployment of UAV, including the locations and number of UAV hover points. Since the formulated URDOP is a mixed-integer programming problem, it presents a significant challenge for conventional gradient-based approaches. To this end, we propose a self-adaptive differential evolution with a variable population size (SaDEVPS) algorithm to solve the formulated URDOP. The performance of proposed SaDEVPS is verified through simulations, and the results show that it can successfully decrease the energy consumption of system when compared to other benchmark algorithms across multiple instances.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Secure and Energy-efficient Unmanned Aerial Vehicle-enabled Visible Light Communication via A Multi-objective Optimization Approach
Authors:
Lingling Liu,
Aimin Wang,
**g Wu,
Jiao Lu,
Jiahui Li,
Geng Sun
Abstract:
In this research, a unique approach to provide communication service for terrestrial receivers via using unmanned aerial vehicle-enabled visible light communication is investigated. Specifically, we take into account a unmanned aerial vehicle-enabled visible light communication scenario with multiplex transmitters, multiplex receivers, and a single eavesdropper, each of which is equipped with a si…
▽ More
In this research, a unique approach to provide communication service for terrestrial receivers via using unmanned aerial vehicle-enabled visible light communication is investigated. Specifically, we take into account a unmanned aerial vehicle-enabled visible light communication scenario with multiplex transmitters, multiplex receivers, and a single eavesdropper, each of which is equipped with a single photodetector. Then, a unmanned aerial vehicle deployment multi-objective optimization problem is formulated to simultaneously make the optical power received by receiving surface more uniform, minimize the amount of information collected by a eavesdropper, and minimize the energy consumption of unmanned aerial vehicles, while the locations and transmission power of unmanned aerial vehicles are simultaneously optimized under certain constraints. Since the formulated unmanned aerial vehicle deployment multi-objective optimization problem is complex and nonlinear, it is challenging to be tackled by using conventional methods. For the purpose of solving the problem, a multi-objective evolutionary algorithm based on decomposition with chaos initiation and crossover mutation is proposed. Simulation outcomes show that the proposed approach is superior to other approaches, and is efficient at improving the security and energy efficiency of visible light communication system.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Multi-objective Optimization for Data Collection in UAV-assisted Agricultural IoT
Authors:
Lingling Liu,
Aimin Wang,
Geng Sun,
Jiahui Li,
Hongyang Pan,
Tony Q. S. Quek
Abstract:
The ground fixed base stations (BSs) are often deployed inflexibly, and have high overheads, as well as are susceptible to the damage from natural disasters, making it impractical for them to continuously collect data from sensor devices. To improve the network coverage and performance of wireless communication, unmanned aerial vehicles (UAVs) have been introduced in diverse wireless networks, the…
▽ More
The ground fixed base stations (BSs) are often deployed inflexibly, and have high overheads, as well as are susceptible to the damage from natural disasters, making it impractical for them to continuously collect data from sensor devices. To improve the network coverage and performance of wireless communication, unmanned aerial vehicles (UAVs) have been introduced in diverse wireless networks, therefore in this work we consider employing a UAV as an aerial BS to acquire data of agricultural Internet of Things (IoT) devices. To this end, we first formulate a UAV-assisted data collection multi-objective optimization problem (UDCMOP) to efficiently collect the data from agricultural sensing devices. Specifically, we aim to collaboratively optimize the hovering positions of UAV, visit sequence of UAV, speed of UAV, in addition to the transmit power of devices, to simultaneously achieve the maximization of minimum transmit rate of devices, the minimization of total energy consumption of devices, and the minimization of total energy consumption of UAV. Second, the proposed UDCMOP is a non-convex mixed integer nonlinear optimization problem, which indicates that it includes continuous and discrete solutions, making it intractable to be solved. Therefore, we solve it by proposing an improved multi-objective artificial hummingbird algorithm (IMOAHA) with several specific improvement factors, that are the hybrid initialization operator, Cauchy mutation foraging operator, in addition to the discrete mutation operator. Finally, simulations are carried out to testify that the proposed IMOAHA can effectively improve the system performance comparing to other benchmarks.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Frequency-Reactive Power Optimization Strategy of Grid-forming Offshore Wind Farm Using DRU-HVDC Transmission
Authors:
Zhekai Li,
Kun Han,
Xu Cai,
Renxin Yang,
Haotian Yu,
Kepeng Xia,
Lulu Liu
Abstract:
The diode rectifier unit-based high voltage direct current (DRU-HVDC) transmission with grid-forming (GFM) wind turbine is becoming a promising scheme for offshore wind farm(OWF) integration due to its high reliability and low cost. In this scheme, the AC network of the OWF and the DRU has completely different synchronization mechanisms and power flow characteristics from the traditional power sys…
▽ More
The diode rectifier unit-based high voltage direct current (DRU-HVDC) transmission with grid-forming (GFM) wind turbine is becoming a promising scheme for offshore wind farm(OWF) integration due to its high reliability and low cost. In this scheme, the AC network of the OWF and the DRU has completely different synchronization mechanisms and power flow characteristics from the traditional power system. To optimize the power flow and reduce the net loss, this paper carries out the power flow modeling and optimization analysis for the DRU-HVDC transmission system with grid-forming OWFs. The influence of the DRU and the GFM wind turbines on the power flow of the system is analyzed. On this basis, improved constraint conditions are proposed and an optimal power flow (OPF) method is established. This method can minimize the power loss by adjusting the reactive power output of each wind turbine and internal network frequency. Finally, based on MATLAB, this paper uses YALMIP toolkit and CPLEX mathematical solver to realize the programming solution of the OPF model proposed in this paper. The results show that the proposed optimization strategy can effectively reduce the power loss of the entire OWF and the transmission system with an optimization ratio of network losses exceeding 25.3%.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation
Authors:
Lipei Zhang,
Yanqi Cheng,
Lihao Liu,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Recent advancements in deep learning have significantly improved brain tumour segmentation techniques; however, the results still lack confidence and robustness as they solely consider image data without biophysical priors or pathological information. Integrating biophysics-informed regularisation is one effective way to change this situation, as it provides an prior regularisation for automated e…
▽ More
Recent advancements in deep learning have significantly improved brain tumour segmentation techniques; however, the results still lack confidence and robustness as they solely consider image data without biophysical priors or pathological information. Integrating biophysics-informed regularisation is one effective way to change this situation, as it provides an prior regularisation for automated end-to-end learning. In this paper, we propose a novel approach that designs brain tumour growth Partial Differential Equation (PDE) models as a regularisation with deep learning, operational with any network model. Our method introduces tumour growth PDE models directly into the segmentation process, improving accuracy and robustness, especially in data-scarce scenarios. This system estimates tumour cell density using a periodic activation function. By effectively integrating this estimation with biophysical models, we achieve a better capture of tumour characteristics. This approach not only aligns the segmentation closer to actual biological behaviour but also strengthens the model's performance under limited data conditions. We demonstrate the effectiveness of our framework through extensive experiments on the BraTS 2023 dataset, showcasing significant improvements in both precision and reliability of tumour segmentation.
△ Less
Submitted 17 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Authors:
Tianrui Lou,
Xiaojun Jia,
**dong Gu,
Li Liu,
Siyuan Liang,
Bangyan He,
Xiaochun Cao
Abstract:
Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. An…
▽ More
Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. Another promising strategy, shape-based attack, can effectively eliminate outliers, but existing methods often suffer significant reductions in imperceptibility due to irrational deformations. We find that concealing deformation perturbations in areas insensitive to human eyes can achieve a better trade-off between imperceptibility and adversarial strength, specifically in parts of the object surface that are complex and exhibit drastic curvature changes. Therefore, we propose a novel shape-based adversarial attack method, HiT-ADV, which initially conducts a two-stage search for attack regions based on saliency and imperceptibility scores, and then adds deformation perturbations in each attack region using Gaussian kernel functions. Additionally, HiT-ADV is extendable to physical attack. We propose that by employing benign resampling and benign rigid transformations, we can further enhance physical adversarial strength with little sacrifice to imperceptibility. Extensive experiments have validated the superiority of our method in terms of adversarial and imperceptible properties in both digital and physical spaces. Our code is avaliable at: https://github.com/TRLou/HiT-ADV.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Design of Stochastic Quantizers for Privacy Preservation
Authors:
Le Liu,
Yu Kawano,
Ming Cao
Abstract:
In this paper, we examine the role of stochastic quantizers for privacy preservation. We first employ a static stochastic quantizer and investigate its corresponding privacy-preserving properties. Specifically, we demonstrate that a sufficiently large quantization step guarantees $(0, δ)$ differential privacy. Additionally, the degradation of control performance caused by quantization is evaluated…
▽ More
In this paper, we examine the role of stochastic quantizers for privacy preservation. We first employ a static stochastic quantizer and investigate its corresponding privacy-preserving properties. Specifically, we demonstrate that a sufficiently large quantization step guarantees $(0, δ)$ differential privacy. Additionally, the degradation of control performance caused by quantization is evaluated as the tracking error of output regulation. These two analyses characterize the trade-off between privacy and control performance, determined by the quantization step. This insight enables us to use quantization intentionally as a means to achieve the seemingly conflicting two goals of maintaining control performance and preserving privacy at the same time; towards this end, we further investigate a dynamic stochastic quantizer. Under a stability assumption, the dynamic stochastic quantizer can enhance privacy, more than the static one, while achieving the same control performance. We further handle the unstable case by additionally applying input Gaussian noise.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Learning at the Speed of Wireless: Online Real-Time Learning for AI-Enabled MIMO in NextG
Authors:
Jiarui Xu,
Shashank Jere,
Yifei Song,
Yi-Hung Kao,
Lizhong Zheng,
Lingjia Liu
Abstract:
Integration of artificial intelligence (AI) and machine learning (ML) into the air interface has been envisioned as a key technology for next-generation (NextG) cellular networks. At the air interface, multiple-input multiple-output (MIMO) and its variants such as multi-user MIMO (MU-MIMO) and massive/full-dimension MIMO have been key enablers across successive generations of cellular networks wit…
▽ More
Integration of artificial intelligence (AI) and machine learning (ML) into the air interface has been envisioned as a key technology for next-generation (NextG) cellular networks. At the air interface, multiple-input multiple-output (MIMO) and its variants such as multi-user MIMO (MU-MIMO) and massive/full-dimension MIMO have been key enablers across successive generations of cellular networks with evolving complexity and design challenges. Initiating active investigation into leveraging AI/ML tools to address these challenges for MIMO becomes a critical step towards an AI-enabled NextG air interface. At the NextG air interface, the underlying wireless environment will be extremely dynamic with operation adaptations performed on a sub-millisecond basis by MIMO operations such as MU-MIMO scheduling and rank/link adaptation. Given the enormously large number of operation adaptation possibilities, we contend that online real-time AI/ML-based approaches constitute a promising paradigm. To this end, we outline the inherent challenges and offer insights into the design of such online real-time AI/ML-based solutions for MIMO operations. An online real-time AI/ML-based method for MIMO-OFDM channel estimation is then presented, serving as a potential roadmap for develo** similar techniques across various MIMO operations in NextG.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition
Authors:
Lingfeng Liu,
Dong Ni,
Hangjie Yuan
Abstract:
Bandwidth constraints during signal acquisition frequently impede real-time detection applications. Hyperspectral data is a notable example, whose vast volume compromises real-time hyperspectral detection. To tackle this hurdle, we introduce a novel approach leveraging pre-acquisition modulation to reduce the acquisition volume. This modulation process is governed by a deep learning model, utilizi…
▽ More
Bandwidth constraints during signal acquisition frequently impede real-time detection applications. Hyperspectral data is a notable example, whose vast volume compromises real-time hyperspectral detection. To tackle this hurdle, we introduce a novel approach leveraging pre-acquisition modulation to reduce the acquisition volume. This modulation process is governed by a deep learning model, utilizing prior information. Central to our approach is LUM-ViT, a Vision Transformer variant. Uniquely, LUM-ViT incorporates a learnable under-sampling mask tailored for pre-acquisition modulation. To further optimize for optical calculations, we propose a kernel-level weight binarization technique and a three-stage fine-tuning strategy. Our evaluations reveal that, by sampling a mere 10% of the original image pixels, LUM-ViT maintains the accuracy loss within 1.8% on the ImageNet classification task. The method sustains near-original accuracy when implemented on real-world optical hardware, demonstrating its practicality. Code will be available at https://github.com/MaxLLF/LUM-ViT.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Event-Triggered Robust Cooperative Output Regulation for a Class of Linear Multi-Agent Systems with an Unknown Exosystem
Authors:
Yangyang Qian,
Lu Liu
Abstract:
This paper investigates the robust cooperative output regulation problem for a class of heterogeneous uncertain linear multi-agent systems with an unknown exosystem via event-triggered control (ETC). By utilizing the internal model approach and the adaptive control technique, a distributed adaptive internal model is constructed for each agent. Then, based on this internal model, a fully distribute…
▽ More
This paper investigates the robust cooperative output regulation problem for a class of heterogeneous uncertain linear multi-agent systems with an unknown exosystem via event-triggered control (ETC). By utilizing the internal model approach and the adaptive control technique, a distributed adaptive internal model is constructed for each agent. Then, based on this internal model, a fully distributed ETC strategy composed of a distributed event-triggered adaptive output feedback control law and a distributed dynamic event-triggering mechanism is proposed, in which each agent updates its control input at its own triggering time instants. It is shown that under the proposed ETC strategy, the robust cooperative output regulation problem can be solved without requiring either the global information associated with the communication topology or the bounds of the uncertain or unknown parameters in each agent and the exosystem. A numerical example is provided to illustrate the effectiveness of the proposed control strategy.
△ Less
Submitted 1 March, 2024;
originally announced March 2024.
-
Radar Anti-jamming Strategy Learning via Domain-knowledge Enhanced Online Convex Optimization
Authors:
Liangqi Liu,
Wenqiang Pu,
Yingru Li,
Bo Jiu,
Zhi-Quan Luo
Abstract:
The dynamic competition between radar and jammer systems presents a significant challenge for modern Electronic Warfare (EW), as current active learning approaches still lack sample efficiency and fail to exploit jammer's characteristics. In this paper, the competition between a frequency agile radar and a Digital Radio Frequency Memory (DRFM)-based intelligent jammer is considered. We introduce a…
▽ More
The dynamic competition between radar and jammer systems presents a significant challenge for modern Electronic Warfare (EW), as current active learning approaches still lack sample efficiency and fail to exploit jammer's characteristics. In this paper, the competition between a frequency agile radar and a Digital Radio Frequency Memory (DRFM)-based intelligent jammer is considered. We introduce an Online Convex Optimization (OCO) framework designed to illustrate this adversarial interaction. Notably, traditional OCO algorithms exhibit suboptimal sample efficiency due to the limited information obtained per round. To address the limitations, two refined algorithms are proposed, utilizing unbiased gradient estimators that leverage the unique attributes of the jammer system. Sub-linear theoretical results on both static regret and universal regret are provided, marking a significant improvement in OCO performance. Furthermore, simulation results reveal that the proposed algorithms outperform common OCO baselines, suggesting the potential for effective deployment in real-world scenarios.
△ Less
Submitted 28 February, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Autonomous Mapless Navigation on Uneven Terrains
Authors:
Hassan Jardali,
Mahmoud Ali,
Lantao Liu
Abstract:
We propose a new method for autonomous navigation in uneven terrains by utilizing a sparse Gaussian Process (SGP) based local perception model. The SGP local perception model is trained on local ranging observation (pointcloud) to learn the terrain elevation profile and extract the feasible navigation subgoals around the robot. Subsequently, a cost function, which prioritizes the safety of the rob…
▽ More
We propose a new method for autonomous navigation in uneven terrains by utilizing a sparse Gaussian Process (SGP) based local perception model. The SGP local perception model is trained on local ranging observation (pointcloud) to learn the terrain elevation profile and extract the feasible navigation subgoals around the robot. Subsequently, a cost function, which prioritizes the safety of the robot in terms of kee** the robot's roll and pitch angles bounded within a specified range, is used to select a safety-aware subgoal that leads the robot to its final destination. The algorithm is designed to run in real-time and is intensively evaluated in simulation and real world experiments. The results compellingly demonstrate that our proposed algorithm consistently navigates uneven terrains with high efficiency and surpasses the performance of other planners. The code and video can be found here: https://rb.gy/3ov2r8
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
Authors:
Lei Liu,
Li Liu,
Haizhou Li
Abstract:
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people that combines lip reading with several specific hand shapes to make the spoken language visible. Automatic CS recognition (ACSR) seeks to transcribe visual cues of speech into text, which can help hearing-impaired people to communicate effectively. The visual information of CS contains lip reading and hand cueing, thus…
▽ More
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people that combines lip reading with several specific hand shapes to make the spoken language visible. Automatic CS recognition (ACSR) seeks to transcribe visual cues of speech into text, which can help hearing-impaired people to communicate effectively. The visual information of CS contains lip reading and hand cueing, thus the fusion of them plays an important role in ACSR. However, most previous fusion methods struggle to capture the global dependency present in long sequence inputs of multi-modal CS data. As a result, these methods generally fail to learn the effective cross-modal relationships that contribute to the fusion. Recently, attention-based transformers have been a prevalent idea for capturing the global dependency over the long sequence in multi-modal fusion, but existing multi-modal fusion transformers suffer from both poor recognition accuracy and inefficient computation for the ACSR task. To address these problems, we develop a novel computation and parameter efficient multi-modal fusion transformer by proposing a novel Token-Importance-Aware Attention mechanism (TIAA), where a token utilization rate (TUR) is formulated to select the important tokens from the multi-modal streams. More precisely, TIAA firstly models the modality-specific fine-grained temporal dependencies over all tokens of each modality, and then learns the efficient cross-modal interaction for the modality-shared coarse-grained temporal dependencies over the important tokens of different modalities. Besides, a light-weight gated hidden projection is designed to control the feature flows of TIAA. The resulting model, named Economical Cued Speech Fusion Transformer (EcoCued), achieves state-of-the-art performance on all existing CS datasets, compared with existing transformer-based fusion methods and ACSR fusion methods.
△ Less
Submitted 8 February, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow
Authors:
Liguo Zhou,
Yinglei Song,
Yichao Gao,
Zhou Yu,
Michael Sodamin,
Hongshen Liu,
Liang Ma,
Lian Liu,
Hao Liu,
Yang Liu,
Haichuan Li,
Guang Chen,
Alois Knoll
Abstract:
Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and inte…
▽ More
Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and interdisciplinary researchers. We introduce an autonomous driving simulator with photorealistic scenes, meanwhile kee** a user-friendly workflow. The simulator is able to communicate with external algorithms through ROS2 or Socket.IO, making it compatible with existing software stacks. Furthermore, we implement a highly accurate vehicle dynamics model within the simulator to enhance the realism of the vehicle's physical effects. The simulator is able to serve various functions, including generating synthetic data and driving with machine learning-based algorithms. Moreover, we prioritize simplicity in the deployment process, ensuring that beginners find it approachable and user-friendly.
△ Less
Submitted 30 January, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Networked Multiagent Reinforcement Learning for Peer-to-Peer Energy Trading
Authors:
Chen Feng,
Andrew L. Liu
Abstract:
Utilizing distributed renewable and energy storage resources in local distribution networks via peer-to-peer (P2P) energy trading has long been touted as a solution to improve energy systems' resilience and sustainability. Consumers and prosumers (those who have energy generation resources), however, do not have the expertise to engage in repeated P2P trading, and the zero-marginal costs of renewa…
▽ More
Utilizing distributed renewable and energy storage resources in local distribution networks via peer-to-peer (P2P) energy trading has long been touted as a solution to improve energy systems' resilience and sustainability. Consumers and prosumers (those who have energy generation resources), however, do not have the expertise to engage in repeated P2P trading, and the zero-marginal costs of renewables present challenges in determining fair market prices. To address these issues, we propose multi-agent reinforcement learning (MARL) frameworks to help automate consumers' bidding and management of their solar PV and energy storage resources, under a specific P2P clearing mechanism that utilizes the so-called supply-demand ratio. In addition, we show how the MARL frameworks can integrate physical network constraints to realize voltage control, hence ensuring physical feasibility of the P2P energy trading and paving way for real-world implementations.
△ Less
Submitted 27 January, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Maximally Concentrated Sequences after Half-sample Shifts
Authors:
Karim A. Said,
Lingjia Liu,
A. A.,
Beex
Abstract:
It is well known that index (discrete-time)-limited sampled sequences leak outside the support set when a band-limiting operation is applied. Similarly, a fractional shift causes an index-limited sequence to be infinite in extent due to the inherent band-limiting. Index-limited versions of discrete prolate spheroidal sequences (DPSS) are known to experience minimum leakage after band-limiting. In…
▽ More
It is well known that index (discrete-time)-limited sampled sequences leak outside the support set when a band-limiting operation is applied. Similarly, a fractional shift causes an index-limited sequence to be infinite in extent due to the inherent band-limiting. Index-limited versions of discrete prolate spheroidal sequences (DPSS) are known to experience minimum leakage after band-limiting. In this work, we consider the effect of a half-sample shift and provide upper bounds on the resulting leakage energy for arbitrary sequences. Furthermore, we find an orthonormal basis derived from DPSS with members ordered according to energy concentration after half sample shifts; the primary (first) member being the global optimum.
△ Less
Submitted 17 January, 2024;
originally announced January 2024.
-
Multi-target Detection for Reconfigurable Holographic Surfaces Enabled Radar
Authors:
Xiaoyu Zhang,
Haobo Zhang,
Ruoqi Deng,
Liang Liu,
Boya Di
Abstract:
Multi-target detection is one of the primary tasks in radar-based localization and sensing, typically built on phased array antennas. However, the bulky hardware in the phased array restricts its potential for enhancing detection accuracy, since the cost and power of the phased array can become unaffordable as its physical aperture scales up to pursue higher beam sha** capabilities. To resolve t…
▽ More
Multi-target detection is one of the primary tasks in radar-based localization and sensing, typically built on phased array antennas. However, the bulky hardware in the phased array restricts its potential for enhancing detection accuracy, since the cost and power of the phased array can become unaffordable as its physical aperture scales up to pursue higher beam sha** capabilities. To resolve this issue, we propose a radar system enabled by reconfigurable holographic surfaces (RHSs), a novel meta-surface antenna composed of meta-material elements with cost-effective and power-efficient hardware, which performs multi-target detection in an adaptive manner. Different from the phase-control structure in the phased array, the RHS is able to apply beamforming by controlling the radiation amplitudes of its elements. Consequently, traditional beamforming schemes designed for phased arrays cannot be directly applied to RHSs due to this structural difference. To tackle this challenge, a waveform and amplitude optimization algorithm (WAOA) is designed to jointly optimize the radar waveform and RHS amplitudes in order to improve the detection accuracy. Simulation results reveal that the proposed RHS-enabled radar increases the probability of detection by 0.13 compared to phased array radars when six iterations of adaptive detection are performed given the same hardware cost.
△ Less
Submitted 17 January, 2024;
originally announced January 2024.
-
Channel Estimation for Holographic Communications in Hybrid Near-Far Field
Authors:
Shaohua Yue,
Shuhao Zeng,
Liang Liu,
Boya Di
Abstract:
To realize holographic communications, a potential technology for spectrum efficiency improvement in the future sixth-generation (6G) network, antenna arrays inlaid with numerous antenna elements will be deployed. However, the increase in antenna aperture size makes some users lie in the Fresnel region, leading to the hybrid near-field and far-field communication mode, where the conventional far-f…
▽ More
To realize holographic communications, a potential technology for spectrum efficiency improvement in the future sixth-generation (6G) network, antenna arrays inlaid with numerous antenna elements will be deployed. However, the increase in antenna aperture size makes some users lie in the Fresnel region, leading to the hybrid near-field and far-field communication mode, where the conventional far-field channel estimation methods no longer work well. To tackle the above challenge, this paper considers channel estimation in a hybrid-field multipath environment, where each user and each scatterer can be in either the far-field or the near-field region. First, a joint angular-polar domain channel transform is designed to capture the hybrid-field channel's near-field and far-field features. We then analyze the power diffusion effect in the hybrid-field channel, which indicates that the power corresponding to one near-field (far-field) path component of the multipath channel may spread to far-field (near-field) paths and causes estimation error. We design a novel power-diffusion-based orthogonal matching pursuit channel estimation algorithm (PD-OMP). It can eliminate the prior knowledge requirement of path numbers in the far field and near field, which is a must in other OMP-based channel estimation algorithms. Simulation results show that PD-OMP outperforms current hybrid-field channel estimation methods.
△ Less
Submitted 16 January, 2024;
originally announced January 2024.
-
HA-HI: Synergising fMRI and DTI through Hierarchical Alignments and Hierarchical Interactions for Mild Cognitive Impairment Diagnosis
Authors:
Xiongri Shen,
Zhenxi Song,
Linling Li,
Min Zhang,
Lingyan Liang Honghai Liu,
Demao Deng,
Zhiguo Zhang
Abstract:
Early diagnosis of mild cognitive impairment (MCI) and subjective cognitive decline (SCD) utilizing multi-modal magnetic resonance imaging (MRI) is a pivotal area of research. While various regional and connectivity features from functional MRI (fMRI) and diffusion tensor imaging (DTI) have been employed to develop diagnosis models, most studies integrate these features without adequately addressi…
▽ More
Early diagnosis of mild cognitive impairment (MCI) and subjective cognitive decline (SCD) utilizing multi-modal magnetic resonance imaging (MRI) is a pivotal area of research. While various regional and connectivity features from functional MRI (fMRI) and diffusion tensor imaging (DTI) have been employed to develop diagnosis models, most studies integrate these features without adequately addressing their alignment and interactions. This limits the potential to fully exploit the synergistic contributions of combined features and modalities. To solve this gap, our study introduces a novel Hierarchical Alignments and Hierarchical Interactions (HA-HI) method for MCI and SCD classification, leveraging the combined strengths of fMRI and DTI. HA-HI efficiently learns significant MCI- or SCD- related regional and connectivity features by aligning various feature types and hierarchically maximizing their interactions. Furthermore, to enhance the interpretability of our approach, we have developed the Synergistic Activation Map (SAM) technique, revealing the critical brain regions and connections that are indicative of MCI/SCD. Comprehensive evaluations on the ADNI dataset and our self-collected data demonstrate that HA-HI outperforms other existing methods in diagnosing MCI and SCD, making it a potentially vital and interpretable tool for early detection. The implementation of this method is publicly accessible at https://github.com/ICI-BCI/Dual-MRI-HA-HI.git.
△ Less
Submitted 2 January, 2024;
originally announced January 2024.
-
Channel Map** Based on Interleaved Learning with Complex-Domain MLP-Mixer
Authors:
Zirui Chen,
Zhaoyang Zhang,
Zhaohui Yang,
Lei Liu
Abstract:
In multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems, representing the whole channel only based on partial subchannels will significantly reduce the channel acquisition overhead. For such a channel map** task, inspired by the intrinsic coupling across the space and frequency domains, this letter proposes to use interleaved learning with partial anten…
▽ More
In multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems, representing the whole channel only based on partial subchannels will significantly reduce the channel acquisition overhead. For such a channel map** task, inspired by the intrinsic coupling across the space and frequency domains, this letter proposes to use interleaved learning with partial antenna and subcarrier characteristics to represent the whole MIMO-OFDM channel. Specifically, we design a complex-domain multilayer perceptron (MLP)-Mixer (CMixer), which utilizes two kinds of complex-domain MLP modules to learn the space and frequency characteristics respectively and then interleaves them to couple the learned properties. The complex-domain computation facilitates the learning on the complex-valued channel data, while the interleaving tightens the coupling of space and frequency domains. These two designs jointly reduce the learning burden, making the physics-inspired CMixer more effective on channel representation learning than existing data-driven approaches. Simulation shows that the proposed scheme brings 4.6~10dB gains in map** accuracy compared to existing schemes under different settings. Besides, ablation studies show the necessity of complex-domain computation as well as the extent to which the interleaved learning matches the channel properties.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
Multiple Access Techniques for Intelligent and Multi-Functional 6G: Tutorial, Survey, and Outlook
Authors:
Bruno Clerckx,
Yijie Mao,
Zhaohui Yang,
Mingzhe Chen,
Ahmed Alkhateeb,
Liang Liu,
Min Qiu,
**hong Yuan,
Vincent W. S. Wong,
Juan Montojo
Abstract:
Multiple access (MA) is a crucial part of any wireless system and refers to techniques that make use of the resource dimensions to serve multiple users/devices/machines/services, ideally in the most efficient way. Given the needs of multi-functional wireless networks for integrated communications, sensing, localization, computing, coupled with the surge of machine learning / artificial intelligenc…
▽ More
Multiple access (MA) is a crucial part of any wireless system and refers to techniques that make use of the resource dimensions to serve multiple users/devices/machines/services, ideally in the most efficient way. Given the needs of multi-functional wireless networks for integrated communications, sensing, localization, computing, coupled with the surge of machine learning / artificial intelligence (AI) in wireless networks, MA techniques are expected to experience a paradigm shift in 6G and beyond. In this paper, we provide a tutorial, survey and outlook of past, emerging and future MA techniques and pay a particular attention to how wireless network intelligence and multi-functionality will lead to a re-thinking of those techniques. The paper starts with an overview of orthogonal, physical layer multicasting, space domain, power domain, ratesplitting, code domain MAs, and other domains, and highlight the importance of researching universal multiple access to shrink instead of grow the knowledge tree of MA schemes by providing a unified understanding of MA schemes across all resource dimensions. It then jumps into rethinking MA schemes in the era of wireless network intelligence, covering AI for MA such as AI-empowered resource allocation, optimization, channel estimation, receiver designs, user behavior predictions, and MA for AI such as federated learning/edge intelligence and over the air computation. We then discuss MA for network multi-functionality and the interplay between MA and integrated sensing, localization, and communications. We finish with studying MA for emerging intelligent applications before presenting a roadmap toward 6G standardization. We also point out numerous directions that are promising for future research.
△ Less
Submitted 2 January, 2024;
originally announced January 2024.
-
User-Assisted Networked Sensing in OFDM Cellular Network with Erroneous Anchor Position Information
Authors:
Xianzhen Guo,
Qin Shi,
Liang Liu,
Shuowen Zhang
Abstract:
In the sixth-generation (6G) integrated sensing and communication (ISAC) cellular network, base stations (BSs) can collaborate with each other to reap not only the cooperative communication gain, but also the networked sensing gain. In contrast to cooperative communication where both line-of-sight (LOS) paths and non-line-of-sight (NLOS) paths are useful, networked sensing mainly relies on the LOS…
▽ More
In the sixth-generation (6G) integrated sensing and communication (ISAC) cellular network, base stations (BSs) can collaborate with each other to reap not only the cooperative communication gain, but also the networked sensing gain. In contrast to cooperative communication where both line-of-sight (LOS) paths and non-line-of-sight (NLOS) paths are useful, networked sensing mainly relies on the LOS paths. However, in practice, the number of BSs possessing LOS paths to a target can be small, leading to marginal networked sensing gain. Because the density of user equipments (UEs) is much larger than that of the BSs, this paper considers a UE-assisted networked sensing architecture, where a BS transmits communication signals in the downlink, while the UEs that receive the echo signals scattered by a target can cooperate with the BS to localize it. Under this scheme, however, the positions of the UEs are estimated by Global Positioning System (GPS) and subject to unknown errors. If some UEs with significantly erroneous position information are used as anchors, the localization performance can be severely degraded. Based on the outlier detection technique, this paper proposes an efficient method to select a subset of UEs with accurate position information as anchors for localizing the target. Numerical results show that our scheme can select good UEs as anchors with very high probability, indicating that networked sensing can be realized in practice with the aid of UEs.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Fast Decision Boundary based Out-of-Distribution Detector
Authors:
Litian Liu,
Yao Qin
Abstract:
Efficient and effective Out-of-Distribution (OOD) detection is essential for the safe deployment of AI systems. Existing feature space methods, while effective, often incur significant computational overhead due to their reliance on auxiliary models built from training features. In this paper, we propose a computationally-efficient OOD detector without using auxiliary models while still leveraging…
▽ More
Efficient and effective Out-of-Distribution (OOD) detection is essential for the safe deployment of AI systems. Existing feature space methods, while effective, often incur significant computational overhead due to their reliance on auxiliary models built from training features. In this paper, we propose a computationally-efficient OOD detector without using auxiliary models while still leveraging the rich information embedded in the feature space. Specifically, we detect OOD samples based on their feature distances to decision boundaries. To minimize computational cost, we introduce an efficient closed-form estimation, analytically proven to tightly lower bound the distance. Based on our estimation, we discover that In-Distribution (ID) features tend to be further from decision boundaries than OOD features. Additionally, ID and OOD samples are better separated when compared at equal deviation levels from the mean of training features. By regularizing the distances to decision boundaries based on feature deviation from the mean, we develop a hyperparameter-free, auxiliary model-free OOD detector. Our method matches or surpasses the effectiveness of state-of-the-art methods in extensive experiments while incurring negligible overhead in inference latency. Overall, our approach significantly improves the efficiency-effectiveness trade-off in OOD detection. Code is available at: https://github.com/litianliu/fDBD-OOD.
△ Less
Submitted 4 June, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture
Authors:
Weijie Li,
Yang Wei,
Tianpeng Liu,
Yuenan Hou,
Yuxuan Li,
Zhen Liu,
Yongxiang Liu,
Li Liu
Abstract:
The growing Synthetic Aperture Radar (SAR) data has the potential to build a foundation model through Self-Supervised Learning (SSL) methods, which can achieve various SAR Automatic Target Recognition (ATR) tasks with pre-training in large-scale unlabeled data and fine-tuning in small labeled samples. SSL aims to construct supervision signals directly from the data, which minimizes the need for ex…
▽ More
The growing Synthetic Aperture Radar (SAR) data has the potential to build a foundation model through Self-Supervised Learning (SSL) methods, which can achieve various SAR Automatic Target Recognition (ATR) tasks with pre-training in large-scale unlabeled data and fine-tuning in small labeled samples. SSL aims to construct supervision signals directly from the data, which minimizes the need for expensive expert annotation and maximizes the use of the expanding data pool for a foundational model. This study investigates an effective SSL method for SAR ATR, which can pave the way for a foundation model in SAR ATR. The primary obstacles faced in SSL for SAR ATR are the small targets in remote sensing and speckle noise in SAR images, corresponding to the SSL approach and signals. To overcome these challenges, we present a novel Joint-Embedding Predictive Architecture for SAR ATR (SAR-JEPA), which leverages local masked patches to predict the multi-scale SAR gradient representations of unseen context. The key aspect of SAR-JEPA is integrating SAR domain features to ensure high-quality self-supervised signals as target features. Besides, we employ local masks and multi-scale features to accommodate the various small targets in remote sensing. By fine-tuning and evaluating our framework on three target recognition datasets (vehicle, ship, and aircraft) with four other datasets as pre-training, we demonstrate its outperformance over other SSL methods and its effectiveness with increasing SAR data. This study showcases the potential of SSL for SAR target recognition across diverse targets, scenes, and sensors.
△ Less
Submitted 28 March, 2024; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Automated Measurement of Pericoronary Adipose Tissue Attenuation and Volume in CT Angiography
Authors:
Andrew M. Nguyen,
Tejas Sudharshan Mathai,
Liangchen Liu,
Jianfei Liu,
Ronald M. Summers
Abstract:
Pericoronary adipose tissue (PCAT) is the deposition of fat in the vicinity of the coronary arteries. It is an indicator of coronary inflammation and associated with coronary artery disease. Non-invasive coronary CT angiography (CCTA) is presently used to obtain measures of the thickness, volume, and attenuation of fat deposition. However, prior works solely focus on measuring PCAT using semi-auto…
▽ More
Pericoronary adipose tissue (PCAT) is the deposition of fat in the vicinity of the coronary arteries. It is an indicator of coronary inflammation and associated with coronary artery disease. Non-invasive coronary CT angiography (CCTA) is presently used to obtain measures of the thickness, volume, and attenuation of fat deposition. However, prior works solely focus on measuring PCAT using semi-automated approaches at the right coronary artery (RCA) over the left coronary artery (LCA). In this pilot work, we developed a fully automated approach for the measurement of PCAT mean attenuation and volume in the region around both coronary arteries. First, we used a large subset of patients from the public ImageCAS dataset (n = 735) to train a 3D full resolution nnUNet to segment LCA and RCA. Then, we automatically measured PCAT in the surrounding arterial regions. We evaluated our method on a held-out test set of patients (n = 183) from the same dataset. A mean Dice score of 83% and PCAT attenuation of -73.81 $\pm$ 12.69 HU was calculated for the RCA, while a mean Dice score of 81% and PCAT attenuation of -77.51 $\pm$ 7.94 HU was computed for the LCA. To the best of our knowledge, we are the first to develop a fully automated method to measure PCAT attenuation and volume at both the RCA and LCA. Our work underscores how automated PCAT measurement holds promise as a biomarker for identification of inflammation and cardiac disease.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
2D-RC: Two-Dimensional Neural Network Approach for OTFS Symbol Detection
Authors:
Jiarui Xu,
Karim Said,
Lizhong Zheng,
Lingjia Liu
Abstract:
Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only a limited number of over-the-air (OTA) pilot symbols are utilized for training. However, this approach does not leverage the do…
▽ More
Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only a limited number of over-the-air (OTA) pilot symbols are utilized for training. However, this approach does not leverage the domain knowledge specific to the OTFS system to fully unlock the potential of RC. This paper introduces a novel two-dimensional RC (2D-RC) method that incorporates the domain knowledge of the OTFS system into the design for symbol detection in an online subframe-based manner. Specifically, as the channel interaction in the delay-Doppler (DD) domain is a two-dimensional (2D) circular operation, the 2D-RC is designed to have the 2D circular padding procedure and the 2D filtering structure to embed this knowledge. With the introduced architecture, 2D-RC can operate in the DD domain with only a single neural network, instead of necessitating multiple RCs to track channel variations in the time domain as in previous work. Numerical experiments demonstrate the advantages of the 2D-RC approach over the previous RC-based approach and compared model-based methods across different OTFS system variants and modulation orders.
△ Less
Submitted 24 January, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Memory AMP for Generalized MIMO: Coding Principle and Information-Theoretic Optimality
Authors:
Yufei Chen,
Lei Liu,
Yuhao Chi,
Ying Li,
Zhaoyang Zhang
Abstract:
To support complex communication scenarios in next-generation wireless communications, this paper focuses on a generalized MIMO (GMIMO) with practical assumptions, such as massive antennas, practical channel coding, arbitrary input distributions, and general right-unitarily-invariant channel matrices (covering Rayleigh fading, certain ill-conditioned and correlated channel matrices). The orthogona…
▽ More
To support complex communication scenarios in next-generation wireless communications, this paper focuses on a generalized MIMO (GMIMO) with practical assumptions, such as massive antennas, practical channel coding, arbitrary input distributions, and general right-unitarily-invariant channel matrices (covering Rayleigh fading, certain ill-conditioned and correlated channel matrices). The orthogonal/vector approximate message passing (OAMP/VAMP) receiver has been proved to be information-theoretically optimal in GMIMO, but it is limited to high-complexity LMMSE. To solve this problem, a low-complexity memory approximate message passing (MAMP) receiver has recently been shown to be Bayes optimal but limited to uncoded systems. Therefore, how to design a low-complexity and information-theoretically optimal receiver for GMIMO is still an open issue. To address this issue, this paper proposes an information-theoretically optimal MAMP receiver and investigates its achievable rate analysis and optimal coding principle. Specifically, due to the long-memory linear detection, state evolution (SE) for MAMP is intricately multidimensional and cannot be used directly to analyze its achievable rate. To avoid this difficulty, a simplified single-input single-output variational SE (VSE) for MAMP is developed by leveraging the SE fixed-point consistent property of MAMP and OAMP/VAMP. The achievable rate of MAMP is calculated using the VSE, and the optimal coding principle is established to maximize the achievable rate. On this basis, the information-theoretic optimality of MAMP is proved rigorously. Numerical results show that the finite-length performances of MAMP with practical optimized LDPC codes are 0.5-2.7 dB away from the associated constrained capacities. It is worth noting that MAMP can achieve the same performances as OAMP/VAMP with 0.4% of the time consumption for large-scale systems.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
Toward ground-truth optical coherence tomography via three-dimensional unsupervised deep learning processing and data
Authors:
Renxiong Wu,
Fei Zheng,
Meixuan Li,
Shaoyan Huang,
Xin Ge,
Linbo Liu,
Yong Liu,
Guangming Ni
Abstract:
Optical coherence tomography (OCT) can perform non-invasive high-resolution three-dimensional (3D) imaging and has been widely used in biomedical fields, while it is inevitably affected by coherence speckle noise which degrades OCT imaging performance and restricts its applications. Here we present a novel speckle-free OCT imaging strategy, named toward-ground-truth OCT (tGT-OCT), that utilizes un…
▽ More
Optical coherence tomography (OCT) can perform non-invasive high-resolution three-dimensional (3D) imaging and has been widely used in biomedical fields, while it is inevitably affected by coherence speckle noise which degrades OCT imaging performance and restricts its applications. Here we present a novel speckle-free OCT imaging strategy, named toward-ground-truth OCT (tGT-OCT), that utilizes unsupervised 3D deep-learning processing and leverages OCT 3D imaging features to achieve speckle-free OCT imaging. Specifically, our proposed tGT-OCT utilizes an unsupervised 3D-convolution deep-learning network trained using random 3D volumetric data to distinguish and separate speckle from real structures in 3D imaging volumetric space; moreover, tGT-OCT effectively further reduces speckle noise and reveals structures that would otherwise be obscured by speckle noise while preserving spatial resolution. Results derived from different samples demonstrated the high-quality speckle-free 3D imaging performance of tGT-OCT and its advancement beyond the previous state-of-the-art.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
Age of Information Analysis for CR-NOMA Aided Uplink Systems with Randomly Arrived Packets
Authors:
Yanshi Sun,
Yanglin Ye,
Zhiguo Ding,
Momiao Zhou,
Lei Liu
Abstract:
This paper studies the application of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) to reduce age of information (AoI) for uplink transmission. In particular, a time division multiple access (TDMA) based legacy network is considered, where each user is allocated with a dedicated time slot to transmit its status update information. The CR-NOMA is implemented as an add-on to the…
▽ More
This paper studies the application of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) to reduce age of information (AoI) for uplink transmission. In particular, a time division multiple access (TDMA) based legacy network is considered, where each user is allocated with a dedicated time slot to transmit its status update information. The CR-NOMA is implemented as an add-on to the TDMA legacy network, which enables each user to have more opportunities to transmit by sharing other user's time slots. A rigorous analytical framework is developed to obtain the expressions for AoIs achieved by CR-NOMA with and without re-transmission, by taking the randomness of the status update generating process into consideration. Numerical results are presented to verify the accuracy of the developed analysis. It is shown that the AoI can be significantly reduced by applying CR-NOMA compared to TDMA. Moreover, the use of re-transmission is helpful to reduce AoI, especially when the status arrival rate is low.
△ Less
Submitted 5 November, 2023;
originally announced November 2023.
-
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Authors:
Litian Liu,
Yao Qin
Abstract:
Efficient and versatile Out-of-Distribution (OOD) detection is essential for the safe deployment of AI yet remains challenging for existing algorithms. Inspired by Neural Collapse, we discover that features of in-distribution (ID) samples cluster closer to the weight vectors compared to features of OOD samples. In addition, we reveal that ID features tend to expand in space to structure a simplex…
▽ More
Efficient and versatile Out-of-Distribution (OOD) detection is essential for the safe deployment of AI yet remains challenging for existing algorithms. Inspired by Neural Collapse, we discover that features of in-distribution (ID) samples cluster closer to the weight vectors compared to features of OOD samples. In addition, we reveal that ID features tend to expand in space to structure a simplex Equiangular Tight Framework, which nicely explains the prevalent observation that ID features reside further from the origin than OOD features. Taking both insights from Neural Collapse into consideration, we propose to leverage feature proximity to weight vectors for OOD detection and further complement this perspective by using feature norms to filter OOD samples. Extensive experiments on off-the-shelf models demonstrate the efficiency and effectiveness of our method across diverse classification tasks and model architectures, enhancing the generalization capability of OOD detection.
△ Less
Submitted 30 May, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy
Authors:
Hui Sun,
Hao Luo,
Feifei Wang,
Qingjiu Chen,
Meng Chen,
Xiaoduo Wang,
Haibo Yu,
Guanglie Zhang,
Lianqing Liu,
Jian** Wang,
Dapeng Wu,
Wen Jung Li
Abstract:
Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the map** relationship between op…
▽ More
Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the map** relationship between optical super-resolution (OSR) images and SEM domain images, which enables the transformation of OSR images into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system, which requires neither coating samples by conductive films nor a vacuum environment, is used to acquire the OSR images with features down to ~80 nm. The peak signal-to-noise ratio (PSNR) and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The proposed method provides a high level of detail in the reconstructed results, indicating that it has broad applicability to chip-level defect detection, biological sample analysis, forensics, and various other fields.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Detecting Abrupt Change of Channel Covariance Matrix in IRS-Assisted Communication
Authors:
Runnan Liu,
Liang Liu,
Yin Xu,
Dazhi He,
Wenjun Zhang,
Chang Wen Chen
Abstract:
The knowledge of channel covariance matrices is crucial to the design of intelligent reflecting surface (IRS) assisted communication. However, channel covariance matrices may change suddenly in practice. This letter focuses on the detection of the above change in IRS-assisted communication. Specifically, we consider the uplink communication system consisting of a single-antenna user (UE), an IRS,…
▽ More
The knowledge of channel covariance matrices is crucial to the design of intelligent reflecting surface (IRS) assisted communication. However, channel covariance matrices may change suddenly in practice. This letter focuses on the detection of the above change in IRS-assisted communication. Specifically, we consider the uplink communication system consisting of a single-antenna user (UE), an IRS, and a multi-antenna base station (BS). We first categorize two types of channel covariance matrix changes based on their impact on system design: Type I change, which denotes the change in the BS receive covariance matrix, and Type II change, which denotes the change in the IRS transmit/receive covariance matrix. Secondly, a powerful method is proposed to detect whether a Type I change occurs, a Type II change occurs, or no change occurs. The effectiveness of our proposed scheme is verified by numerical results.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
Acoustic Model Fusion for End-to-end Speech Recognition
Authors:
Zhihong Lei,
Mingbin Xu,
Shiyi Han,
Leo Liu,
Zhen Huang,
Tim Ng,
Yuanyuan Zhang,
Ernest Pusateri,
Mirko Hannemann,
Yaqiao Deng,
Man-Hung Siu
Abstract:
Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, tr…
▽ More
Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.