-
Multi-Satellite MIMO Systems for Direct User-Satellite Communications: A Survey
Authors:
Zohre Mashayekh Bakhsh,
Yasaman Omid,
Gaojie Chen,
Farbod Kayhan,
Yi Ma,
Rahim Tafazolli
Abstract:
Advancements in satellite technology have made direct-to-device connectivity a viable solution for ensuring global access. This method is designed to provide internet connectivity to remote, rural, or underserved areas where traditional cellular or broadband networks are lacking or insufficient. This paper is a survey providing an in-depth review of multi-satellite Multiple Input Multiple Output (MIMO) systems as a potential solution for addressing the link budget challenge in direct user-satellite communication. Special attention is given to works considering multi-satellite MIMO systems, both with and without satellite collaboration. In this context, collaboration refers to sharing data between satellites to improve the performance of the system. This survey begins by explaining several fundamental aspects of satellite communications (SatComs), which are vital prerequisites before investigating the multi-satellite MIMO systems. These aspects encompass satellite orbits, the structure of satellite systems, SatCom links, including the inter-satellite links (ISL) which facilitate satellite cooperation, satellite frequency bands, satellite antenna design, and satellite channel models, which should be known or estimated for effective data transmission to and from multiple satellites. Furthermore, this survey distinguishes itself by providing more comprehensive insights in comparison to other surveys. It specifically delves into the Orthogonal Time Frequency Space (OTFS) within the channel model section. It goes into detail about ISL noise and channel models, and it extends the ISL section by thoroughly investigating hybrid FSO/RF ISLs. Furthermore, analytical comparisons of simulation results from these works are presented to highlight the advantages of employing multi-satellite MIMO systems.
Submitted 28 June, 2024;
originally announced July 2024.
-
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Authors:
Yifan Yang,
Zheshu Song,
Jianheng Zhuo,
Mingyu Cui,
Jinpeng Li,
Bo Yang,
Yexing Du,
Ziyang Ma,
Xunying Liu,
Ziyuan Wang,
Ke Li,
Shuai Fan,
Kai Yu,
Wei-Qiang Zhang,
Guoguo Chen,
Xie Chen
Abstract:
The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired speech and text data. GigaSpeech 2 comprises about 30,000 hours of automatically transcribed speech, including Thai, Indonesian, and Vietnamese, gathered from unlabeled YouTube videos. We also introduce an automated pipeline for data crawling, transcription, and label refinement. Specifically, this pipeline uses Whisper for initial transcription and TorchAudio for forced alignment, combined with multi-dimensional filtering for data quality assurance. A modified Noisy Student Training is developed to further refine flawed pseudo labels iteratively, thus enhancing model performance. Experimental results on our manually transcribed evaluation set and two public test sets from Common Voice and FLEURS confirm our corpus's high quality and broad applicability. Notably, ASR models trained on GigaSpeech 2 can reduce the word error rate for Thai, Indonesian, and Vietnamese on our challenging and realistic YouTube test set by 25% to 40% compared to the Whisper large-v3 model, with merely 10% of the model parameters. Furthermore, our ASR models trained on GigaSpeech 2 yield superior performance compared to commercial services. We believe that our newly introduced corpus and pipeline will open a new avenue for low-resource speech recognition and significantly facilitate research in this area.
Submitted 17 June, 2024;
originally announced June 2024.
-
Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network
Authors:
Yanxiong Li,
Jiaxin Tan,
Guoqing Chen,
Jialong Li,
Yongjie Si,
Qianhua He
Abstract:
This work presents an improved version of the system that we submitted to Task 1 of the DCASE2023 challenge. We propose a method for low-complexity acoustic scene classification based on a parallel attention-convolution network, which consists of four modules: pre-processing, fusion, global contextual information extraction, and local contextual information extraction. The proposed network captures global and local contextual information from each audio clip in a computationally efficient way. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive residual normalization. When evaluated on the official dataset of the DCASE2023 challenge, our method obtains the highest accuracy of 56.10% with 5.21 K parameters and 1.44 million multiply-accumulate operations. It exceeds the top two systems of the DCASE2023 challenge in accuracy and complexity, and obtains a state-of-the-art result. Code is at: https://github.com/Jessytan/Low-complexity-ASC.
Submitted 12 June, 2024;
originally announced June 2024.
-
2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution
Authors:
Kai Liu,
Haotong Qin,
Yong Guo,
Xin Yuan,
Linghe Kong,
Guihai Chen,
Yulun Zhang
Abstract:
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is well known that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts. Despite several efforts to alleviate the degradation, the transformer-based SR model still suffers severe degradation due to its distinctive activation distribution. In this work, we present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization. The proposed method first investigates the weights and activations and finds that their distributions are characterized by coexisting symmetry and asymmetry and by long tails. Specifically, we propose Distribution-Oriented Bound Initialization (DOBI), which uses different search strategies to find a coarse bound for quantizers. To obtain refined quantizer parameters, we further propose Distillation Quantization Calibration (DQC), which employs a distillation approach to make the quantized model learn from its FP counterpart. Through extensive experiments on different bit widths and scaling factors, the performance of DOBI reaches the state of the art (SOTA), while after stage two our method surpasses existing PTQ methods in both metrics and visual effects. 2DQuant gains an increase in PSNR as high as 4.52 dB on Set5 (x2) compared with SOTA when quantized to 2-bit and enjoys a 3.60x compression ratio and 5.08x speedup ratio. The code and models will be available at https://github.com/Kai-Liu001/2DQuant.
Submitted 10 June, 2024;
originally announced June 2024.
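As a rough illustration of why the quantizer bounds mentioned in the 2DQuant abstract matter, the sketch below applies generic uniform fake-quantization with an explicit clipping range. It is not the DOBI or DQC procedure from the paper; the function name `fake_quantize` and the parameters `lower`, `upper`, and `bits` are hypothetical and chosen only for this example.

```python
def fake_quantize(values, lower, upper, bits):
    """Clip values to [lower, upper], snap them to uniform levels, and map back to floats.

    Generic uniform quantizer (an assumption for illustration, not the paper's method).
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits - 1              # number of steps between the two bounds
    step = (upper - lower) / levels     # width of one quantization step
    result = []
    for v in values:
        clipped = min(max(v, lower), upper)       # values outside the bounds saturate
        index = round((clipped - lower) / step)   # integer code in [0, levels]
        result.append(lower + index * step)       # dequantized value
    return result


# 2-bit example: the representable values in [0, 1] are 0, 1/3, 2/3, and 1.
print(fake_quantize([-1.0, 0.0, 0.4, 2.0], lower=0.0, upper=1.0, bits=2))
```

With only four levels at 2 bits, a bound that is too tight saturates the long tails, while a bound that is too loose spends levels on rarely used values, which is presumably why a bound search is worthwhile.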
-
Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach
Authors:
Borui Zhang,
Chaojie Li,
Guo Chen,
Zhaoyang Dong
Abstract:
To incentivize flexible resources such as Battery Energy Storage Systems (BESSs) to offer Frequency Control Ancillary Services (FCAS), Australia's National Electricity Market (NEM) has implemented changes in recent years towards shorter-term bidding rules and faster service requirements. First, existing bidding optimization methods often overlook or oversimplify the key aspects of FCAS market procedures, resulting in an inaccurate depiction of the market bidding process. Thus, the BESS bidding problem is modeled based on the actual bidding records and the latest market specifications and then formulated as a deep reinforcement learning (DRL) problem. Second, the erratic decisions of the DRL agent caused by imperfectly predicted market information increase the risk of profit loss. Hence, a Conditional Value at Risk (CVaR)-based DRL algorithm is developed to enhance the risk resilience of bidding strategies. Third, well-trained DRL models still face performance decline in uncommon scenarios during online operations. Therefore, a Large Language Model (LLM)-assisted artificial intelligence (AI)-agent interactive decision-making framework is proposed to improve the strategy timeliness, reliability and interpretability in uncertain new scenarios, where conditional hybrid decision and self-reflection mechanisms are designed to address LLMs' hallucination challenge. The experiment results demonstrate that our proposed framework has higher bidding profitability compared to the baseline methods by effectively mitigating the profit loss caused by various uncertainties.
Submitted 3 June, 2024;
originally announced June 2024.
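The risk measure named in the abstract above, Conditional Value at Risk, can be illustrated with a minimal empirical estimator: the mean of the worst (1 - alpha) fraction of sampled losses. This is a sketch of the general definition only, not the CVaR-based DRL algorithm of the paper; the function name `empirical_cvar` and the sample data are hypothetical.

```python
import math


def empirical_cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of sampled losses (larger loss = worse)."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(losses, reverse=True)  # worst outcomes first
    # Round before ceil so floating-point noise such as 5.000000000000004 stays 5.
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


# Hypothetical losses 1..100: the worst 5% are 96..100, whose mean is 98.
print(empirical_cvar(list(range(1, 101)), alpha=0.95))
```

Unlike the plain mean, this quantity reacts only to the tail of the loss distribution, which is why it is a natural objective when rare, costly decisions are the concern.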
-
Joint Detection and Classification of Communication and Radar Signals in Congested RF Environments Using YOLOv8
Authors:
Xiwen Kang,
Hua-mei Chen,
Genshe Chen,
Kuo-Chu Chang,
Thomas M. Clemons
Abstract:
In this paper, we present a comprehensive study on the application of YOLOv8, a state-of-the-art computer vision (CV) model, to the challenging problem of joint detection and classification of signals in a highly dynamic and congested RF environment. Using our synthetic RF datasets, we explored three different scenarios with congested communication and radar signals. In the first study, we applied YOLOv8 to detect and classify multiple digital modulation signals coexisting within a highly congested and dynamic spectral environment with significant overlap in both frequency and time domains. The trained model was able to achieve an impressive mean average precision (mAP) of 0.888 at an IoU threshold of 50%, signifying its robustness against spectral congestion. The second part of our research focuses on the detection and classification of multiple polyphase pulse radar signals, including Frank code and P1 through P4 codes. We were able to successfully train YOLOv8 to deliver a nearly perfect mAP50 score of 0.995 in a densely populated signal environment, further showcasing its capability in radar signal processing. In the last scenario, we demonstrated that the model can also be applied to the multi-target detection problem for continuous-wave radar. The synthetic datasets used in these experiments reflect a realistic mix of communication and radar signals with varying degrees of interference and congestion - a setup that has been overlooked by many past research efforts, which have primarily focused on ML-based classification of digital communication signal modulation schemes. Our study demonstrated the potential of advanced CV models in addressing spectrum sensing challenges in congested and dynamic RF environments involving both communication and radar signals. We hope our findings will spur further collaborative efforts to tackle the complexities of congested RF spectrum environments.
Submitted 1 June, 2024;
originally announced June 2024.
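The mAP figures quoted in the abstract above are reported at an IoU threshold of 50%. As a minimal sketch of that criterion, the function below computes intersection over union for two axis-aligned boxes given as (x1, y1, x2, y2); for a spectrogram, the two axes would plausibly be time and frequency. The function name `iou` and the box layout are assumptions for illustration and are not taken from the paper or from the YOLOv8 code base.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extent along each axis; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
# At the 50% threshold, a prediction matches only if its IoU is at least 0.5.
print(iou(predicted, ground_truth) >= 0.5)
```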
-
Efficient Bilevel Source Mask Optimization
Authors:
Guojin Chen,
Hongquan He,
Peng Xu,
Hao Geng,
Bei Yu
Abstract:
Resolution Enhancement Techniques (RETs) are critical to meet the demands of advanced technology nodes. Among RETs, Source Mask Optimization (SMO) is pivotal, concurrently optimizing both the source and the mask to expand the process window. Traditional SMO methods, however, are limited by sequential and alternating optimizations, leading to extended runtimes without performance guarantees. This paper introduces a unified SMO framework utilizing the accelerated Abbe forward imaging to enhance precision and efficiency. Further, we propose the innovative BiSMO framework, which reformulates SMO through a bilevel optimization approach, and present three gradient-based methods to tackle the challenges of bilevel SMO. Our experimental results demonstrate that BiSMO achieves a remarkable 40% reduction in error metrics and 8× increase in runtime efficiency, signifying a major leap forward in SMO.
Submitted 7 March, 2024;
originally announced May 2024.
-
An Efficient Algorithm for Sum-Rate Maximization in Fluid Antenna-Assisted ISAC System
Authors:
Qian Zhang,
Mingjie Shao,
Tong Zhang,
Gaojie Chen,
Ju Liu
Abstract:
In this letter, we investigate the fluid antenna (FA)-assisted integrated sensing and communication (ISAC) system, where communication and radar sensing employ the co-waveform design. Specifically, we focus on the beamformer design and antenna position configuration to realize a higher communication rate while guaranteeing the minimum radar probing power. Different from existing beamformer algorithms, we propose an efficient proximal distance algorithm (PDA) to solve the multiuser sum-rate maximization problem with radar sensing constraint to obtain the closed-form beamforming vector. In addition, we develop an extrapolated projected gradient (EPG) algorithm to obtain a better antenna location configuration for FA to enhance the ISAC performance. Numerical results show that the considered FA-assisted ISAC system with the proposed algorithm achieves a higher sum-rate than existing non-FA ISAC systems.
Submitted 10 May, 2024;
originally announced May 2024.
-
Intelligent Reflecting Surface Aided AirComp: Multi-Timescale Design and Performance Analysis
Authors:
Guangji Chen,
Jun Li,
Qingqing Wu,
Meng Hua,
Kaitao Meng,
Zhonghao Lyu
Abstract:
The integration of intelligent reflecting surface (IRS) into over-the-air computation (AirComp) is an effective solution for reducing the computational mean squared error (MSE) via its high passive beamforming gain. Prior works on IRS aided AirComp generally rely on the full instantaneous channel state information (I-CSI), which is not applicable to large-scale systems due to its heavy signalling overhead. To address this issue, we propose a novel multi-timescale transmission protocol. In particular, the receive beamforming at the access point (AP) is pre-determined based on the static angle information and the IRS phase-shifts are optimized relying on the long-term statistical CSI. With the obtained AP receive beamforming and IRS phase-shifts, the effective low-dimensional I-CSI is exploited to determine devices' transmit power in each coherence block, thus substantially reducing the signalling overhead. Theoretical analysis unveils that the achievable MSE scales on the order of ${\cal O}\left( {K/\left( {{N^2}M} \right)} \right)$, where $M$, $N$, and $K$ are the number of AP antennas, IRS elements, and devices, respectively. We also prove that the channel-inversion power control is asymptotically optimal for large $N$, which reveals that the full power transmission policy is not needed for lowering the power consumption of energy-limited devices.
Submitted 9 May, 2024;
originally announced May 2024.
-
Characterizing Regional Importance in Cities with Human Mobility Motifs in Metro Networks
Authors:
Shuyang Shi,
Ding Lyu,
Lin Wang,
Xiaofan Wang,
Guanrong Chen
Abstract:
Uncovering higher-order spatiotemporal dependencies within human mobility networks offers valuable insights into the analysis of urban structures. In most existing studies, human mobility networks are typically constructed by aggregating all trips without distinguishing who takes which specific trip. Instead, we claim individual mobility motifs, higher-order structures generated by daily trips of people, as fundamental units of human mobility networks. In this paper, we propose two network construction frameworks at the level of mobility motifs in characterizing regional importance in cities. Firstly, we enhance the structural dependencies within mobility motifs and proceed to construct mobility networks based on the enhanced mobility motifs. Secondly, taking inspiration from PageRank, we speculate that people would allocate values of importance to destinations according to their trip intentions. A motif-wise network construction framework is proposed based on the established mechanism. Leveraging large-scale metro data across cities, we construct three types of human mobility networks and characterize the regional importance by node importance indicators. Our comparison results suggest that the motif-based mobility network outperforms the classic mobility network, thus highlighting the efficacy of the introduced human mobility motifs. Finally, we demonstrate that the performance in characterizing the regional importance is significantly improved by our motif-wise framework.
Submitted 7 May, 2024;
originally announced May 2024.
-
Risk Assessment for Nonlinear Cyber-Physical Systems under Stealth Attacks
Authors:
Guang Chen,
Zhicong Sun,
Yulong Ding,
Shuang-hua Yang
Abstract:
Stealth attacks pose potential risks to cyber-physical systems because they are difficult to detect. Assessing the risk of systems under stealth attacks remains an open challenge, especially in nonlinear systems. To comprehensively quantify these risks, we propose a framework that considers both the reachability of a system and the risk distribution of a scenario. We propose an algorithm to approximate the reachability of a nonlinear system under stealth attacks with a union of standard sets. Meanwhile, we present a method to construct a risk field to formally describe the risk distribution in a given scenario. The intersection relationships of system reachability and risk regions in the risk field indicate that attackers can cause corresponding risks without being detected. Based on this, we introduce a metric to dynamically quantify the risk. Compared to traditional methods, our framework predicts the risk value in an explainable way and provides early warnings for safety control. We demonstrate the effectiveness of our framework through a case study of an automated warehouse.
Submitted 4 May, 2024;
originally announced May 2024.
-
The Continuous-Time Weighted-Median Opinion Dynamics
Authors:
Yi Han,
Ge Chen,
Florian Dörfler,
Wenjun Mei
Abstract:
Opinion dynamics models are important in understanding and predicting opinion formation processes within social groups. Although the weighted-averaging opinion-update mechanism is widely adopted as the micro-foundation of opinion dynamics, it bears a non-negligibly unrealistic implication: opinion attractiveness increases with opinion distance. Recently, the weighted-median mechanism has been proposed as a new microscopic mechanism of opinion exchange. Numerous advancements have been achieved regarding this new micro-foundation, from theoretical analysis to empirical validation, in a discrete-time asynchronous setup. However, the original discrete-time weighted-median model does not allow for "compromise behavior" in opinion exchanges, i.e., no intermediate opinions are created between disagreeing agents. To resolve this problem, this paper proposes a novel continuous-time weighted-median opinion dynamics model, in which agents' opinions move towards the weighted-medians of their out-neighbors' opinions. It turns out that the proof methods for the original discrete-time asynchronous model are no longer applicable to the analysis of the continuous-time model. In this paper, we first establish the existence and uniqueness of the solution to the continuous-time weighted-median opinion dynamics by showing that the weighted-median mapping is contractive on any graph. We also characterize the set of all the equilibria. Then, by leveraging a new LaSalle invariance principle argument, we prove the convergence of the continuous-time weighted-median model for any initial condition and derive a necessary and sufficient condition for the convergence to consensus.
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
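The abstract above describes opinions that move towards the weighted median of the out-neighbors' opinions. A minimal sketch of that idea follows, under assumptions that the abstract does not confirm: the dynamics are taken to be dx_i/dt = Med_i(x) - x_i, integrated with a forward-Euler step, and ties in the weighted median are broken by taking the smallest qualifying value. The names `weighted_median`, `euler_step`, and `weight_matrix` are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= 0.5 * total:
            return value
    return max(values)  # only reachable through floating-point rounding


def euler_step(opinions, weight_matrix, dt=0.1):
    """One forward-Euler step of the ASSUMED dynamics dx_i/dt = Med_i(x) - x_i."""
    updated = []
    for i, x_i in enumerate(opinions):
        # Keep only neighbors to which agent i assigns positive weight.
        neighbors = [(opinions[j], w) for j, w in enumerate(weight_matrix[i]) if w > 0]
        target = weighted_median([v for v, _ in neighbors], [w for _, w in neighbors])
        updated.append(x_i + dt * (target - x_i))  # move part of the way to the median
    return updated


# Agent 0 puts weight 0.8 on agent 1, so its opinion drifts from 0.0 towards 1.0.
print(euler_step([0.0, 1.0], [[0.2, 0.8], [0.0, 1.0]], dt=0.1))
```

Because each step moves an opinion only part of the way to its target, intermediate opinions appear between disagreeing agents, which is the "compromise behavior" that the discrete-time model lacks.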
-
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Authors:
Gehui Chen,
Guan'an Wang,
Xiaowen Huang,
Jitao Sang
Abstract:
Existing works have made strides in video generation, but the lack of sound effects (SFX) and background music (BGM) hinders a complete and immersive viewer experience. We introduce a novel semantically consistent video-to-audio generation framework, namely SVA, which automatically generates audio semantically consistent with the given video content. The framework harnesses the power of a multimodal large language model (MLLM) to understand video semantics from a key frame and generate creative audio schemes, which are then utilized as prompts for text-to-audio models, resulting in video-to-audio generation with natural language as an interface. We show the satisfactory performance of SVA through a case study and discuss the limitations along with the future research direction. The project page is available at https://huiz-a.github.io/audio4video.github.io/.
Submitted 25 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat
Authors:
Chih-Chung Hsu,
Chih-Yu Jian,
Eng-Shen Tu,
Chia-Ming Lee,
Guan-Lin Chen
Abstract:
This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and require only relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders, facilitating rapid compressed sensing for stripe-like HSI, which matches the moderate design of miniaturized satellites that use a push-broom scanning mechanism. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on the novel two-streamed architecture, an efficient HSI restoration decoder is proposed for the receiver side, allowing for edge-device reconstruction without needing a sophisticated central server. This is particularly crucial as an increasing number of miniaturized satellites necessitates significant computing resources on the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.
Submitted 24 April, 2024;
originally announced April 2024.
-
CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task
Authors:
Kangzhen Yang,
Tao Hu,
Kexin Dai,
Genggeng Chen,
Yu Cao,
Wei Dong,
Peng Wu,
Yanning Zhang,
Qingsen Yan
Abstract:
In real-world scenarios, captured images often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. However, merely performing a single type of image enhancement still cannot yield satisfactory images. To address this challenge, we propose the Composite Refinement Network (CRNet), which uses multiple exposure images. By fully integrating information-rich multiple exposure inputs, CRNet can perform unified image restoration and enhancement. To improve the quality of image details, CRNet explicitly separates and strengthens high and low-frequency information through pooling layers, using specially designed Multi-Branch Blocks for effective fusion of these frequencies. To increase the receptive field and fully integrate input features, CRNet employs the High-Frequency Enhancement Module, which includes large kernel convolutions and an inverted bottleneck ConvFFN. Our model secured third place in the first track of the Bracketing Image Restoration and Enhancement Challenge, surpassing previous SOTA models in both testing metrics and visual quality.
Submitted 22 April, 2024;
originally announced April 2024.
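The CRNet abstract above says that high- and low-frequency information is separated through pooling layers. One common way to do this, shown here as a one-dimensional sketch and not as the network's actual layers, is to average-pool the signal, upsample the result back to the input length, and treat the residual as the high-frequency part. The function name `split_frequencies` and the `window` parameter are hypothetical.

```python
def split_frequencies(signal, window):
    """Split a 1-D signal into a pooled low-frequency part and a residual high-frequency part."""
    if window < 1 or len(signal) % window != 0:
        raise ValueError("window must be positive and divide the signal length")
    low = []
    for start in range(0, len(signal), window):
        block = signal[start:start + window]
        mean = sum(block) / window      # average pooling over the block
        low.extend([mean] * window)     # nearest-neighbor upsampling to the input length
    high = [x - smooth for x, smooth in zip(signal, low)]  # detail lost by pooling
    return low, high


low, high = split_frequencies([1.0, 3.0, 5.0, 7.0], window=2)
print(low)   # smooth component
print(high)  # residual detail; low[i] + high[i] reproduces the input
```

The two parts sum back to the input, so processing them in separate branches and fusing the results loses no information at the split itself.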
-
Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition
Authors:
Genggeng Chen,
Kexin Dai,
Kangzhen Yang,
Tao Hu,
Xiangyu Chen,
Yongqing Yang,
Wei Dong,
Peng Wu,
Yanning Zhang,
Qingsen Yan
Abstract:
In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resulting in less-than-ideal restoration outcomes. Inspired by the notion that high/low frequency information is applicable to different degradations, we introduce HLNet, a Bracketing Image Restoration and Enhancement method based on high-low frequency decomposition. Specifically, we employ two modules for feature extraction: shared weight modules and non-shared weight modules. In the shared weight modules, we use SCConv to extract common features from different degradations. In the non-shared weight modules, we introduce the High-Low Frequency Decomposition Block (HLFDB), which employs different methods to handle high-low frequency information, enabling the model to address different degradations more effectively. Compared to other networks, our method takes into account the characteristics of different degradations, thus achieving higher-quality image restoration.
Submitted 24 April, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
GPU-Accelerated RSF Level Set Evolution for Large-Scale Microvascular Segmentation
Authors:
Meher Niger,
Helya Goharbavang,
Taeyong Ahn,
Emily K. Alley,
Joshua D. Wythe,
Guoning Chen,
David Mayerich
Abstract:
Microvascular networks are challenging to model because these structures are currently near the diffraction limit for most advanced three-dimensional imaging modalities, including confocal and light sheet microscopy. This makes semantic segmentation difficult, because individual components of these networks fluctuate within the confines of individual pixels. Level set methods are ideally suited to solve this problem by providing surface and topological constraints on the resulting model; however, these active contour techniques are extremely time intensive and impractical for terabyte-scale images. We propose a reformulation and implementation of the region-scalable fitting (RSF) level set model that makes it amenable to three-dimensional evaluation using both single-instruction multiple-data (SIMD) and single-program multiple-data (SPMD) parallel processing. This enables evaluation of the level set equation on independent regions of the data set using graphics processing units (GPUs), making large-scale segmentation of high-resolution networks practical and inexpensive.
We tested this 3D parallel RSF approach on multiple data sets acquired using state-of-the-art imaging techniques to acquire microvascular data, including micro-CT, light sheet fluorescence microscopy (LSFM) and milling microscopy. To assess the performance and accuracy of the RSF model, we conducted a Monte-Carlo-based validation technique to compare results to other segmentation methods. We also provide a rigorous profiling to show the gains in processing speed leveraging parallel hardware. This study showcases the practical application of the RSF model, emphasizing its utility in the challenging domain of segmenting large-scale high-topology network structures with a particular focus on building microvascular models.
Submitted 3 April, 2024;
originally announced April 2024.
-
Computationally Efficient Unsupervised Deep Learning for Robust Joint AP Clustering and Beamforming Design in Cell-Free Systems
Authors:
Guanghui Chen,
Zheng Wang,
Hongxin Lin,
Yongming Huang,
Luxi Yang
Abstract:
In this paper, we consider robust joint access point (AP) clustering and beamforming design with imperfect channel state information (CSI) in cell-free systems. Specifically, we jointly optimize AP clustering and beamforming with imperfect CSI to simultaneously maximize the worst-case sum rate and minimize the number of AP clustering under power constraint and the sparsity constraint of AP clustering. By transformations, the semi-infinite constraints caused by the imperfect CSI are converted into more tractable forms for facilitating a computationally efficient unsupervised deep learning algorithm. In addition, to further reduce the computational complexity, a computationally effective unsupervised deep learning algorithm is proposed to implement robust joint AP clustering and beamforming design with imperfect CSI in cell-free systems. Numerical results demonstrate that the proposed unsupervised deep learning algorithm achieves a higher worst-case sum rate under a smaller number of AP clustering with computational efficiency.
Submitted 3 April, 2024;
originally announced April 2024.
-
SPMamba: State-space model is all you need in speech separation
Authors:
Kai Li,
Guo Chen
Abstract:
In speech separation, both CNN- and Transformer-based models have demonstrated robust separation capabilities, garnering significant attention within the research community. However, CNN-based methods have limited modelling capability for long-sequence audio, leading to suboptimal separation performance. Conversely, Transformer-based methods are limited in practical applications due to their high computational complexity. Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements. In this paper, we propose a network architecture for speech separation using a state-space model, namely SPMamba. We adopt the TF-GridNet model as the foundational framework and substitute its Transformer component with a bidirectional Mamba module, aiming to capture a broader range of contextual information. Our experimental results reveal an important role in the performance aspects of Mamba-based models. SPMamba demonstrates superior performance with a significant advantage over existing separation models in a dataset built on Librispeech. Notably, SPMamba achieves a substantial improvement in separation quality, with a 2.42 dB enhancement in SI-SNRi compared to the TF-GridNet. The source code for SPMamba is publicly accessible at https://github.com/JusperLee/SPMamba .
Submitted 2 April, 2024;
originally announced April 2024.
-
Knowledge and Data Dual-Driven Channel Estimation and Feedback for Ultra-Massive MIMO Systems under Hybrid Field Beam Squint Effect
Authors:
Kuiyu Wang,
Zhen Gao,
Sheng Chen,
Boyu Ning,
Gaojie Chen,
Yu Su,
Zhaocheng Wang,
H. Vincent Poor
Abstract:
Acquiring accurate channel state information (CSI) at an access point (AP) is challenging for wideband millimeter wave (mmWave) ultra-massive multiple-input and multiple-output (UMMIMO) systems, due to the high-dimensional channel matrices, hybrid near- and far- field channel feature, beam squint effects, and imperfect hardware constraints, such as low-resolution analog-to-digital converters, and in-phase and quadrature imbalance. To overcome these challenges, this paper proposes an efficient downlink channel estimation (CE) and CSI feedback approach based on knowledge and data dual-driven deep learning (DL) networks. Specifically, we first propose a data-driven residual neural network de-quantizer (ResNet-DQ) to pre-process the received pilot signals at user equipment (UEs), where the noise and distortion brought by imperfect hardware can be mitigated. A knowledge-driven generalized multiple measurement vector learned approximate message passing (GMMV-LAMP) network is then developed to jointly estimate the channels by exploiting the approximately same physical angle shared by different subcarriers. In particular, two wideband redundant dictionaries (WRDs) are proposed such that the measurement matrices of the GMMV-LAMP network can accommodate the far-field and near-field beam squint effect, respectively. Finally, we propose an encoder at the UEs and a decoder at the AP by a data-driven CSI residual network (CSI-ResNet) to compress the CSI matrix into a low-dimensional quantized bit vector for feedback, thereby reducing the feedback overhead substantially. Simulation results show that the proposed knowledge and data dual-driven approach outperforms conventional downlink CE and CSI feedback methods, especially in the case of low signal-to-noise ratios.
Submitted 19 March, 2024;
originally announced March 2024.
-
Inverse learning of black-box aggregator for robust Nash equilibrium
Authors:
Guanpu Chen,
Gehui Xu,
Fengxiang He,
Dacheng Tao,
Thomas Parisini,
Karl Henrik Johansson
Abstract:
In this note, we investigate the robustness of Nash equilibria (NE) in multi-player aggregative games with coupling constraints. There are many algorithms for computing an NE of an aggregative game given a known aggregator. When the coupling parameters are affected by uncertainty, robust NE need to be computed. We consider a scenario where players' weight in the aggregator is unknown, making the aggregator kind of "a black box". We pursue a suitable learning approach to estimate the unknown aggregator by proposing an inverse variational inequality-based relationship. We then utilize the counterpart to reconstruct the game and obtain first-order conditions for robust NE in the worst case. Furthermore, we characterize the generalization property of the learning methodology via an upper bound on the violation probability. Simulation experiments show the effectiveness of the proposed inverse learning approach.
Submitted 16 March, 2024;
originally announced March 2024.
-
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Authors:
Jiayu Du,
Jinpeng Li,
Guoguo Chen,
Wei-Qiang Zhang
Abstract:
In the wake of the surging tide of deep learning over the past decade, Automatic Speech Recognition (ASR) has garnered substantial attention, leading to the emergence of numerous publicly accessible ASR systems that are actively being integrated into our daily lives. Nonetheless, the impartial and replicable evaluation of these ASR systems encounters challenges due to various crucial subtleties. In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation. With this platform: (i) We report a comprehensive benchmark, unveiling the current state-of-the-art panorama for ASR systems, covering both open-source models and industrial commercial services. (ii) We quantify how distinct nuances in the scoring pipeline influence the final benchmark outcomes. These include nuances related to capitalization, punctuation, interjection, contraction, synonym usage, compound words, etc. These issues have gained prominence in the context of the transition towards an End-to-End future. (iii) We propose a practical modification to the conventional Token-Error-Rate (TER) evaluation metric, with inspirations from Kolmogorov complexity and Normalized Information Distance (NID). This adaptation, called modified-TER (mTER), achieves proper normalization and symmetrical treatment of reference and hypothesis. By leveraging this platform as a large-scale testing ground, this study demonstrates the robustness and backward compatibility of mTER when compared to TER. The SpeechColab Leaderboard is accessible at https://github.com/SpeechColab/Leaderboard
Submitted 12 March, 2024;
originally announced March 2024.
-
On the Choice of Loss Function in Learning-based Optimal Power Flow
Authors:
Ge Chen,
Junjie Qin
Abstract:
We analyze and contrast two ways to train machine learning models for solving AC optimal power flow (OPF) problems, distinguished by the loss functions used. The first trains a mapping from the loads to the optimal dispatch decisions, utilizing mean square error (MSE) between predicted and optimal dispatch decisions as the loss function. The other intends to learn the same mapping, but directly uses the OPF cost of the predicted decisions, referred to as decision loss, as the loss function. In addition to better aligning with the OPF cost which results in reduced suboptimality, the use of decision loss can circumvent feasibility issues that arise with MSE when the underlying mapping from loads to optimal dispatch is discontinuous. Since decision loss does not capture the OPF constraints, we further develop a neural network with a specific structure and introduce a modified training algorithm incorporating Lagrangian duality to improve feasibility. This results in improved performance measured by feasibility and suboptimality as demonstrated with an IEEE 39-bus case study.
Submitted 1 February, 2024;
originally announced February 2024.
-
Neural Risk Limiting Dispatch in Power Networks: Formulation and Generalization Guarantees
Authors:
Ge Chen,
Junjie Qin
Abstract:
Risk limiting dispatch (RLD) has been proposed as an approach that effectively trades off economic costs with operational risks for power dispatch under uncertainty. However, how to solve the RLD problem with provably near-optimal performance remains an open problem. This paper presents a learning-based solution to this challenge. We first design a data-driven formulation for the RLD problem, which aims to construct a decision rule that directly maps day-ahead observable information to cost-effective dispatch decisions for the future delivery interval. Unlike most existing works that follow a predict-then-optimize paradigm, this end-to-end rule bypasses the additional suboptimality introduced by separately handling prediction and optimization. We then propose neural RLD, a novel solution method to the data-driven formulation. This method leverages an L2-regularized neural network to learn the decision rule, thereby transforming the data-driven formulation into a neural network training task that can be efficiently completed by stochastic gradient descent. A theoretical performance guarantee is further established to bound the suboptimality of our method, which implies that its suboptimality approaches zero with high probability as more samples are utilized. Simulation tests across various systems demonstrate our method's superior performance in convergence, suboptimality, and computational efficiency compared with benchmarks.
Submitted 1 February, 2024;
originally announced February 2024.
-
A Proactive and Dual Prevention Mechanism against Illegal Song Covers empowered by Singing Voice Conversion
Authors:
Guangke Chen,
Yedi Zhang,
Fu Song,
Ting Wang,
Xiaoning Du,
Yang Liu
Abstract:
Singing voice conversion (SVC) automates song covers by converting one singer's singing voice into another target singer's singing voice with the original lyrics and melody. However, it raises serious concerns about copyright and civil right infringements to multiple entities. This work proposes SongBsAb, the first proactive approach to mitigate unauthorized SVC-based illegal song covers. SongBsAb introduces human-imperceptible perturbations to singing voices before releasing them, so that when they are used, the generation process of SVC will be interfered, resulting in unexpected singing voices. SongBsAb features a dual prevention effect by causing both (singer) identity disruption and lyric disruption, namely, the SVC-covered singing voice neither imitates the target singer nor preserves the original lyrics. To improve the imperceptibility of perturbations, we refine a psychoacoustic model-based loss with the backing track as an additional masker, a unique accompanying element for singing voices compared to ordinary speech voices. To enhance the transferability, we propose to utilize a frame-level interaction reduction-based loss. We demonstrate the prevention effectiveness, utility, and robustness of SongBsAb on three SVC models and two datasets using both objective and human study-based subjective metrics. Our work fosters an emerging research direction for mitigating illegal automated song covers.
Submitted 30 January, 2024;
originally announced January 2024.
-
GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow
Authors:
Liguo Zhou,
Yinglei Song,
Yichao Gao,
Zhou Yu,
Michael Sodamin,
Hongshen Liu,
Liang Ma,
Lian Liu,
Hao Liu,
Yang Liu,
Haichuan Li,
Guang Chen,
Alois Knoll
Abstract:
Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and interdisciplinary researchers. We introduce an autonomous driving simulator with photorealistic scenes, while keeping a user-friendly workflow. The simulator is able to communicate with external algorithms through ROS2 or Socket.IO, making it compatible with existing software stacks. Furthermore, we implement a highly accurate vehicle dynamics model within the simulator to enhance the realism of the vehicle's physical effects. The simulator is able to serve various functions, including generating synthetic data and driving with machine learning-based algorithms. Moreover, we prioritize simplicity in the deployment process, ensuring that beginners find it approachable and user-friendly.
Submitted 30 January, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle
Authors:
Hongshan Liu,
Tong Qin,
Zhen Gao,
Tianqi Mao,
Keke Ying,
Ziwei Wan,
Li Qiao,
Rui Na,
Zhongxiang Li,
Chun Hu,
Yikun Mei,
Tuan Li,
Guanghui Wen,
Lei Chen,
Zhonghuai Wu,
Ruiqi Liu,
Gaojie Chen,
Shuo Wang,
Dezhi Zheng
Abstract:
This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis between the NS-COM network and other counterparts in SAGSIN is conducted, covering aspects of deployment, coverage, channel characteristics and unique problems of NS-COM network. Afterwards, the technical aspects of NS-COM, including channel modeling, random access, channel estimation, array-based beam management and joint network optimization, are examined in detail. Furthermore, we explore the potential applications of NS-COM, such as structural expansion in SAGSIN communication, civil aviation communication, remote and urgent communication, weather monitoring and carbon neutrality. Finally, some promising research avenues are identified, including stratospheric satellite (StratoSat) -to-ground direct links for mobile terminals, reconfigurable multiple-input multiple-output (MIMO) and holographic MIMO, federated learning in NS-COM networks, maritime communication, electromagnetic spectrum sensing and adversarial game, integrated sensing and communications, StratoSat-based radar detection and imaging, NS-COM assisted enhanced global navigation system, NS-COM assisted intelligent unmanned system and free space optical (FSO) communication. Overall, this paper highlights that the NS-COM plays an indispensable role in the SAGSIN puzzle, providing substantial performance and coverage enhancement to the traditional SAGSIN architecture.
Submitted 4 March, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
Deep Radon Prior: A Fully Unsupervised Framework for Sparse-View CT Reconstruction
Authors:
Shuo Xu,
Yucheng Zhang,
Gang Chen,
Xincheng Xiang,
Peng Cong,
Yuewen Sun
Abstract:
Although sparse-view computed tomography (CT) has significantly reduced radiation dose, it also introduces severe artifacts which degrade the image quality. In recent years, deep learning-based methods for inverse problems have made remarkable progress and have become increasingly popular in CT reconstruction. However, most of these methods suffer from several limitations: dependence on high-quality training data, weak interpretability, etc. In this study, we propose a fully unsupervised framework called Deep Radon Prior (DRP), inspired by Deep Image Prior (DIP), to address the aforementioned limitations. DRP introduces a neural network as an implicit prior into the iterative method, thereby realizing cross-domain gradient feedback. During the reconstruction process, the neural network is progressively optimized in multiple stages to narrow the solution space in the Radon domain for the under-constrained imaging protocol, and the convergence of the proposed method is discussed in this work. Compared with the popular pre-trained method, the proposed framework requires no dataset and exhibits superior interpretability and generalization ability. The experimental results demonstrate that the proposed method can generate detailed images while effectively suppressing image artifacts. Meanwhile, DRP achieves comparable or better performance than the supervised methods.
Submitted 29 December, 2023;
originally announced January 2024.
-
Index Modulation for Fluid Antenna-Assisted MIMO Communications: System Design and Performance Analysis
Authors:
Jing Zhu,
Gaojie Chen,
Pengyu Gao,
Pei Xiao,
Zihuai Lin,
Atta Quddus
Abstract:
In this paper, we propose a transmission mechanism for fluid antenna (FA)-enabled multiple-input multiple-output (MIMO) communication systems based on index modulation (IM), named FA-IM, which incorporates the principle of IM into FA-assisted MIMO systems to improve the spectral efficiency (SE) without increasing the hardware complexity. In FA-IM, the information bits are mapped not only to the modulation symbols, but also to the index of FA position patterns. Additionally, the FA position pattern codebook is carefully designed to further enhance the system performance by maximizing the effective channel gains. Then, a low-complexity detector, referred to as the efficient sparse Bayesian detector, is proposed by exploiting the inherent sparsity of the transmitted FA-IM signal vectors. Finally, a closed-form expression for the upper bound on the average bit error probability (ABEP) is derived under the finite-path and infinite-path channel conditions. Simulation results show that the proposed scheme is capable of improving the SE performance compared to the existing FA-assisted MIMO and the fixed position antenna (FPA)-assisted MIMO systems while obviating any additional hardware costs. It has also been shown that the proposed scheme outperforms the conventional FA-assisted MIMO scheme in terms of error performance under the same transmission rate.
Submitted 25 December, 2023;
originally announced December 2023.
-
Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
Authors:
Yang Liu,
Haoqin Sun,
Geng Chen,
Qingyue Wang,
Zhen Zhao,
Xugang Lu,
Longbiao Wang
Abstract:
Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, making it challenging to achieve competitive performance in noisy conditions. To this end, we propose a multi-level knowledge distillation (MLKD) method, which aims to transfer the knowledge from a teacher model trained on clean speech to a simpler student model trained on noisy speech. Specifically, we use clean speech features extracted by the wav2vec-2.0 as the learning goal and train the distil wav2vec-2.0 to approximate the feature extraction ability of the original wav2vec-2.0 under noisy conditions. Furthermore, we leverage the multi-level knowledge of the original wav2vec-2.0 to supervise the single-level output of the distil wav2vec-2.0. We evaluate the effectiveness of our proposed method by conducting extensive experiments using five types of noise-contaminated speech on the IEMOCAP dataset, which show promising results compared to state-of-the-art models.
Submitted 20 December, 2023;
originally announced December 2023.
-
Design and Performance Analysis of Index Modulation Empowered AFDM System
Authors:
Jing Zhu,
Qu Luo,
Gaojie Chen,
Pei Xiao,
Lixia Xiao
Abstract:
In this letter, we incorporate index modulation (IM) into affine frequency division multiplexing (AFDM), called AFDM-IM, to enhance the bit error rate (BER) and energy efficiency (EE) performance. In this scheme, the information bits are conveyed not only by $M$-ary constellation symbols, but also by the activation of the chirp subcarriers (SCs) indices, which are determined based on the incoming bit streams. Then, two power allocation strategies, namely power reallocation (PR) strategy and power saving (PS) strategy, are proposed to enhance BER and EE performance, respectively. Furthermore, the average bit error probability (ABEP) is theoretically analyzed. Simulation results demonstrate that the proposed AFDM-IM scheme achieves better BER performance than the conventional AFDM scheme.
Submitted 2 December, 2023;
originally announced December 2023.
-
Joint Diffusion: Mutual Consistency-Driven Diffusion Model for PET-MRI Co-Reconstruction
Authors:
Taofeng Xie,
Zhuo-Xu Cui,
Chen Luo,
Huayu Wang,
Congcong Liu,
Yuanzhi Zhang,
Xuemei Wang,
Yanjie Zhu,
Qiyu Jin,
Guoqing Chen,
Yihang Zhou,
Dong Liang,
Haifeng Wang
Abstract:
Positron Emission Tomography and Magnetic Resonance Imaging (PET-MRI) systems can obtain functional and anatomical scans. PET suffers from a low signal-to-noise ratio. Meanwhile, the k-space data acquisition process in MRI is time-consuming. The study aims to accelerate MRI and enhance PET image quality. Conventional approaches involve the separate reconstruction of each modality within PET-MRI systems. However, multi-modal images contain complementary information that can contribute to image reconstruction. In this study, we propose a novel PET-MRI joint reconstruction model employing a mutual consistency-driven diffusion model, namely MC-Diffusion. MC-Diffusion learns the joint probability distribution of PET and MRI to utilize complementary information. We conducted a series of comparative experiments with LPLS, Joint ISAT-net, and MC-Diffusion on the ADNI dataset. The results underscore the qualitative and quantitative improvements achieved by MC-Diffusion, surpassing the state-of-the-art method.
Submitted 24 November, 2023;
originally announced November 2023.
-
A Region of Interest Focused Triple UNet Architecture for Skin Lesion Segmentation
Authors:
Guoqing Liu,
Yu Guo,
Caiying Wu,
Guoqing Chen,
Barintag Saheya,
Qiyu Jin
Abstract:
Skin lesion segmentation is of great significance for skin lesion analysis and subsequent treatment. It is still a challenging task due to the irregular and fuzzy lesion borders, and diversity of skin lesions. In this paper, we propose Triple-UNet to automatically segment skin lesions. It is an organic combination of three UNet architectures with suitable modules. In order to concatenate the first and second sub-networks more effectively, we design a region of interest enhancement module (ROIE). The ROIE enhances the target object region of the image by using the predicted score map of the first UNet. The features learned by the first UNet and the enhanced image help the second UNet obtain a better score map. Finally, the results are fine-tuned by the third UNet. We evaluate our algorithm on a publicly available dataset of skin lesion segmentation. Experiments show that Triple-UNet outperforms the state-of-the-art on skin lesion segmentation.
Submitted 21 November, 2023;
originally announced November 2023.
-
Network-Level Integrated Sensing and Communication: Interference Management and BS Coordination Using Stochastic Geometry
Authors:
Kaitao Meng,
Christos Masouros,
Guangji Chen,
Fan Liu
Abstract:
In this work, we study integrated sensing and communication (ISAC) networks with the aim of effectively balancing sensing and communication (S&C) performance at the network level. Focusing on monostatic sensing, the tool of stochastic geometry is exploited to capture the S&C performance, which allows us to illuminate key cooperative dependencies in the ISAC network and optimize key network-level parameters. Based on the derived tractable expression of area spectral efficiency (ASE), we formulate the optimization problem to maximize the network performance from the viewpoint of two joint S&C metrics. Towards this end, we further jointly optimize the cooperative BS cluster sizes for S&C and the serving/probing numbers of users/targets to achieve a flexible tradeoff between S&C at the network level. It is verified that interference nulling can effectively improve the average data rate and radar information rate. Surprisingly, the optimal communication tradeoff for the case of the ASE maximization tends to employ all spatial resources towards multiplexing and diversity gain, without interference nulling. By contrast, for the sensing objectives, resource allocation tends to eliminate certain interference, especially when the antenna resources are sufficient, because the inter-cell interference becomes a more dominant factor affecting sensing performance. Furthermore, we prove that the ratio of the optimal number of users to the number of transmit antennas is a constant value when the communication performance is optimal. Simulation results demonstrate that the proposed cooperative ISAC scheme achieves a substantial gain in S&C performance at the network level.
Submitted 15 November, 2023;
originally announced November 2023.
-
3D Multi-Target Localization Via Intelligent Reflecting Surface: Protocol and Analysis
Authors:
Meng Hua,
Guangji Chen,
Kaitao Meng,
Shaodan Ma,
Chau Yuen,
Hing Cheung So
Abstract:
With the emerging environment-aware applications, ubiquitous sensing is expected to play a key role in future networks. In this paper, we study a 3-dimensional (3D) multi-target localization system where multiple intelligent reflecting surfaces (IRSs) are applied to create virtual line-of-sight (LoS) links that bypass the base station (BS) and targets. To fully unveil the fundamental limit of IRS for sensing, we first study a single-target-single-IRS case and propose a novel two-stage localization protocol by controlling the on/off state of IRS. To be specific, in the IRS-off stage, we derive the Cramér-Rao bound (CRB) of the azimuth/elevation direction-of-arrival (DoA) of the BS-target link and design a DoA estimator based on the MUSIC algorithm. In the IRS-on stage, the CRB of the azimuth/elevation DoA of the IRS-target link is derived and a simple DoA estimator based on the on-grid IRS beam scanning method is proposed. Particularly, the impact of echo signals reflected by IRS from different paths on sensing performance is analyzed. Moreover, we prove that the single-beam of the IRS is not capable of sensing, but it can be achieved with multi-beam. Based on the two obtained DoAs, the 3D single-target location is constructed. We then extend to the multi-target-multi-IRS case and propose an IRS-adaptive sensing protocol by controlling the on/off state of multiple IRSs, and a multi-target localization algorithm is developed. Simulation results demonstrate the effectiveness of our scheme and show that sub-meter-level positioning accuracy can be achieved.
Submitted 28 February, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Block Backstepping for Isotachic Hyperbolic PDEs and Multilayer Timoshenko Beams
Authors:
Guangwei Chen,
Rafael Vazquez,
Junfei Qiao,
Miroslav Krstic
Abstract:
In this paper, we investigate the rapid stabilization of N-layer Timoshenko composite beams with anti-damping and anti-stiffness at the uncontrolled boundaries. The problem of stabilization for a two-layer composite beam has been previously studied by transforming the model into a 1-D hyperbolic PIDE-ODE form and then applying backstepping to this new system. In principle this approach is generalizable to any number of layers. However, when some of the layers have the same physical properties (as e.g. in lamination of repeated layers), the approach leads to isotachic hyperbolic PDEs (i.e. where some states have the same transport speed). This particular yet physical and interesting case has not received much attention beyond a few remarks in the early hyperbolic design. Thus, this work starts by extending the theory of backstepping control of (m + n) hyperbolic PIDEs and m ODEs to blocks of isotachic states, leading to a block backstepping design. Then, returning to multilayer Timoshenko beams, the Riemann transformation is used to transform the states of N-layer Timoshenko beams into a 1-D hyperbolic PIDE-ODE system. The block backstepping method is then applied to this model, obtaining closed-loop stability of the origin in the L2 sense. An arbitrarily rapid convergence rate can be obtained by adjusting control parameters. Finally, numerical simulations are presented corroborating the theoretical developments.
Submitted 17 October, 2023;
originally announced October 2023.
-
Towards Intelligent Network Management: Leveraging AI for Network Service Detection
Authors:
Khuong N. Nguyen,
Abhishek Sehgal,
Yuming Zhu,
Junsu Choi,
Guanbo Chen,
Hao Chen,
Boon Loong Ng,
Charlie Zhang
Abstract:
As the complexity and scale of modern computer networks continue to increase, there has emerged an urgent need for precise traffic analysis, which plays a pivotal role in cutting-edge wireless connectivity technologies. This study focuses on leveraging Machine Learning methodologies to create an advanced network traffic classification system. We introduce a novel data-driven approach that excels in identifying various network service types in real-time, by analyzing patterns within the network traffic. Our method organizes similar kinds of network traffic into distinct categories, referred to as network services, based on latency requirement. Furthermore, it decomposes the network traffic stream into multiple, smaller traffic flows, with each flow uniquely carrying a specific service. Our ML models are trained on a dataset comprised of labeled examples representing different network service types collected on various Wi-Fi network conditions. Upon evaluation, our system demonstrates a remarkable accuracy in distinguishing the network services. These results emphasize the substantial promise of integrating Artificial Intelligence in wireless technologies. Such an approach encourages more efficient energy consumption, enhances Quality of Service assurance, and optimizes the allocation of network resources, thus laying a solid groundwork for the development of advanced intelligent networks.
Submitted 14 October, 2023;
originally announced October 2023.
-
Implementation of Fuzzy Control Algorithm in Two-Wheeled Differential Drive Platform
Authors:
Guoyi Chen
Abstract:
Designing and developing Artificial Intelligence controllers on separately dedicated chips have many advantages. This report reviews the development of a real-time fuzzy logic controller for optimizing locomotion control of a two-wheeled differential drive platform using an Arduino Uno board. Based on the Raspberry Pi board, fuzzy sets are used to optimize color recognition, enabling the color sensor to correctly recognize color at long distances, across a wide range of light intensity, and with high fault tolerance.
Submitted 11 October, 2023;
originally announced October 2023.
-
Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand Prediction
Authors:
Linyu Liu,
Zhen Dai,
Shiji Song,
Xiaocheng Li,
Guanting Chen
Abstract:
Electrifying heavy-duty trucks offers a substantial opportunity to curtail carbon emissions, advancing toward a carbon-neutral future. However, the inherent challenges of limited battery energy and the sheer weight of heavy-duty trucks lead to reduced mileage and prolonged charging durations. Consequently, battery-swapping services emerge as an attractive solution for these trucks. This paper employs a two-fold approach to investigate the potential and enhance the efficacy of such services. Firstly, spatial-temporal demand prediction models are adopted to predict the traffic patterns for the upcoming hours. Subsequently, the prediction guides an optimization module for efficient battery allocation and deployment. Analyzing the heavy-duty truck data on a highway network spanning over 2,500 miles, our model and analysis underscore the value of prediction/machine learning in facilitating future decision-making. In particular, we find that the initial phase of implementing battery-swapping services favors mobile battery-swapping stations, but as the system matures, fixed-location stations are preferred.
Submitted 23 May, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
BAAF: A Benchmark Attention Adaptive Framework for Medical Ultrasound Image Segmentation Tasks
Authors:
Gongping Chen,
Lei Zhao,
Xiaotao Yin,
Liang Cui,
Jianxun Zhang,
Yu Dai
Abstract:
AI-based assisted diagnosis programs have been widely investigated on medical ultrasound images. The complex scenario of ultrasound images, in which the coupled interference of internal and external factors is severe, poses a unique challenge for localizing the object region automatically and precisely. In this study, we propose a more general and robust Benchmark Attention Adaptive Framework (BAAF) to help doctors segment or diagnose lesions and tissues in ultrasound images more quickly and accurately. Different from existing attention schemes, the BAAF consists of a parallel hybrid attention module (PHAM) and an adaptive calibration mechanism (ACM). Specifically, BAAF first coarsely calibrates the input features from the channel and spatial dimensions, and then adaptively selects more robust lesion or tissue characterizations from the coarse-calibrated feature maps. The design of BAAF further optimizes the "what" and "where" focus and selection problems in CNNs and seeks to improve the segmentation accuracy of lesions or tissues in medical ultrasound images. The method is evaluated on four medical ultrasound segmentation tasks, and the experimental results demonstrate a remarkable performance improvement over existing state-of-the-art methods. In addition, the comparison with existing attention mechanisms also demonstrates the superiority of BAAF. This work provides the possibility for automated medical ultrasound assisted diagnosis and reduces reliance on human accuracy and precision.
Submitted 2 October, 2023;
originally announced October 2023.
-
Matrix Completion-Informed Deep Unfolded Equilibrium Models for Self-Supervised k-Space Interpolation in MRI
Authors:
Chen Luo,
Huayu Wang,
Taofeng Xie,
Qiyu Jin,
Guoqing Chen,
Zhuo-Xu Cui,
Dong Liang
Abstract:
Recently, regularization model-driven deep learning (DL) has gained significant attention due to its ability to leverage the potent representational capabilities of DL while retaining the theoretical guarantees of regularization models. However, most of these methods are tailored for supervised learning scenarios that necessitate fully sampled labels, which can pose challenges in practical MRI applications. To tackle this challenge, we propose a self-supervised DL approach for accelerated MRI that is theoretically guaranteed and does not rely on fully sampled labels. Specifically, we achieve neural network structure regularization by exploiting the inherent structural low-rankness of the $k$-space data. Simultaneously, we constrain the network structure to resemble a nonexpansive mapping, ensuring the network's convergence to a fixed point. Thanks to this well-defined network structure, this fixed point can completely reconstruct the missing $k$-space data based on matrix completion theory, even in situations where fully sampled labels are unavailable. Experiments validate the effectiveness of our proposed method and demonstrate its superiority over existing self-supervised approaches and traditional regularization methods, achieving performance comparable to that of supervised learning methods in certain scenarios.
Submitted 24 September, 2023;
originally announced September 2023.
-
Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems
Authors:
Huayu Wang,
Chen Luo,
Taofeng Xie,
Qiyu Jin,
Guoqing Chen,
Zhuo-Xu Cui,
Dong Liang
Abstract:
Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.
Submitted 17 September, 2023;
originally announced September 2023.
-
SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems
Authors:
Guangke Chen,
Yedi Zhang,
Fu Song
Abstract:
Membership inference attacks allow adversaries to determine whether a particular example was contained in the model's training dataset. While previous works have confirmed the feasibility of such attacks in various applications, none has focused on speaker recognition (SR), a promising voice-based biometric recognition technique. In this work, we propose SLMIA-SR, the first membership inference attack tailored to SR. In contrast to conventional example-level attack, our attack features speaker-level membership inference, i.e., determining if any voices of a given speaker, either the same as or different from the given inference voices, have been involved in the training of a model. It is particularly useful and practical since the training and inference voices are usually distinct, and it is also meaningful considering the open-set nature of SR, namely, the recognition speakers were often not present in the training data. We utilize intra-similarity and inter-dissimilarity, two training objectives of SR, to characterize the differences between training and non-training speakers and quantify them with two groups of features driven by carefully-established feature engineering to mount the attack. To improve the generalizability of our attack, we propose a novel mixing ratio training strategy to train attack models. To enhance the attack performance, we introduce voice chunk splitting to cope with the limited number of inference voices and propose to train attack models dependent on the number of inference voices. Our attack is versatile and can work in both white-box and black-box scenarios. Additionally, we propose two novel techniques to reduce the number of black-box queries while maintaining the attack performance. Extensive experiments demonstrate the effectiveness of SLMIA-SR.
Submitted 27 November, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Integrated Robotics Networks with Co-optimization of Drone Placement and Air-Ground Communications
Authors:
Menghao Hu,
Tong Zhang,
Shuai Wang,
Guoliang Li,
Yingyang Chen,
Qiang Li,
Gaojie Chen
Abstract:
Terrestrial robots, i.e., unmanned ground vehicles (UGVs), and aerial robots, i.e., unmanned aerial vehicles (UAVs), operate in separate spaces. To exploit their complementary features (e.g., fields of view, communication links, computing capabilities), a promising paradigm termed integrated robotics network emerges, which provides communications for cooperative UAV-UGV applications. However, efficiently deploying UAVs and scheduling the UAV-UGV connections according to different UGV tasks becomes challenging. In this paper, we propose a sum-rate maximization problem, where UGVs plan their trajectories autonomously and are dynamically associated with UAVs according to their planned trajectories. Although the problem is an NP-hard mixed integer program, a fast polynomial-time algorithm using alternating gradient descent and penalty-based binary relaxation is devised. Simulation results demonstrate the effectiveness of the proposed algorithm.
Submitted 3 December, 2023; v1 submitted 9 September, 2023;
originally announced September 2023.
-
LLaSM: Large Language and Speech Model
Authors:
Yu Shu,
Siwei Dong,
Guangyao Chen,
Wenhao Huang,
Ruihua Zhang,
Daochen Shi,
Qiqi Xiang,
Yemin Shi
Abstract:
Multi-modal large language models have garnered significant interest recently. Most existing works, though, focus on vision-language multi-modal models, which provide strong capabilities in following vision-and-language instructions. We claim that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to follow multi-modal speech-and-language instructions. In this work, we propose Large Language and Speech Model (LLaSM). LLaSM is an end-to-end trained large multi-modal speech-language model with cross-modal conversational abilities, capable of following speech-and-language instructions. Our early experiments show that LLaSM demonstrates a more convenient and natural way for humans to interact with artificial intelligence. We also release a large Speech Instruction Following dataset, LLaSM-Audio-Instructions. Code and demo are available at https://github.com/LinkSoul-AI/LLaSM and https://huggingface.co/spaces/LinkSoul/LLaSM. The LLaSM-Audio-Instructions dataset is available at https://huggingface.co/datasets/LinkSoul/LLaSM-Audio-Instructions.
Submitted 16 September, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Enhancing Signal Space Diversity for SCMA Over Rayleigh Fading Channels
Authors:
Qu Luo,
Zilong Liu,
Gaojie Chen,
Pei Xiao
Abstract:
Sparse code multiple access (SCMA) is a promising technique for enabling massive connectivity in future machine-type communication networks, but it suffers from a limited diversity order, which is a bottleneck for significant improvement of error performance. This paper aims to enhance the signal space diversity of SCMA by introducing quadrature component delay to the transmitted codeword of a downlink SCMA system in Rayleigh fading channels. Such a system is called SSD-SCMA throughout this work. By looking into the average mutual information (AMI) and the pairwise error probability (PEP) of the proposed SSD-SCMA, we develop novel codebooks by maximizing the derived AMI lower bound and a modified minimum product distance (MMPD), respectively. The intrinsic asymptotic relationship between the AMI lower bound and the proposed MMPD-based codebook designs is revealed. Numerical results show significant error performance improvement in both uncoded and coded SSD-SCMA systems.
Submitted 25 August, 2023;
originally announced August 2023.
-
WavMark: Watermarking for Audio Generation
Authors:
Guangyu Chen,
Yu Wu,
Shujie Liu,
Tao Liu,
Xiaoyong Du,
Furu Wei
Abstract:
Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its potential benefits, this powerful technology introduces notable risks, including voice fraud and speaker impersonation. Unlike the conventional approach of solely relying on passive methods for detecting synthetic data, watermarking presents a proactive and robust defence mechanism against these looming risks. This paper introduces an innovative audio watermarking framework that encodes up to 32 bits of watermark within a mere 1-second audio snippet. The watermark is imperceptible to human senses and exhibits strong resilience against various attacks. It can serve as an effective identifier for synthesized voices and holds potential for broader applications in audio copyright protection. Moreover, this framework boasts high flexibility, allowing for the combination of multiple watermark segments to achieve heightened robustness and expanded capacity. Utilizing 10 to 20-second audio as the host, our approach demonstrates an average Bit Error Rate (BER) of 0.48% across ten common attacks, a remarkable reduction of over 2800% in BER compared to the state-of-the-art watermarking tool. See https://aka.ms/wavmark for demos of our work.
Submitted 7 January, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks
Authors:
Xiaobin Shen,
Jonathan Elmer,
George H. Chen
Abstract:
Patients resuscitated from cardiac arrest who enter a coma are at high risk of death. Forecasting neurological outcomes of these patients (the task of neurological prognostication) could help with treatment decisions. In this paper, we propose, to the best of our knowledge, the first dynamic framework for neurological prognostication of post-cardiac-arrest comatose patients using EEG data: our framework makes predictions for a patient over time as more EEG data become available, and different training patients' available EEG time series could vary in length. Predictions are phrased in terms of either time-to-event outcomes (time-to-awakening or time-to-death) or as the patient's probability of awakening or of dying across multiple time horizons. Our framework uses any dynamic survival analysis model that supports competing risks in the form of estimating patient-level cumulative incidence functions. We consider three competing risks as to what happens first to a patient: awakening, being withdrawn from life-sustaining therapies (and thus deterministically dying), or dying (by other causes). We demonstrate our framework by benchmarking three existing dynamic survival analysis models that support competing risks on a real dataset of 922 patients. Our main experimental findings are that: (1) the classical Fine and Gray model which only uses a patient's static features and summary statistics from the patient's latest hour's worth of EEG data is highly competitive, achieving accuracy scores as high as the recently developed Dynamic-DeepHit model that uses substantially more of the patient's EEG data; and (2) in an ablation study, we show that our choice of modeling three competing risks results in a model that is at least as accurate while learning more information than simpler models (using two competing risks or a standard survival analysis setup with no competing risks).
Submitted 30 November, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Intelligent Reflecting Surface Aided Multi-Tier Hybrid Computing
Authors:
Yapeng Zhao,
Qingqing Wu,
Guangji Chen,
Wen Chen,
Ruiqi Liu,
Ming-Min Zhao,
Yuan Wu,
Shaodan Ma
Abstract:
The digital twin edge network (DITEN) aims to integrate mobile edge computing (MEC) and digital twin (DT) to provide real-time system configuration and flexible resource allocation for the sixth-generation network. This paper investigates an intelligent reflecting surface (IRS)-aided multi-tier hybrid computing system that can achieve mutual benefits for DT and MEC in the DITEN. For the first time, this paper presents the opportunity to realize the network-wide convergence of DT and MEC. Specifically, in the considered system, over-the-air computation (AirComp) is employed to monitor the status of the DT system, while MEC is performed with the assistance of DT to provide low-latency computing services. In addition, the IRS is utilized to enhance signal transmission and mitigate interference among heterogeneous nodes. We propose a framework for designing the hybrid computing system, aiming to maximize the sum computation rate under communication and computation resource constraints. To tackle the non-convex optimization problem, alternating optimization and successive convex approximation techniques are leveraged to decouple variables and then transform the problem into a more tractable form. Simulation results verify the effectiveness of the proposed algorithm and demonstrate that the IRS can significantly improve the system performance with appropriate phase shift configurations. Moreover, the results indicate that the DT-assisted MEC system can precisely achieve the balance between local computing and task offloading, since real-time system status can be obtained with the help of DT.
Submitted 25 October, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Tissue Segmentation of Thick-Slice Fetal Brain MR Scans with Guidance from High-Quality Isotropic Volumes
Authors:
Shijie Huang,
Xukun Zhang,
Zhiming Cui,
He Zhang,
Geng Chen,
Dinggang Shen
Abstract:
Accurate tissue segmentation of thick-slice fetal brain magnetic resonance (MR) scans is crucial for both reconstruction of isotropic brain MR volumes and the quantification of fetal brain development. However, this task is challenging due to the use of thick-slice scans in clinically acquired fetal brain data. To address this issue, we propose to leverage high-quality isotropic fetal brain MR volumes (and also their corresponding annotations) as guidance for segmentation of thick-slice scans. Due to the existence of a significant domain gap between the high-quality isotropic volumes (i.e., source data) and the thick-slice scans (i.e., target data), we employ a domain adaptation technique to achieve the associated knowledge transfer (from high-quality <source> volumes to thick-slice <target> scans). Specifically, we first register the available high-quality isotropic fetal brain MR volumes across different gestational weeks to construct longitudinally-complete source data. To capture domain-invariant information, we then perform Fourier decomposition to extract image content and style codes. Finally, we propose a novel Cycle-Consistent Domain Adaptation Network (C2DA-Net) to efficiently transfer the knowledge learned from high-quality isotropic volumes for accurate tissue segmentation of thick-slice scans. Our C2DA-Net can fully utilize a small set of annotated isotropic volumes to guide tissue segmentation on unannotated thick-slice scans. Extensive experiments on a large-scale dataset of 372 clinically acquired thick-slice MR scans demonstrate that our C2DA-Net achieves much better performance than cutting-edge methods, both quantitatively and qualitatively.
Submitted 4 December, 2023; v1 submitted 13 August, 2023;
originally announced August 2023.