-
Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems
Authors:
Zheng Fang,
Tao Wang,
Lingchen Zhao,
Shenyi Zhang,
Bowen Li,
Yunjie Ge,
Qi Li,
Chao Shen,
Qian Wang
Abstract:
In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack on ASR systems in the zero-query black-box setting. Through a comprehensive review and categorization of modern ASR technologies, we first meticulously select surrogate ASRs of diverse types to generate adversarial examples. Following this, ZQ-Attack initializes the adversarial perturbation with a scaled target command audio, rendering it relatively imperceptible while maintaining effectiveness. Subsequently, to achieve high transferability of adversarial perturbations, we propose a sequential ensemble optimization algorithm, which iteratively optimizes the adversarial perturbation on each surrogate model, leveraging collaborative information from other models. We conduct extensive experiments to evaluate ZQ-Attack. In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91dB on 4 online speech recognition services, and attains an average SRoA of 100% and SNR of 19.67dB on 16 open-source ASRs. For commercial intelligent voice control devices, ZQ-Attack also achieves a 100% SRoA with an average SNR of 15.77dB in the over-the-air setting.
Submitted 27 June, 2024;
originally announced June 2024.
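The abstract reports imperceptibility as an SNR in dB and says the perturbation is initialized from a scaled copy of the target command. A minimal Python sketch of both ideas follows; it is not the authors' code. The helper names `snr_db` and `scaled_init`, and the choice of a single global gain picked to hit a requested SNR, are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def snr_db(original: Sequence[float], perturbation: Sequence[float]) -> float:
    """SNR of an adversarial example in dB: 10 * log10(P_original / P_perturbation)."""
    p_signal = sum(x * x for x in original)
    p_noise = sum(d * d for d in perturbation)
    return 10.0 * math.log10(p_signal / p_noise)


def scaled_init(command: Sequence[float], original: Sequence[float],
                target_snr_db: float) -> List[float]:
    """Hypothetical initializer: rescale the target-command waveform so that the
    carrier-to-perturbation SNR equals target_snr_db. Both inputs are truncated
    to the shorter length."""
    n = min(len(command), len(original))
    p_signal = sum(x * x for x in original[:n])
    p_cmd = sum(c * c for c in command[:n])
    # Solve p_signal / (alpha**2 * p_cmd) = 10**(target_snr_db / 10) for alpha.
    alpha = math.sqrt(p_signal / (p_cmd * 10.0 ** (target_snr_db / 10.0)))
    return [alpha * c for c in command[:n]]


if __name__ == "__main__":
    random.seed(0)
    carrier = [math.sin(0.01 * t) for t in range(16000)]         # stand-in benign audio
    command = [random.uniform(-1.0, 1.0) for _ in range(16000)]  # stand-in command audio
    delta = scaled_init(command, carrier, target_snr_db=20.0)
    print(round(snr_db(carrier, delta), 2))  # 20.0
```

A higher target SNR gives a quieter starting perturbation; an optimization stage such as the sequential ensemble step described above would then refine `delta` from this point.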
-
CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI
Authors:
Zi Wang,
Fanwen Wang,
Chen Qin,
Jun Lyu,
Ouyang Cheng,
Shuo Wang,
Yan Li,
Mengyao Yu,
Haoyu Zhang,
Kunyuan Guo,
Zhang Shi,
Qirong Li,
Ziqiang Xu,
Ya**g Zhang,
Hao Li,
Sha Hua,
Binghua Chen,
Longyu Sun,
Mengting Sun,
Qin Li,
Ying-Hua Chu,
Wenjia Bai,
**g Qin,
Xiahai Zhuang,
Claudia Prieto
, et al. (7 additional authors not shown)
Abstract:
Cardiac magnetic resonance imaging (MRI) has emerged as a clinical gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is expected to achieve time-efficient and patient-friendly imaging, which in turn requires advanced image reconstruction approaches to recover high-quality, clinically interpretable images from undersampled measurements. However, the lack of publicly available cardiac MRI k-space datasets of sufficient quantity and diversity has severely hindered substantial technological progress, particularly for data-driven artificial intelligence. Here, we provide a standardized, diverse, and high-quality CMRxRecon2024 dataset to facilitate the technical development, fair evaluation, and clinical transfer of cardiac MRI reconstruction approaches, and to promote universal frameworks that enable fast and robust reconstructions across different cardiac MRI protocols in clinical practice. To the best of our knowledge, the CMRxRecon2024 dataset is the largest and most diverse publicly available cardiac k-space dataset. It was acquired from 330 healthy volunteers and covers commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI workflows. In addition, an open platform with tutorials, benchmarks, and data processing tools is provided to facilitate data usage, advanced method development, and fair performance evaluation.
Submitted 27 June, 2024;
originally announced June 2024.
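Reconstruction methods trained on such data are often evaluated by retrospectively undersampling fully sampled k-space. The sketch below builds one common kind of Cartesian mask (a fully sampled centre plus randomly chosen outer phase-encoding lines) and zero-fills the unsampled rows. The function names, the sampling pattern, and its parameters are illustrative assumptions, not the masks distributed with CMRxRecon2024.

```python
import random
from typing import List


def cartesian_mask(n_lines: int, acceleration: float, n_center: int,
                   seed: int = 0) -> List[int]:
    """Hypothetical 1D Cartesian undersampling mask over phase-encoding lines.

    Keeps n_center contiguous lines around the k-space centre (calibration
    region), then adds random outer lines until about n_lines / acceleration
    lines are sampled.
    """
    if not 0 <= n_center <= n_lines:
        raise ValueError("n_center must lie in [0, n_lines]")
    rng = random.Random(seed)
    mask = [0] * n_lines
    start = n_lines // 2 - n_center // 2
    for k in range(start, start + n_center):
        mask[k] = 1
    target = min(n_lines, max(n_center, round(n_lines / acceleration)))
    outer = [k for k in range(n_lines) if not mask[k]]
    rng.shuffle(outer)
    for k in outer[: target - n_center]:
        mask[k] = 1
    return mask


def apply_mask(kspace_rows: List[List[complex]], mask: List[int]) -> List[List[complex]]:
    """Zero-fill the phase-encoding rows that the mask marks as unsampled."""
    return [row if m else [0j] * len(row) for row, m in zip(kspace_rows, mask)]


if __name__ == "__main__":
    mask = cartesian_mask(256, acceleration=4, n_center=24)
    print(sum(mask))  # 64 of 256 lines kept
```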
-
Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy
Authors:
Jiayan Gan,
Zhixing Du,
Qiang Li,
Huaizong Shao,
**gran Lin,
Ye Pan,
Zhongyi Wen,
Shafei Wang
Abstract:
While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest because fingerprints are difficult to counterfeit. Recently, many machine learning (ML)-based RFF algorithms have emerged. In particular, deep learning (DL) has shown great benefits in automatically extracting complex and subtle features from raw data with high classification accuracy. However, DL algorithms face a growing computational cost as the difficulty of RFF tasks and the size of deep neural networks (DNNs) increase dramatically. To address this challenge, this paper proposes a novel cost-effective early-exit neural network consisting of a complex-valued neural network (CVNN) backbone with multiple random forest branches, called hybrid CVNN-RF. Unlike conventional studies that use a single fixed DL model to process all RF samples, our hybrid CVNN-RF considers differences in the recognition difficulty of RF samples and introduces an early-exit mechanism to process the samples dynamically. When processing "easy" samples that can be classified with high confidence, the hybrid CVNN-RF can exit early at a random forest branch to reduce computational cost; otherwise, subsequent network layers are activated to ensure accuracy. To further improve the early-exit rate, an automated multi-dimensional early-exit strategy is proposed to achieve scheduling control along multiple dimensions, namely network depth and classification category. Finally, our experiments on the public ADS-B dataset show that the proposed algorithm can reduce the computational cost by 83% while improving the accuracy by 1.6% on a classification task with 100 categories.
Submitted 21 June, 2024;
originally announced June 2024.
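The early-exit idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the hybrid CVNN-RF itself: each `stages[i]` stands in for "backbone up to depth i plus its random-forest branch", and the per-depth, per-class confidence table `thresholds[i][c]` is a hypothetical stand-in for the automated multi-dimensional exit strategy.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps an input sample to class probabilities.
Stage = Callable[[object], Sequence[float]]


def _argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda c: probs[c])


def early_exit_predict(x: object, stages: List[Stage],
                       thresholds: List[List[float]]) -> Tuple[int, int]:
    """Return (predicted class, index of the stage that produced it).

    Intermediate stage i exits early when its top-class probability reaches
    thresholds[i][top_class]; the final stage always answers.
    """
    if not stages:
        raise ValueError("need at least one stage")
    for i, stage in enumerate(stages[:-1]):
        probs = stage(x)
        best = _argmax(probs)
        if probs[best] >= thresholds[i][best]:
            return best, i  # "easy" sample: deeper layers are never run
    return _argmax(stages[-1](x)), len(stages) - 1


if __name__ == "__main__":
    thr = [[0.9, 0.9]]
    easy = [lambda x: [0.95, 0.05], lambda x: [0.2, 0.8]]
    hard = [lambda x: [0.55, 0.45], lambda x: [0.2, 0.8]]
    print(early_exit_predict("s1", easy, thr))  # (0, 0)
    print(early_exit_predict("s2", hard, thr))  # (1, 1)
```

Lowering a class's threshold at a shallow stage raises the early-exit rate for that class at some risk to accuracy; that is the kind of trade-off an automated exit strategy would tune.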
-
An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation
Authors:
Qin Li,
Yizhe Zhang,
Yan Li,
Jun Lyu,
Meng Liu,
Longyu Sun,
Mengting Sun,
Qirong Li,
Wenyue Mao,
Xinran Wu,
Ya**g Zhang,
Yinghua Chu,
Shuo Wang,
Chengyan Wang
Abstract:
The segmentation foundation model, e.g., the Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance in terms of overall accuracy and efficiency, yet little attention was given to fairness. This oversight raises questions about potential performance biases that could mirror those found in task-specific deep learning models like nnU-Net. In this paper, we explore the fairness dilemma concerning large segmentation foundation models. We prospectively curate a benchmark dataset of 3D MRI and CT scans of organs including the liver, kidney, spleen, lung, and aorta from a total of 1056 healthy subjects with expert segmentations. Crucially, we document demographic details such as gender, age, and body mass index (BMI) for each subject to facilitate a nuanced fairness analysis. We test state-of-the-art foundation models for medical image segmentation, including the original SAM, medical SAM, and SAT models, to evaluate segmentation efficacy across different demographic groups and identify disparities. Our comprehensive analysis, which accounts for various confounding factors, reveals significant fairness concerns within these foundation models. Moreover, our findings highlight not only disparities in overall segmentation metrics, such as the Dice similarity coefficient, but also significant variations in the spatial distribution of segmentation errors, offering empirical evidence of the nuanced challenges in ensuring fairness in medical image segmentation.
Submitted 18 June, 2024;
originally announced June 2024.
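To make the group-wise comparison concrete, here is a minimal Python sketch that computes the Dice similarity coefficient per case and averages it within each demographic group. The helper names, the flattened binary-mask representation, and the convention that two empty masks score 1.0 are assumptions; the study's confounder-adjusted analysis is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient 2*|P & T| / (|P| + |T|) for flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * inter / total  # both empty: treated as perfect


def dice_by_group(cases: Iterable[Tuple[str, Sequence[int], Sequence[int]]]
                  ) -> Tuple[Dict[str, float], float]:
    """Mean Dice per demographic group, plus the largest between-group gap."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for group, pred, truth in cases:
        scores[group].append(dice(pred, truth))
    if not scores:
        raise ValueError("no cases given")
    means = {g: sum(v) / len(v) for g, v in scores.items()}
    return means, max(means.values()) - min(means.values())


if __name__ == "__main__":
    cases = [
        ("female", [1, 1, 0, 0], [1, 1, 0, 0]),  # Dice 1.0
        ("female", [1, 0, 0, 0], [1, 1, 0, 0]),  # Dice 2*1/(1+2)
        ("male", [1, 1, 1, 0], [1, 1, 0, 0]),    # Dice 2*2/(3+2) = 0.8
    ]
    means, gap = dice_by_group(cases)
    print(means, gap)
```

A raw gap like this ignores the confounding factors that the study explicitly accounts for, so it is only a starting point for a fairness analysis.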
-
I Still See You: Why Existing IoT Traffic Reshaping Fails
Authors:
Su Wang,
Keyang Yu,
Qi Li,
Dong Chen
Abstract:
The Internet traffic data produced by Internet of Things (IoT) devices are collected by Internet Service Providers (ISPs) and device manufacturers, and are often shared with third parties to maintain and enhance user services. Unfortunately, on-path adversaries could infer and fingerprint users' sensitive private information, such as occupancy and user activities, by analyzing these network traffic traces. While there is a growing body of literature on defending against this side-channel attack, known as malicious IoT traffic analytics (TA), there is currently no systematic method to compare and evaluate the comprehensiveness of these existing studies. To address this problem, we design a new low-cost, open-source system framework, the IoT Traffic Exposure Monitoring Toolkit (ITEMTK), that enables people to comprehensively examine and validate prior attack models and their defending approaches. In particular, we also design a novel image-based attack capable of inferring sensitive user information, even when users employ the most robust preventative measures in their smart homes. Researchers could leverage our new image-based attack to systematize and understand the existing literature on IoT traffic analysis attacks and prevention studies. Our results show that current defending approaches are not sufficient to protect IoT device user privacy. IoT devices are significantly vulnerable to our new image-based user privacy inference attacks, posing a grave threat to IoT device user privacy. We also highlight potential future improvements to enhance the defending approaches. ITEMTK is flexible enough that other researchers can easily extend it by integrating new TA attack models and prevention methods to benchmark their future work.
Submitted 14 June, 2024;
originally announced June 2024.
-
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
Authors:
Rong Gong,
Hongfei Xue,
Lezhi Wang,
Xin Xu,
Qisheng Li,
Lei Xie,
Hui Bu,
Shaomei Wu,
Jiaming Zhou,
Yong Qin,
Binbin Zhang,
Jun Du,
Jia Bin,
Ming Li
Abstract:
The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into model fine-tuning, significant improvements are observed for state-of-the-art ASR models, e.g., Whisper and HuBERT, enhancing their inclusivity in addressing stuttered speech.
Submitted 11 June, 2024;
originally announced June 2024.
-
Study of Robust Direction Finding Based on Joint Sparse Representation
Authors:
Y. Li,
W. Xiao,
L. Zhao,
Z. Huang,
Q. Li,
L. Li,
R. C. de Lamare
Abstract:
Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences, we propose a novel DOA estimation method based on sparse signal recovery (SSR). Furthermore, to address the issue of grid mismatch, we utilize an alternating optimization approach that relies on the estimated outlier matrix and the on-grid DOA estimates to obtain the off-grid DOA estimates. Simulation results demonstrate that the proposed method exhibits robustness against large outliers.
Submitted 26 May, 2024;
originally announced May 2024.
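The noise model described above, Gaussian background plus sparse outliers, can be simulated in a few lines of Python. This toy generator assumes independent per-sample outliers drawn with a fixed probability and a much larger scale; the abstract does not specify the outlier structure, so all parameters and helper names are illustrative. The helper `outlier_energy_share` shows why second-order (Gaussian-based) statistics are fragile here: a handful of outliers can carry most of the noise power.

```python
import random
from typing import List, Tuple


def impulsive_noise(n: int, sigma: float, p_outlier: float,
                    outlier_scale: float, seed: int = 0) -> Tuple[List[complex], List[bool]]:
    """Circular complex Gaussian noise (std sigma) plus, with probability
    p_outlier per sample, an additional outlier with std outlier_scale."""
    rng = random.Random(seed)
    s_g = sigma / 2 ** 0.5          # per real/imaginary component
    s_o = outlier_scale / 2 ** 0.5
    noise: List[complex] = []
    is_outlier: List[bool] = []
    for _ in range(n):
        v = complex(rng.gauss(0.0, s_g), rng.gauss(0.0, s_g))
        hit = rng.random() < p_outlier
        if hit:
            v += complex(rng.gauss(0.0, s_o), rng.gauss(0.0, s_o))
        noise.append(v)
        is_outlier.append(hit)
    return noise, is_outlier


def outlier_energy_share(noise: List[complex], is_outlier: List[bool]) -> float:
    """Fraction of total noise energy contributed by the outlier samples."""
    total = sum(abs(v) ** 2 for v in noise)
    return sum(abs(v) ** 2 for v, h in zip(noise, is_outlier) if h) / total


if __name__ == "__main__":
    noise, hit = impulsive_noise(5000, sigma=1.0, p_outlier=0.02, outlier_scale=30.0)
    print(sum(hit), round(outlier_energy_share(noise, hit), 2))
```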
-
Modeling and simulation of a mechanism for suppressing the flipping problem of a jumping robot
Authors:
Qi Li,
Liang Peng,
Zhiyuan Wu,
Pengda Ye,
Weitao Zhang,
Yi Xu,
Qing Shi
Abstract:
In order to solve the problem of stable jumping of a micro robot, we design a special mechanism: the elastic passive joint (EPJ). EPJ can assist in achieving smooth jumping through its opening-closing process when the robot jumps. First, we introduce the composition and operating principle of EPJ and perform dynamic modeling of the robot's jumping process. Then, in order to verify the effectiveness of EPJ in controlling the robot's smooth jump, we design a simulation experiment based on MATLAB. Comparative experiments show that EPJ can greatly adjust the angular velocity of the robot and increase its jump distance. Finally, we analyze each parameter in EPJ and perform parameter optimization. After optimization, EPJ achieves a completely flip-free jump of the robot, laying an important foundation for improving the mobility of micro robots.
Submitted 20 May, 2024;
originally announced May 2024.
-
Stacked Intelligent Metasurfaces for Holographic MIMO Aided Cell-Free Networks
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Chao Xu,
Jiancheng An,
Chau Yuen,
Lajos Hanzo
Abstract:
Large-scale multiple-input and multiple-output (MIMO) systems are capable of achieving high data rates. However, given the high hardware cost and excessive power consumption of massive MIMO systems, intelligent metasurfaces have been designed as a remedy for efficient holographic MIMO (HMIMO) systems. In this paper, we propose an HMIMO architecture based on stacked intelligent metasurfaces (SIM) for the uplink of cell-free systems, where the SIM is employed at the access points (APs) to improve the spectral and energy efficiency. Specifically, we conceive distributed beamforming for SIM-assisted cell-free networks, where both the SIM coefficients and the local receiver combiner vectors of each AP are optimized based on the local channel state information (CSI) for the local detection of each user equipment's (UE) information. Afterward, the central processing unit (CPU) fuses the local detections gleaned from all APs to detect the aggregate multi-user signal. To design the SIM coefficients and the combining vectors of the APs, a low-complexity layer-by-layer iterative optimization algorithm is proposed for maximizing the equivalent gain of the channel spanning from the UEs to the APs. At the CPU, the weight vector used for combining the local detections from all APs is designed based on the minimum mean square error (MMSE) criterion, where the hardware impairments (HWIs) are also taken into consideration based on their statistics. The simulation results show that the SIM-based HMIMO outperforms the conventional single-layer HMIMO in terms of the achievable rate. We demonstrate that the HWIs of the radio frequency (RF) chains at both the APs and the UEs limit the achievable rate in the high signal-to-noise-ratio (SNR) region.
Submitted 15 May, 2024;
originally announced May 2024.
-
Sensing-Assisted Adaptive Channel Contention for Mobile Delay-Sensitive Communications
Authors:
Bojie Lv,
Qianren Li,
Rui Wang
Abstract:
This paper proposes an adaptive channel contention mechanism to optimize the queuing performance of a distributed millimeter wave (mmWave) uplink system with the capability of environment and mobility sensing. The mobile agents determine their back-off timer parameters according to their local knowledge of the uplink queue lengths, channel quality, and future channel statistics, where the channel prediction relies on the environment and mobility sensing. The optimization of queuing performance with this adaptive channel contention mechanism is formulated as a decentralized multi-agent Markov decision process (MDP). Although the channel contention actions are determined locally at the mobile agents, the optimization of local channel contention policies of all mobile agents is conducted in a centralized manner according to the system statistics before the scheduling. In the solution, the local policies are approximated by analytical models, and the optimization of their parameters becomes a stochastic optimization problem along an adaptive Markov chain. An unbiased gradient estimation is proposed so that the local policies can be optimized efficiently via the stochastic gradient descent method. It is demonstrated by simulation that the proposed gradient estimation is significantly more efficient in optimization than the existing methods, e.g., simultaneous perturbation stochastic approximation (SPSA).
Submitted 9 May, 2024;
originally announced May 2024.
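For context on the SPSA baseline named above, the standard SPSA estimator perturbs all parameters at once with random ±1 signs and needs only two function evaluations per estimate. A minimal Python sketch follows; the helper names are illustrative, and this is the textbook estimator, not the gradient estimator proposed in the paper. Each coordinate of a single estimate picks up cross-terms from the other coordinates, which average out only over many draws; that variance is one reason a more efficient estimator can matter.

```python
import random
from typing import Callable, List, Sequence


def spsa_gradient(f: Callable[[List[float]], float], theta: Sequence[float],
                  c: float, rng: random.Random) -> List[float]:
    """One simultaneous-perturbation gradient estimate of f at theta."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # Rademacher directions
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)                         # only two evaluations
    return [diff / (2.0 * c * d) for d in delta]


def averaged_spsa(f: Callable[[List[float]], float], theta: Sequence[float],
                  c: float, n_draws: int, seed: int = 0) -> List[float]:
    """Average n_draws independent SPSA estimates to reduce their variance."""
    rng = random.Random(seed)
    acc = [0.0] * len(theta)
    for _ in range(n_draws):
        g = spsa_gradient(f, theta, c, rng)
        acc = [a + gi for a, gi in zip(acc, g)]
    return [a / n_draws for a in acc]


if __name__ == "__main__":
    linear = lambda x: x[0] + 2.0 * x[1]   # true gradient is [1, 2]
    print(spsa_gradient(linear, [0.0, 0.0], 0.1, random.Random(3)))  # single noisy draw
    print(averaged_spsa(linear, [0.0, 0.0], 0.1, 4000))              # close to [1, 2]
```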
-
Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study
Authors:
Tianpeng Zhang,
Sekeun Kim,
Jerome Charton,
Haitong Ma,
Kyungsang Kim,
Na Li,
Quanzheng Li
Abstract:
The paper introduces a novel autonomous robotic ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact with the surface, (ii) coordinate mapping between the CT image and the robot, and (iii) target US scan. Utilizing 3D US-CT registration and deep learning-based segmentation networks, we achieve precise imaging of 3D hepatic veins, facilitating accurate coordinate mapping between CT and the robot. This enables the automatic localization of follow-up targets within the CT image, allowing the robot to navigate precisely to the target's surface. Evaluation on an ultrasound phantom confirms the quality of the US-CT registration and shows that the robot reliably locates the targets in repeated trials. The proposed framework holds the potential to significantly reduce time and costs for healthcare providers, clinicians, and follow-up patients, thereby addressing the increasing healthcare burden associated with chronic disease in local communities.
Submitted 9 May, 2024;
originally announced May 2024.
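Coordinate mapping of this kind is commonly expressed as a chain of 4x4 homogeneous transforms. The Python sketch below shows how a target picked in CT coordinates could be carried into robot coordinates. The frame names (`T_us_from_ct` standing for the US-CT registration result, `T_robot_from_us` for probe calibration plus robot kinematics) and the example values are hypothetical, not taken from the paper.

```python
from typing import List, Sequence

Matrix4 = List[List[float]]


def compose(a: Matrix4, b: Matrix4) -> Matrix4:
    """Matrix product a @ b: the resulting transform applies b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def apply_transform(t: Matrix4, point: Sequence[float]) -> List[float]:
    """Map a 3D point through a 4x4 homogeneous transform."""
    v = (point[0], point[1], point[2], 1.0)
    return [sum(t[r][c] * v[c] for c in range(4)) for r in range(3)]


# Hypothetical example frames (units: mm).
T_us_from_ct = [[0.0, -1.0, 0.0, 0.0],       # 90-degree rotation about z
                [1.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]]
T_robot_from_us = [[1.0, 0.0, 0.0, 100.0],   # pure translation
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 50.0],
                   [0.0, 0.0, 0.0, 1.0]]

T_robot_from_ct = compose(T_robot_from_us, T_us_from_ct)
print(apply_transform(T_robot_from_ct, [10.0, 0.0, 0.0]))  # [100.0, 10.0, 50.0]
```

Composing once and reusing `T_robot_from_ct` keeps every follow-up target in a single frame; any registration error enters every mapped point through that one matrix.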
-
Revealing Decision Conservativeness Through Inverse Distributionally Robust Optimization
Authors:
Qi Li,
Zhirui Liang,
Andrey Bernstein,
Yury Dvorkin
Abstract:
This paper introduces Inverse Distributionally Robust Optimization (I-DRO) as a method to infer the conservativeness level of a decision-maker, represented by the size of a Wasserstein metric-based ambiguity set, from the optimal decisions made using Forward Distributionally Robust Optimization (F-DRO). By leveraging the Karush-Kuhn-Tucker (KKT) conditions of the convex F-DRO model, we formulate I-DRO as a bilinear program, which can be solved using off-the-shelf optimization solvers. Additionally, this formulation exhibits several advantageous properties. We demonstrate that I-DRO not only guarantees the existence and uniqueness of an optimal solution but also establishes the necessary and sufficient conditions for this optimal solution to accurately match the actual conservativeness level in F-DRO. Furthermore, we identify three extreme scenarios that may impact I-DRO effectiveness. Our case study applies F-DRO for power system scheduling under uncertainty and employs I-DRO to recover the conservativeness level of system operators. Numerical experiments based on an IEEE 5-bus system and a realistic NYISO 11-zone system demonstrate I-DRO's performance in both normal and extreme scenarios.
Submitted 5 May, 2024;
originally announced May 2024.
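For orientation, the objects the abstract refers to can be written in the textbook form of a data-driven Wasserstein DRO problem with $N$ samples $\hat{\xi}_i$; this is a generic sketch, not necessarily the exact F-DRO model of the paper. The radius $\epsilon$ is the conservativeness level that I-DRO tries to recover from an observed decision $x^{\mathrm{obs}}$.

```latex
\begin{aligned}
\widehat{\mathbb{P}}_N &= \frac{1}{N}\sum_{i=1}^{N} \delta_{\hat{\xi}_i},
\qquad
\mathcal{P}(\epsilon) = \bigl\{\mathbb{P} : W(\mathbb{P}, \widehat{\mathbb{P}}_N) \le \epsilon\bigr\},\\
x^{\star}(\epsilon) &\in \arg\min_{x \in \mathcal{X}} \;
  \sup_{\mathbb{P} \in \mathcal{P}(\epsilon)} \mathbb{E}_{\mathbb{P}}\bigl[f(x, \xi)\bigr]
  \quad \text{(F-DRO)},\\
\text{I-DRO:}\;\; &\text{find } \epsilon \ge 0 \text{ such that } x^{\mathrm{obs}}
  \text{ satisfies the KKT conditions of F-DRO with radius } \epsilon .
\end{aligned}
```

For the type-1 Wasserstein distance with transport cost $d$, the worst-case expectation admits, under mild conditions, the dual form $\inf_{\lambda \ge 0} \{\lambda\epsilon + \frac{1}{N}\sum_{i=1}^{N}\sup_{\xi}[f(x,\xi) - \lambda d(\xi,\hat{\xi}_i)]\}$. Once $\epsilon$ is treated as an unknown, the product $\lambda\epsilon$ is one natural source of bilinear terms; whether the paper's formulation arises exactly this way is an assumption.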
-
I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis
Authors:
Haofei Song,
Xintian Mao,
**g Yu,
Qingli Li,
Yan Wang
Abstract:
Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes reconstructed with thicker slices are anisotropic, with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon: because of this anisotropy, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution from other views. Based on this observation, we propose an Inter-Intra-slice Interpolation Network (I$^3$Net), which fully explores information from the high in-plane resolution and compensates for the low through-plane resolution. The through-plane branch supplements the limited information contained in the low through-plane resolution with information from the high in-plane resolution and enables continual and diverse feature learning. The in-plane branch transforms features to the frequency domain and enforces an equal learning opportunity for all frequency bands in a global context learning paradigm. We further propose a cross-view block to take advantage of the information from all three views online. Extensive experiments on two public datasets demonstrate the effectiveness of I$^3$Net, which noticeably outperforms state-of-the-art super-resolution, video frame interpolation, and slice interpolation methods by a large margin. We achieve 43.90dB in PSNR, with at least 1.14dB improvement under the upscale factor of $\times$2 on the MSD dataset, with faster inference. Code is available at https://github.com/DeepMed-Lab-ECNU/Medical-Image-Reconstruction.
Submitted 5 May, 2024;
originally announced May 2024.
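PSNR, the metric quoted above, compares a reconstruction with its reference through the mean squared error. A short Python sketch follows; the function name is illustrative, and `data_range` (the maximum possible intensity after normalization) is an assumption, since reported PSNR values depend on how it is chosen.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         data_range: float) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


print(round(psnr([0.0, 0.5, 1.0, 0.5], [0.0, 0.5, 0.9, 0.6], 1.0), 2))  # 23.01
```

At a fixed `data_range`, a 1.14dB PSNR gain corresponds to an MSE lower by a factor of $10^{0.114} \approx 1.30$, i.e. roughly 23% less squared error.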
-
Ergodic Spectral Efficiency Analysis of Intelligent Omni-Surface Aided Systems Suffering From Imperfect CSI and Hardware Impairments
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Lajos Hanzo
Abstract:
In contrast to the conventional reconfigurable intelligent surfaces (RIS), intelligent omni-surfaces (IOS) are capable of full-space coverage of smart radio environments by simultaneously transmitting and reflecting the incident signals. In this paper, we investigate the ergodic spectral efficiency of IOS-aided systems for transmission over random channel links, while considering both realistic imperfect channel state information (CSI) and transceiver hardware impairments (HWIs). Firstly, we formulate the linear minimum mean square error estimator of the equivalent channel spanning from the user equipments (UEs) to the access point (AP), where the transceiver HWIs are also considered. Then, we apply a two-timescale protocol for designing the beamformer of the IOS-aided system. Specifically, for the active AP beamformer, the minimum mean square error combining method is employed, which relies on the estimated equivalent channels, on the statistical information of the channel estimation error, on the inter-user interference as well as on the HWIs at the AP and UEs. By contrast, the passive IOS beamformer is designed based on the statistical CSI for maximizing the upper bound of the ergodic spectral efficiency. The theoretical analysis and simulation results show that the transceiver HWIs have a significant effect on the ergodic spectral efficiency, especially in the high transmit power region. Furthermore, we show that the HWIs at the AP can be effectively compensated by deploying more AP antennas.
Submitted 2 May, 2024;
originally announced May 2024.
-
Energy-Efficient Reconfigurable Holographic Surfaces Operating in the Presence of Realistic Hardware Impairments
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Yanshi Sun,
Ibrahim Hemadeh,
Arman Shojaeifard,
Lajos Hanzo
Abstract:
Reconfigurable holographic surfaces (RHSs) constitute a promising technique for supporting energy-efficient communications. In this paper, we formulate the energy efficiency maximization problem of the switch-controlled RHS-aided beamforming architecture by alternately optimizing the holographic beamformer at the RHS, the digital beamformer, the total transmit power and the power sharing ratio of each user. Specifically, to deal with this challenging non-convex optimization problem, we decouple it into three sub-problems. First, the coefficients of RHS elements responsible for the holographic beamformer are optimized to maximize the sum of the eigen-channel gains of all users by our proposed low-complexity eigen-decomposition (ED) method. Then, the digital beamformer is designed by the singular value decomposition (SVD) method to support multi-user information transfer. Finally, the total transmit power and the power sharing ratio are alternately optimized, while considering the effect of transceiver hardware impairments (HWI). We theoretically derive the spectral efficiency and energy efficiency performance upper bound for the RHS-based beamforming architectures in the presence of HWIs. Our simulation results show that the switch-controlled RHS-aided beamforming architecture achieves higher energy efficiency than the conventional fully digital beamformer and the hybrid beamformer based on phase shift arrays (PSA). Moreover, considering the effect of HWI in the beamforming design can bring about further energy efficiency enhancements.
Submitted 2 May, 2024;
originally announced May 2024.
-
Achievable Rate Analysis of Intelligent Omni-Surface Assisted NOMA Holographic MIMO Systems
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Yanshi Sun,
Ibrahim Hemadeh,
Yingming Tsai,
Arman Shojaeifard,
Lajos Hanzo
Abstract:
An intelligent omni-surface (IOS) assisted holographic multiple-input and multiple-output architecture is conceived for $360^\circ$ full-space coverage at a low energy consumption. The theoretical ergodic rate lower bound of our non-orthogonal multiple access (NOMA) scheme is derived based on the moment matching approximation method, while considering the signal distortion at transceivers imposed…
▽ More
An intelligent omni-surface (IOS) assisted holographic multiple-input and multiple-output architecture is conceived for $360^\circ$ full-space coverage at a low energy consumption. The theoretical ergodic rate lower bound of our non-orthogonal multiple access (NOMA) scheme is derived based on the moment matching approximation method, while considering the signal distortion at transceivers imposed by hardware impairments (HWIs). Furthermore, the asymptotically ergodic rate lower bound is derived both for an infinite number of IOS elements and for continuous aperture surfaces. Both the theoretical analysis and the simulation results show that the achievable rate of the NOMA scheme is higher than that of its orthogonal multiple access counterpart. Furthermore, owing to the HWIs at the transceivers, the achievable rate saturates at high signal-to-noise ratio region, instead of reaching its theoretical maximum.
Submitted 2 May, 2024;
originally announced May 2024.
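The rate saturation caused by HWIs can be illustrated with a minimal single-link sketch. It assumes the widely used aggregate transceiver-HWI model in which distortion noise power equals kappa squared times the signal power; it is not the paper's IOS-aided NOMA lower bound, and `hwi_rate`, `hwi_ceiling` and the value of `kappa` are illustrative choices.

```python
import math


def hwi_rate(snr_linear: float, kappa: float) -> float:
    """Rate in bit/s/Hz when distortion noise power is kappa**2 times the signal power."""
    sinr = snr_linear / (kappa ** 2 * snr_linear + 1.0)
    return math.log2(1.0 + sinr)


def hwi_ceiling(kappa: float) -> float:
    """High-SNR limit of hwi_rate: log2(1 + 1 / kappa**2)."""
    return math.log2(1.0 + 1.0 / kappa ** 2)


if __name__ == "__main__":
    kappa = 0.1  # illustrative impairment level (assumption)
    for snr_db in (0, 10, 20, 30, 40):
        snr = 10 ** (snr_db / 10)
        print(f"{snr_db:>2} dB  ideal {math.log2(1 + snr):5.2f}  "
              f"with HWI {hwi_rate(snr, kappa):5.2f}  ceiling {hwi_ceiling(kappa):.2f}")
```

As the SNR grows, the ideal rate keeps increasing while the HWI-limited rate approaches log2(1 + 1/kappa^2), about 6.66 bit/s/Hz for kappa = 0.1.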
-
Charting the Path Forward: CT Image Quality Assessment -- An In-Depth Review
Authors:
Siyi Xun,
Qiaoyu Li,
Xiaohong Liu,
Guangtao Zhai,
Mingxiang Wu,
Tao Tan
Abstract:
Computed Tomography (CT) is a frequently used imaging technology employed in the clinical diagnosis of many disorders. However, the huge volume of CT data, which is non-homogeneous in terms of imaging quality, poses major challenges for clinical diagnosis, data storage, and management. As a result, the quality assessment of CT images is a crucial problem that demands consideration. This paper examines the history, research advances, and current developments in CT image quality assessment (IQA). In this review, we collected and studied more than 500 CT-IQA publications published before August 2023, and we provide a visualization analysis of keywords and co-citations in the knowledge graph of these papers. Prospects and obstacles for the continued development of CT-IQA are also covered. At present, significant research branches in the CT-IQA domain include Phantom study, Artificial intelligence deep-learning reconstruction algorithm, Dose reduction opportunity, and Virtual monoenergetic reconstruction. Artificial intelligence (AI)-based CT-IQA has also become a trend. It increases the accuracy of the CT scanning apparatus, amplifies the impact of the CT system reconstruction algorithm, and yields effective algorithms for post-processing CT images. AI-based medical IQA offers excellent application opportunities in clinical work. AI can provide uniform quality assessment criteria and more comprehensive guidance across healthcare facilities, and encourage them to recognize one another's images. It will help reduce the number of unnecessary tests and associated costs, and enhance the quality of medical imaging and the efficiency of assessment.
Submitted 30 April, 2024;
originally announced May 2024.
-
Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation
Authors:
Liu Yang,
Qiang Li,
Xiaoyang Ren,
Yi Fang,
Shafei Wang
Abstract:
Radio Frequency Fingerprint Identification (RFFI), which exploits the unique distortion induced by non-ideal hardware and resident in transmitted signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in developing state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, where the RFFI model is trained and deployed on different receivers. Due to altered receiver characteristics, directly deploying an RFFI model on a new receiver leads to significant performance degradation. To address this issue, we formulate cross-receiver RFFI as a model adaptation problem, which adapts the trained model to unlabeled signals from a new receiver. We first develop a theoretical generalization error bound for the adaptation model. Motivated by the bound, we propose a novel method to solve the cross-receiver RFFI problem, which includes domain alignment and adaptive pseudo-labeling. The former aims at finding a feature space where both domains exhibit similar distributions, effectively reducing the domain discrepancy. Meanwhile, the latter employs a dynamic pseudo-labeling scheme to implicitly transfer the label information from the labeled receiver to the new receiver. Experimental results indicate that the proposed method can effectively mitigate the receiver impact and improve cross-receiver RFFI performance.
Submitted 12 April, 2024;
originally announced April 2024.
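The domain-alignment idea (making features from the labeled receiver and the new receiver follow similar distributions) is often measured with the maximum mean discrepancy (MMD). The sketch below assumes a Gaussian-kernel MMD between two feature matrices; the abstract does not say which discrepancy the authors use, so `mmd2` and the bandwidth `sigma` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    d2 = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two feature sets (rows = samples)."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

An alignment objective would add a term like `mmd2(f_source, f_target)` to the classification loss so that the feature extractor is pushed toward receiver-invariant features.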
-
The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023
Authors:
Jun Lyu,
Chen Qin,
Shuo Wang,
Fanwen Wang,
Yan Li,
Zi Wang,
Kunyuan Guo,
Cheng Ouyang,
Michael Tänzer,
Meng Liu,
Longyu Sun,
Mengting Sun,
Qin Li,
Zhang Shi,
Sha Hua,
Hao Li,
Zhensen Chen,
Zhenlin Zhang,
Bingyu Xin,
Dimitris N. Metaxas,
George Yiasemis,
Jonas Teuwen,
Liping Zhang,
Weitian Chen,
Yidong Zhao
, et al. (25 additional authors not shown)
Abstract:
Cardiac MRI, crucial for evaluating heart structure and function, faces limitations such as slow imaging and motion artifacts. Undersampling reconstruction, especially with data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly undersampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation platforms hinders the development of data-driven reconstruction algorithms. To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI. CMRxRecon presented an extensive k-space dataset comprising cine and mapping raw data, accompanied by detailed annotations of cardiac anatomical structures. The challenge drew overwhelming participation, attracting more than 285 teams and over 600 participants. Among them, 22 teams successfully submitted Docker containers for the testing phase, with 7 teams submitting for both the cine and mapping tasks. All teams used deep learning-based approaches, indicating that deep learning has become the predominant solution for the problem. The first-place winner of both tasks used the E2E-VarNet architecture as its backbone. In contrast, U-Net is still the most popular backbone for both multi-coil and single-coil reconstruction. This paper provides a comprehensive overview of the challenge design, presents a summary of the submitted results, reviews the employed methods, and offers an in-depth discussion that aims to inspire future advancements in cardiac MRI reconstruction models. The summary emphasizes the effective strategies observed in cardiac MRI reconstruction, including backbone architecture, loss function, pre-processing techniques, physical modeling, and model complexity, thereby providing valuable insights for further developments in this field.
Submitted 16 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
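For readers new to undersampled reconstruction, the baseline that learned methods such as E2E-VarNet or U-Net improve upon is the zero-filled inverse FFT of undersampled k-space. This minimal single-coil sketch assumes a random Cartesian line mask with a fully sampled centre; the mask design and helper names are hypothetical and do not reproduce the CMRxRecon sampling patterns or its multi-coil data.

```python
import numpy as np


def cartesian_mask(shape, accel, n_center, seed=0):
    """Keep a fully sampled central band of phase-encode lines plus a random subset elsewhere."""
    rows, cols = shape
    rng = np.random.default_rng(seed)
    keep = rng.random(rows) < 1.0 / accel           # roughly 1/accel of the lines at random
    c = rows // 2
    keep[c - n_center // 2 : c + (n_center + 1) // 2] = True  # low frequencies always kept
    return np.repeat(keep[:, None], cols, axis=1)   # same decision along the readout axis


def zero_filled_recon(image, mask):
    kspace = np.fft.fftshift(np.fft.fft2(image))    # centred k-space
    undersampled = kspace * mask                    # unsampled lines set to zero
    return np.fft.ifft2(np.fft.ifftshift(undersampled))  # aliased complex image
```

With a full mask the reconstruction is exact; with an undersampling mask the result shows the aliasing artifacts that a trained network is meant to remove.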
-
A Distributionally Robust Model Predictive Control for Static and Dynamic Uncertainties in Smart Grids
Authors:
Qi Li,
Ye Shi,
Yuning Jiang,
Yuanming Shi,
Haoyu Wang,
H. Vincent Poor
Abstract:
The integration of various power sources, including renewables and electric vehicles, into smart grids is expanding, introducing uncertainties that can result in issues like voltage imbalances, load fluctuations, and power losses. These challenges negatively impact the reliability and stability of online scheduling in smart grids. Existing research often addresses uncertainties affecting current states but overlooks those that impact future states, such as the unpredictable charging patterns of electric vehicles. To distinguish between these, we term them static uncertainties and dynamic uncertainties, respectively. This paper introduces WDR-MPC, a novel approach that stands for two-stage Wasserstein-based Distributionally Robust (WDR) optimization within a Model Predictive Control (MPC) framework, aimed at effectively managing both types of uncertainties in smart grids. The dynamic uncertainties are first reformulated into ambiguity tubes, and then the distributionally robust bounds of both dynamic and static uncertainties can be established using WDR optimization. By employing ambiguity tubes and WDR optimization, the stochastic MPC system is converted into a nominal one. Moreover, we develop a convex reformulation method to speed up WDR computation during the two-stage optimization. The distinctive contribution of this paper lies in its holistic approach to both static and dynamic uncertainties in smart grids. Comprehensive experimental results on IEEE 38-bus and 94-bus systems reveal the method's superior performance and its potential to enhance grid stability and reliability.
Submitted 24 March, 2024;
originally announced March 2024.
-
Innovative Quantitative Analysis for Disease Progression Assessment in Familial Cerebral Cavernous Malformations
Authors:
Ruige Zong,
Tao Wang,
Chunwang Li,
Xinlin Zhang,
Yuanbin Chen,
Longxuan Zhao,
Qixuan Li,
Qinquan Gao,
Dezhi Kang,
Fuxin Lin,
Tong Tong
Abstract:
Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions have progressed. To alleviate this problem, we propose a quantitative statistical framework for FCCM, comprising an efficient annotation module, an FCCM lesion segmentation module, and an FCCM lesion quantitative statistics module. Our framework achieves precise segmentation of FCCM lesions based on efficient data annotation, with a Dice coefficient of 93.22%. More importantly, we focus on quantitative statistics of lesions, which we combine with image registration to enable quantitative comparison of lesions between different examinations of a patient, and we establish a visualization framework that lets doctors comprehensively compare and analyze lesions. The experimental results demonstrate that our proposed framework not only obtains objective, accurate, and comprehensive quantitative statistical information, which provides a quantitative assessment method for disease progression and drug efficacy studies, but also considerably reduces the workload of manual lesion measurement and statistics, assisting clinical decision-making for FCCM and accelerating progress in FCCM clinical research. This highlights the potential of the framework for practical application in FCCM clinical research and clinical decision-making. The code is available at https://github.com/6zrg/Quantitative-Statistics-of-FCCM.
Submitted 23 March, 2024;
originally announced March 2024.
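The Dice coefficient quoted above and simple per-lesion statistics can be computed as follows. This is a sketch under assumptions: lesions are taken to be 3D connected components of a binary mask, and `lesion_stats` and `voxel_volume_mm3` are hypothetical names; the framework's registration-based comparison across examinations is not shown.

```python
import numpy as np
from scipy import ndimage


def dice(pred, gt, eps=1e-8):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)


def lesion_stats(mask, voxel_volume_mm3=1.0):
    """Count lesions as connected components and return each one's volume."""
    labels, n = ndimage.label(mask)  # default structure: face-adjacent voxels are connected
    if n == 0:
        return 0, np.zeros(0)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))  # voxels per component
    return n, np.asarray(sizes, dtype=float) * voxel_volume_mm3
```

Running `lesion_stats` on segmentations from two examinations (after registration) gives lesion counts and volumes that can be compared over time.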
-
Advancing COVID-19 Detection in 3D CT Scans
Authors:
Qingqiu Li,
Runtian Yuan,
Junlin Hou,
Jilan Xu,
Yuejie Zhang,
Rui Feng,
Hao Chen
Abstract:
To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, which helps the model focus on lesion-related areas and reduces computational cost. We use ResNeSt50 as a strong feature extractor, initializing it with pretrained weights that carry COVID-19-specific prior knowledge. Our model achieves a Macro F1 Score of 0.94 on the validation set of the 4th COV19D Competition Challenge $\mathrm{I}$, surpassing the baseline by 16%. This indicates its effectiveness in distinguishing between COVID-19 and non-COVID-19 cases, making it a robust method for COVID-19 detection.
Submitted 18 March, 2024;
originally announced March 2024.
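The Macro F1 Score used in this challenge averages the per-class F1 scores, so the COVID-19 and non-COVID-19 classes count equally regardless of class imbalance. A minimal sketch, assuming binary labels encoded as 0 (non-COVID-19) and 1 (COVID-19):

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)  # class absent from both: score 0
    return sum(f1s) / len(f1s)
```

For `y_true = [1, 1, 1, 0]` and `y_pred = [1, 1, 0, 0]`, the COVID-19 class scores 0.8 and the non-COVID-19 class 2/3, giving a Macro F1 of about 0.733.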
-
Domain Adaptation Using Pseudo Labels for COVID-19 Detection
Authors:
Runtian Yuan,
Qingqiu Li,
Junlin Hou,
Jilan Xu,
Yuejie Zhang,
Rui Feng,
Hao Chen
Abstract:
In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability common in emergent health crises. The innovative approach of generating pseudo labels enables the model to iteratively refine its learning process, thereby improving its accuracy and adaptability across different hospitals and medical centres. Experimental results on the COV19-CT-DB database show the model's potential to achieve high diagnostic precision, significantly contributing to efficient patient management and alleviating the strain on healthcare systems. Our method achieves a Macro F1 Score of 0.92 on the validation set of the COVID-19 domain adaptation challenge.
Submitted 18 March, 2024;
originally announced March 2024.
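Pseudo-labeling for domain adaptation is commonly implemented by keeping only target-domain predictions whose confidence exceeds a threshold, adding them to the training set, and retraining. The abstract does not give the selection rule, so the sketch below is an assumption; `select_pseudo_labels` and the 0.9 threshold are hypothetical.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.9):
    """probs: (n_scans, n_classes) softmax outputs on unlabeled target-domain scans.

    Returns the indices of confidently predicted scans and their predicted classes.
    """
    conf = probs.max(axis=1)      # confidence of the top class
    labels = probs.argmax(axis=1)  # predicted class
    keep = conf >= threshold
    return np.flatnonzero(keep), labels[keep]
```

In an iterative scheme, the selected scans and labels would be merged with the annotated source data for the next training round, and selection repeated with the updated model.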
-
Joint Optimization for Achieving Covertness in MIMO Over-the-Air Computation Networks
Authors:
Junteng Yao,
Tuo Wu,
Ming Jin,
Cunhua Pan,
Quanzhong Li,
Jinhong Yuan
Abstract:
This paper investigates covert data transmission within a multiple-input multiple-output (MIMO) over-the-air computation (AirComp) network, where sensors transmit data to the access point (AP) while guaranteeing covertness against the warden (Willie). Simultaneously, the AP introduces artificial noise (AN) to confuse Willie, meeting the covertness requirement. We address the challenge of minimizing the mean-square error (MSE) at the AP, subject to transmit power constraints at both the AP and the sensors, while ensuring covert transmission against Willie through a detection error probability (DEP) constraint. However, obtaining globally optimal solutions for the investigated non-convex problem is challenging due to the interdependence of the optimization variables. To tackle this problem, we introduce an exact penalty algorithm and transform the optimization problem into a difference-of-convex (DC) form to find a locally optimal solution. Simulation results show the superior performance of our proposed scheme compared with the benchmark schemes.
Submitted 15 March, 2024;
originally announced March 2024.
-
Cardiac Magnetic Resonance 2D+T Short- and Long-axis Segmentation via Spatio-temporal SAM Adaptation
Authors:
Zhennong Chen,
Sekeun Kim,
Hui Ren,
Quanzheng Li,
Xiang Li
Abstract:
Accurate 2D+T myocardium segmentation in cine cardiac magnetic resonance (CMR) scans is essential for comprehensively analyzing left ventricular (LV) motion throughout the cardiac cycle. The Segment Anything Model (SAM), known for its accurate segmentation and zero-shot generalization, has not yet been tailored for CMR 2D+T segmentation. We therefore introduce CMR2D+T-SAM, a novel approach that adapts SAM for CMR 2D+T segmentation using spatio-temporal adaptation. This approach also incorporates a U-Net framework for multi-scale feature extraction, as well as text prompts for accurate segmentation on both short-axis (SAX) and long-axis (LAX) views using a single model. CMR2D+T-SAM outperforms existing deep learning methods on the STACOM2011 dataset, achieving a myocardium Dice score of 0.885 and a Hausdorff distance (HD) of 2.900 pixels. It also demonstrates superior zero-shot generalization on the ACDC dataset, with a Dice score of 0.840 and an HD of 4.076 pixels.
Submitted 15 March, 2024;
originally announced March 2024.
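The Hausdorff distance reported alongside Dice measures the worst-case disagreement between two segmentations. A minimal sketch using SciPy's `directed_hausdorff`, assuming the distance is taken over all foreground pixel coordinates of two non-empty binary masks (some evaluations use boundary pixels only) and reported in pixels:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def hausdorff(mask_a, mask_b):
    """Symmetric Hausdorff distance (in pixels) between two non-empty binary masks."""
    pts_a = np.argwhere(mask_a)  # (N, 2) row/column coordinates of foreground pixels
    pts_b = np.argwhere(mask_b)
    # The symmetric distance is the larger of the two directed distances.
    return max(directed_hausdorff(pts_a, pts_b)[0],
               directed_hausdorff(pts_b, pts_a)[0])
```

Taking the maximum of both directions matters: one mask can lie entirely inside the other (directed distance 0) while still missing distant structures.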
-
Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction
Authors:
Qing Xiao,
Siyeop Yoon,
Hui Ren,
Matthew Tivnan,
Lichao Sun,
Quanzheng Li,
Tianming Liu,
Yu Zhang,
Xiang Li
Abstract:
Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffer from temporal sparsity and incompleteness, presenting substantial challenges in modeling the disease's progression accurately. Existing methods are limited, focusing primarily on datasets without missing entries or requiring predefined assumptions about CTh progression. To overcome these obstacles, we propose a conditional score-based diffusion model specifically designed to generate CTh trajectories from given baseline information, such as age, sex, and initial diagnosis. Our conditional diffusion model utilizes all available data during the training phase and makes predictions based solely on baseline information during inference, without needing a prior history of CTh progression. The prediction accuracy of the proposed CTh prediction pipeline using a conditional score-based model was compared for sub-groups consisting of cognitively normal, mild cognitive impairment, and AD subjects. The Bland-Altman analysis shows that our diffusion-based prediction model has a near-zero bias with a narrow 95% confidence interval relative to the ground-truth CTh over 6-36 months. In addition, because our conditional diffusion model is stochastic and generative, we performed an uncertainty analysis of patient-specific CTh prediction through multiple realizations.
Submitted 11 March, 2024;
originally announced March 2024.
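A Bland-Altman analysis summarizes agreement between predicted and ground-truth values by the mean difference (bias) and the band bias ± 1.96 × SD of the differences, the conventional 95% limits of agreement. The sketch below shows that computation for paired CTh values; the function name and inputs are illustrative.

```python
import numpy as np


def bland_altman(pred, truth):
    """Return the bias and the 95% limits of agreement for paired measurements."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

A near-zero `bias` with a narrow interval, as reported above, means the predictions neither systematically over- nor under-estimate CTh and rarely deviate far from it.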
-
Implicit Image-to-Image Schrodinger Bridge for CT Super-Resolution and Denoising
Authors:
Yuang Wang,
Siyeop Yoon,
Pengfei Jin,
Matthew Tivnan,
Zhennong Chen,
Rui Hu,
Li Zhang,
Zhiqiang Chen,
Quanzheng Li,
Dufan Wu
Abstract:
Conditional diffusion models have gained recognition for their effectiveness in image restoration tasks, yet their iterative denoising process, starting from Gaussian noise, often leads to slow inference. As a promising alternative, the Image-to-Image Schrödinger Bridge (I2SB) initializes the generative process from corrupted images and integrates training techniques from conditional diffusion models. In this study, we extended the I2SB method by introducing the Implicit Image-to-Image Schrödinger Bridge (I3SB), which transitions its generative process to a non-Markovian one by incorporating the corrupted image in each generative step. This enhancement enables I3SB to generate images with better texture restoration using a small number of generative steps. The proposed method was validated on CT super-resolution and denoising tasks and outperformed existing methods, including the conditional denoising diffusion probabilistic model (cDDPM) and I2SB, in both visual quality and quantitative metrics. These findings underscore the potential of I3SB in improving medical image restoration by providing fast and accurate generative modeling.
Submitted 9 March, 2024;
originally announced March 2024.
-
Reinforcement Learning Based Robust Volt/Var Control in Active Distribution Networks With Imprecisely Known Delay
Authors:
Hong Cheng,
Huan Luo,
Zhi Liu,
Wei Sun,
Weitao Li,
Qiyue Li
Abstract:
Active distribution networks (ADNs) incorporating massive photovoltaic (PV) devices encounter challenges of rapid voltage fluctuations and potential voltage violations. Due to the fluctuation and intermittency of PV generation, the state gap, which arises from time-inconsistent states and is exacerbated by imprecisely known system delays, significantly degrades the accuracy of voltage control. This paper addresses this challenge by introducing a framework for delay-adaptive Volt/Var control (VVC) in the presence of imprecisely known system delays to regulate the reactive power of PV inverters. The proposed approach formulates voltage control, based on predicted system operation states, as a robust VVC problem. It employs sample selection from the state prediction interval to promptly identify the worst-performing system operation state. Furthermore, we leverage the decentralized partially observable Markov decision process (Dec-POMDP) to reformulate the robust VVC problem. We design multiple policy networks and employ the Multiple Policy Networks and Reward Shaping-based Multi-agent Twin Delayed Deep Deterministic Policy Gradient (MPNRS-MATD3) algorithm to efficiently solve the Dec-POMDP-based problem. Simulation results show the delay adaptation characteristic of our proposed framework, and that MPNRS-MATD3 outperforms other multi-agent reinforcement learning algorithms in robust voltage control.
Submitted 27 February, 2024;
originally announced February 2024.
-
On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach
Authors:
Gang Li,
Qiuwei Li,
Shuang Li,
Wu Angela Li
Abstract:
Sparse signal recovery deals with finding the sparsest solution of an underdetermined linear system $x = Qs$. In this paper, we propose a novel greedy approach to address the challenges of this problem. The approach is based on a characterization of solutions to the system, which allows us to work on sparse recovery directly in the $s$-space with a given measure. With an $l_2$-based measure, two OMP-type algorithms are proposed, which significantly outperform the classical OMP algorithm in terms of recovery accuracy while maintaining comparable computational complexity. An $l_1$-based algorithm, denoted $\text{Alg}_{GBP}$ (greedy basis pursuit), is also derived; it significantly outperforms the classical BP algorithm. A CoSaMP-type algorithm is further proposed to enhance the performance of the two proposed OMP-type algorithms. The superior performance of our proposed algorithms is demonstrated through extensive numerical simulations using synthetic data as well as video signals, highlighting their potential for various applications in compressed sensing and signal processing.
Submitted 24 February, 2024;
originally announced February 2024.
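For context, the classical OMP algorithm that the proposed OMP-type methods are compared against greedily adds the column of $Q$ most correlated with the current residual and then re-fits all selected coefficients by least squares. This sketch implements only that classical baseline, not the authors' $s$-space algorithms or $\text{Alg}_{GBP}$.

```python
import numpy as np


def omp(Q, x, k, tol=1e-10):
    """Classical orthogonal matching pursuit for x = Q s with at most k non-zeros."""
    x = np.asarray(x, dtype=float)
    s = np.zeros(Q.shape[1])
    support = []
    coef = np.zeros(0)
    residual = x.copy()
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break  # x is already explained by the selected columns
        j = int(np.argmax(np.abs(Q.T @ residual)))  # column most correlated with residual
        if j in support:
            break  # no new column improves the fit
        support.append(j)
        # Least-squares re-fit on the current support makes the residual
        # orthogonal to every selected column.
        coef, *_ = np.linalg.lstsq(Q[:, support], x, rcond=None)
        residual = x - Q[:, support] @ coef
    s[support] = coef
    return s
```

The least-squares re-fit is what distinguishes OMP from plain matching pursuit: a column, once selected, is never chosen again because the residual is orthogonal to it.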
-
Low Complexity Turbo SIC-MMSE Detection for Orthogonal Time Frequency Space Modulation
Authors:
Qi Li,
Jinhong Yuan,
Min Qiu,
Shuangyang Li,
Yixuan Xie
Abstract:
Recently, orthogonal time frequency space (OTFS) modulation has garnered considerable attention due to its robustness against doubly-selective wireless channels. In this paper, we propose a low-complexity iterative successive interference cancellation based minimum mean squared error (SIC-MMSE) detection algorithm for zero-padded OTFS (ZP-OTFS) modulation. In the proposed algorithm, signals are detected layer by layer using multiple SIC-MMSE linear filters for each sub-channel, with interference on the targeted signal layer being successively canceled using either hard or soft information. To reduce the complexity of computing individual layer filter coefficients, we also propose a novel filter-coefficient recycling approach in place of generating the exact form of the MMSE filter weights. Moreover, we design a joint detection and decoding algorithm for ZP-OTFS to enhance error performance. Our proposed algorithms outperform other linear detectors for ZP-OTFS, e.g., maximal ratio combining (MRC), with up to 3 dB gain, while maintaining computational complexity comparable to that of conventional SIC-MMSE detection.
Submitted 19 January, 2024;
originally announced January 2024.
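The linear MMSE filter is the building block that SIC-MMSE detectors apply layer by layer before cancelling detected interference. The sketch below shows only a plain linear MMSE estimate for a generic model y = Hx + n with symbol energy `symbol_energy`; it assumes nothing about the ZP-OTFS effective channel and does not implement the paper's SIC layering or filter-coefficient recycling.

```python
import numpy as np


def mmse_detect(H, y, noise_var, symbol_energy=1.0):
    """Linear MMSE estimate x_hat = (H^H H + (N0/Es) I)^{-1} H^H y."""
    n = H.shape[1]
    A = H.conj().T @ H + (noise_var / symbol_energy) * np.eye(n)
    # Solving the linear system is cheaper and more stable than forming the inverse.
    return np.linalg.solve(A, H.conj().T @ y)
```

As `noise_var` tends to zero the estimate approaches the zero-forcing solution; at high noise the regularization term shrinks the estimate toward zero, trading residual interference against noise enhancement.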
-
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
Authors:
Junwen Bai,
Bo Li,
Qiujia Li,
Tara N. Sainath,
Trevor Strohman
Abstract:
End-to-end ASR models are often preferred in the streaming multilingual scenario since they are easier to deploy and can benefit from pre-trained speech models such as powerful foundation models. Meanwhile, the heterogeneous nature and imbalanced data abundance of different languages may cause performance degradation, leading to asynchronous peak performance for different languages during training, especially for tail languages. Sometimes the data itself may even become unavailable as a result of enhanced privacy protection. Existing work tends to significantly increase the model size or learn language-specific decoders to accommodate each language separately. In this study, we explore simple yet effective Language-Dependent Adapter (LDA) finetuning under a cascaded Conformer transducer framework enhanced by teacher pseudo-labeling for tail languages in streaming multilingual ASR. The adapter accounts for only 0.4% of the full model per language. It is plugged into the frozen foundation model and is the only trainable module during finetuning with noisy student training. The final model merges the adapter parameters from different checkpoints for different languages. The model performance is validated on a challenging multilingual dictation dataset, which includes 39 tail languages across Latin, Greek, Arabic, etc. Our proposed method brings a 12.2% word error rate reduction on average and up to 37.5% on a single locale. Furthermore, we show that our parameter-efficient LDA can match the quality of full model finetuning, thus greatly alleviating the asynchronous peak performance issue.
Submitted 17 January, 2024;
originally announced January 2024.
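Adapters of this kind are typically small residual bottleneck modules inserted into a frozen network, with only their weights trained. The abstract does not specify the LDA internals, so the sketch below assumes a common down-projection, ReLU, up-projection design with a zero-initialized up-projection; `BottleneckAdapter` and its dimensions are hypothetical.

```python
import numpy as np


class BottleneckAdapter:
    """Residual bottleneck adapter: y = x + W_up @ relu(W_down @ x)."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, size=(d_bottleneck, d_model))
        # Zero init: the adapter starts as an identity map, so inserting it
        # does not change the frozen model's outputs before finetuning.
        self.w_up = np.zeros((d_model, d_bottleneck))

    def __call__(self, x):
        # x: (..., d_model) features from a frozen encoder layer
        h = np.maximum(x @ self.w_down.T, 0.0)
        return x + h @ self.w_up.T

    def num_params(self):
        return self.w_down.size + self.w_up.size
```

Because only `w_down` and `w_up` would be trained, one such module per language keeps the per-language parameter count to a small fraction of the frozen model, in the spirit of the 0.4% figure above.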
-
Estimating the time-evolving refractivity of a turbulent medium using optical beam measurements: a data assimilation approach
Authors:
Anjali Nair,
Qin Li,
Samuel N. Stechmann
Abstract:
In applications such as free-space optical communication, a signal is often recovered after propagation through a turbulent medium. In this setting, it is common to assume that only limited information is known about the turbulent medium, such as a space- and time-averaged statistic (e.g., root-mean-square), but nothing about the state of its spatial variations. More information could be gained if the state of the turbulent medium, including its spatial variations and their evolution in time, could be characterized. Here, we propose to investigate the use of data assimilation techniques for this purpose. A computational setting is used with the paraxial wave equation, and the extended Kalman filter is used to conduct data assimilation using intensity measurements. To reduce computational cost, the evolution of the turbulent medium is modeled as a stochastic process. Following some past studies, the process has only a small number of Fourier wavelengths for spatial variations. The results show that the spatial and temporal variations of the medium are recovered accurately in many cases, although in some cases the recovery error is larger during certain time windows. Finally, we discuss the potential use of the spatial variation information for aiding the recovery of the transmitted signal or beam source.
Submitted 10 January, 2024;
originally announced January 2024.
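The abstract pairs a stochastic model of the medium with the extended Kalman filter (EKF) driven by intensity measurements. The generic EKF predict/update cycle can be sketched as follows, assuming linear Gaussian dynamics for a small vector of Fourier-mode amplitudes and a toy squared-amplitude measurement h(x) = x²; both are hypothetical stand-ins, since the paper's measurement operator involves propagation under the paraxial wave equation and is not reproduced here.

```python
import numpy as np


def ekf_step(x, P, z, A, Q, h, jac_h, R):
    """One extended Kalman filter cycle for x_{k+1} = A x_k + w and
    z = h(x) + v, with w ~ N(0, Q) and v ~ N(0, R)."""
    # Predict with the (assumed linear) stochastic model of the medium
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update: linearize the nonlinear measurement around the prediction
    H = jac_h(x_pred)
    innovation = z - h(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


# Hypothetical toy measurement: intensity-like squared amplitudes
def h(x):
    return x ** 2


def jac_h(x):
    return np.diag(2 * x)


A = 0.95 * np.eye(2)
Q = 0.01 * np.eye(2)
R = 0.01 * np.eye(2)
x, P = ekf_step(np.array([1.0, 0.5]), 0.1 * np.eye(2), np.array([1.0, 0.3]), A, Q, h, jac_h, R)
```

For long assimilation runs, the Joseph-form covariance update is a common, numerically more robust alternative to the simple `(I - K H) P` form used here.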
-
Frame-level emotional state alignment method for speech emotion recognition
Authors:
Qifei Li,
Yingming Gao,
Cong Wang,
Yayue Deng,
**long Xue,
Yichen Han,
Ya Li
Abstract:
Speech emotion recognition (SER) systems aim to recognize human emotional states during human-computer interaction. Most existing SER systems are trained on utterance-level labels. However, not all frames in an utterance have affective states consistent with the utterance-level label, which makes it difficult for the model to distinguish the true emotion of the audio and degrades its performance. To address this problem, we propose a frame-level emotional state alignment method for SER. First, we fine-tune the HuBERT model with the task-adaptive pretraining (TAPT) method to obtain an SER system, and extract embeddings from its transformer layers to form frame-level pseudo-emotion labels via clustering. Then, the pseudo labels are used to pretrain HuBERT, so that each frame-level output of HuBERT carries corresponding emotional information. Finally, we fine-tune the pretrained HuBERT for SER by adding an attention layer on top of it, which can focus on the frames that are emotionally more consistent with the utterance-level label. Experimental results on IEMOCAP indicate that our proposed method outperforms state-of-the-art (SOTA) methods.
Submitted 26 December, 2023;
originally announced December 2023.
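The pseudo-emotion labels come from clustering transformer-layer embeddings frame by frame. The abstract does not name the clustering algorithm or the number of clusters; the sketch below assumes plain k-means (the algorithm HuBERT itself uses to build its pretraining targets) over a `(num_frames, dim)` embedding matrix, with random toy vectors standing in for real HuBERT features.

```python
import numpy as np


def kmeans_pseudo_labels(frames, k, n_iter=50, seed=0):
    """Cluster (num_frames, dim) embeddings with Lloyd's k-means (an assumed
    choice of algorithm) and return one integer pseudo label per frame."""
    frames = np.asarray(frames, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct frames (fancy indexing copies)
    centroids = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every frame to every centroid: (num_frames, k)
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the converged centroids
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), centroids


# Toy "frame embeddings": two well-separated groups of 20 frames each
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(5.0, 0.1, (20, 4))])
labels, _ = kmeans_pseudo_labels(frames, k=2)
```

Each resulting label can then serve as a per-frame target during the second pretraining stage.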
-
SCUNet++: Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection for Pulmonary Embolism CT Image Segmentation
Authors:
Yifei Chen,
Binfeng Zou,
Zhaoxin Guo,
Yiyu Huang,
Yifan Huang,
Feiwei Qin,
Qinhai Li,
Changmiao Wang
Abstract:
Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noise similar to PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Moreover, traditional PE segmentation methods cannot fully account for the hierarchical structure of features or for the local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method uses the Swin Transformer as the encoder, incorporates multiple fusion dense skip connections between the encoder and decoder, and fuses features of different scales in the decoder subnetwork to compensate for the spatial information loss caused by the inevitable downsampling in Swin-UNet or other state-of-the-art methods, effectively addressing the above problem. We provide a detailed theoretical analysis of this method and validate it on the publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic PE segmentation and providing a powerful diagnostic tool for clinicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.
Submitted 2 January, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
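The reported DSC values follow the standard Dice definition, DSC = 2|P ∩ G| / (|P| + |G|), for a predicted mask P and a ground-truth mask G. A minimal NumPy version for binary masks is sketched below; the value returned for two empty masks is an assumed convention on which evaluation toolkits differ. HD95 is omitted because its value depends on implementation details such as surface extraction and voxel spacing.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity coefficient 2|P ∩ G| / (|P| + |G|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / total


pred = np.array([[1, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [0, 0, 0]])
print(dice_coefficient(pred, truth))  # about 0.667 (2 * 1 / (2 + 1))
```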
-
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response
Authors:
Junfeng Long,
Zirui Wang,
Quanyi Li,
Jiawei Gao,
Liu Cao,
Jiangmiao Pang
Abstract:
Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots can only provide partial and noisy observations, making estimation particularly challenging, especially for external states like terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them according to the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and implicit stability representation, corresponding to two primary goals for locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits. It needs only the robot's proprioception, i.e., readings from the joint encoders and the IMU, as observations. It maintains consistent observations between the simulation reference and reality, which avoids information loss in mimicking learning. It exploits batch-level information that is more robust to noise and keeps better sample efficiency. It requires only 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and cases that never occurred during the training process, revealing remarkable open-world generalizability.
Submitted 1 January, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
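HIM trains the hybrid internal embedding contrastively so that it lies close to an encoding of the robot's successor state, using other samples in the batch. The abstract does not specify the loss; as an assumption, the sketch uses a standard InfoNCE objective with in-batch negatives, which may differ from the authors' formulation. Random vectors stand in for the outputs of hypothetical history and successor-state encoders.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """Batch InfoNCE (an assumed stand-in for HIM's contrastive loss):
    row i of `anchors` should match row i of `positives`; every other row
    in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positive pairs lie on the diagonal


rng = np.random.default_rng(0)
history_emb = rng.normal(size=(8, 16))           # hypothetical embedding of the response
successor_emb = history_emb + 0.01 * rng.normal(size=(8, 16))
aligned_loss = info_nce(history_emb, successor_emb)
shuffled_loss = info_nce(history_emb, successor_emb[::-1])
```

Matched pairs give a much lower loss than mismatched ones, which is the signal that pulls the embedding toward its own successor state.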
-
Computing Optimal Joint Chance Constrained Control Policies
Authors:
Niklas Schmid,
Marta Fochesato,
Sarah H. Q. Li,
Tobias Sutter,
John Lygeros
Abstract:
We consider the problem of optimally controlling stochastic, Markovian systems subject to joint chance constraints over a finite-time horizon. For such problems, standard Dynamic Programming is inapplicable due to the time correlation of the joint chance constraints, which calls for non-Markovian, and possibly stochastic, policies. Hence, despite the popularity of this problem, solution approaches capable of providing provably optimal and easy-to-compute policies are still missing. We fill this gap by introducing an augmented binary state to the system dynamics, allowing us to characterize the optimal policies and propose a Dynamic Programming based solution method. Our analysis provides deep insight into the impact of joint chance constraints on the optimal control policies.
Submitted 16 December, 2023;
originally announced December 2023.
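The role of the augmented binary state can be illustrated on the simpler task of evaluating a fixed policy. A flag b_t records whether the trajectory has stayed in the safe set S so far, b_{t+1} = b_t · 1[x_{t+1} ∈ S], so the joint constraint Pr(x_t ∈ S for all t) becomes Pr(b_N = 1), which can be propagated step by step like any Markovian quantity. The sketch assumes a finite Markov chain already closed under a fixed policy with transition matrix `P`; computing the optimal policy, as the paper does, is not attempted here.

```python
import numpy as np


def joint_safety_probability(P, p0, safe, horizon):
    """Pr(x_t in safe for t = 0..horizon) for a finite Markov chain with
    transition matrix P and initial distribution p0, tracked through the
    augmented flag b_t."""
    safe = np.asarray(safe, dtype=float)
    # mass[x] = Pr(x_t = x, b_t = 1). Mass with b_t = 0 is not tracked:
    # once the flag drops to 0 it stays 0, so that mass never counts again.
    mass = np.asarray(p0, dtype=float) * safe
    for _ in range(horizon):
        mass = (mass @ P) * safe  # b_{t+1} = b_t * 1[x_{t+1} in safe]
    return float(mass.sum())


# Toy chain: states 0 and 1 are safe, state 2 is unsafe and absorbing.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
prob = joint_safety_probability(P, p0=[1.0, 0.0, 0.0], safe=[1, 1, 0], horizon=2)
print(prob)  # approximately 0.98
```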
-
Privacy-Preserving Distributed Optimisation using Stochastic PDMM
Authors:
Sebastian O. Jordan,
Qiongxiu Li,
Richard Heusdens
Abstract:
Privacy-preserving distributed processing has received considerable attention recently. The main purpose of these algorithms is to solve certain signal processing tasks over a network in a decentralised fashion without revealing private/secret data to the outside world. Because of the iterative nature of these distributed algorithms, computationally complex approaches such as (homomorphic) encryption are undesirable. Recently, an information-theoretic method called subspace perturbation has been introduced for synchronous update schemes. The main idea is to exploit a certain structure in the update equations for noise insertion such that the private data is protected without compromising the algorithm's accuracy. This structure, however, is absent in asynchronous update schemes. In this paper, we investigate such asynchronous schemes and derive a lower bound on the noise variance after random initialisation of the algorithm. This bound shows that the privacy level of asynchronous schemes is always better than or at least equal to that of synchronous schemes. Computer simulations are conducted to consolidate our theoretical results.
Submitted 13 December, 2023;
originally announced December 2023.
-
An efficient algorithm for multiuser sum-rate maximization of large-scale active RIS-aided MIMO system
Authors:
Qian Zhang,
Mingjie Shao,
Qiang Li,
Ju Liu
Abstract:
Active reconfigurable intelligent surface (RIS) is a new RIS architecture that can reflect and amplify communication signals. It can provide enhanced performance gain compared to conventional passive RIS systems, which can only reflect the signals. On the other hand, the design problem of active RIS-aided systems is more challenging than that of passive RIS-aided systems, and efficient algorithms for it have been less studied. In this paper, we consider the sum-rate maximization problem in the multiuser massive multiple-input single-output (MISO) downlink with the aid of a large-scale active RIS. Existing approaches to this problem usually resort to general optimization solvers and can be computationally prohibitive. We propose an efficient block successive upper bound minimization (BSUM) method, each step of which has a (semi-)closed-form update. Thus, the proposed algorithm has an attractively low per-iteration complexity. Simulations show that our proposed algorithm requires much less computation than the existing approaches. In particular, when the MIMO and/or RIS sizes are large, our proposed algorithm can be orders of magnitude faster than existing approaches.
Submitted 11 January, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Holistic Evaluation of GPT-4V for Biomedical Imaging
Authors:
Zhengliang Liu,
Hanqi Jiang,
Tianyang Zhong,
Zihao Wu,
Chong Ma,
Yiwei Li,
Xiaowei Yu,
Yutong Zhang,
Yi Pan,
Peng Shu,
Yanjun Lyu,
Lu Zhang,
Junjie Yao,
Peixin Dong,
Chao Cao,
Zhenxiang Xiao,
Jiaqi Wang,
Huan Zhao,
Shaochen Xu,
Yaonai Wei,
**gyuan Chen,
Haixing Dai,
Peilong Wang,
Hao He,
Zewei Wang
, et al. (25 additional authors not shown)
Abstract:
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
Submitted 10 November, 2023;
originally announced December 2023.
-
Micro Energy-Water-Hydrogen Nexus: Data-driven Real-time Optimal Operation
Authors:
Mostafa Goodarzi,
Qifeng Li
Abstract:
This paper extends the new concept of the energy-water-hydrogen (EWH) nexus, which was recently developed as a solution for reducing carbon emissions from the generation side of power systems, to the distribution side. Under the concept of the distribution-level EWH (micro EWH) nexus, renewable energy sources (RES) are utilized to meet the energy needs of a small community. To cope with the uncertainty caused by RESs, this paper investigates the real-time optimal operation of the micro EWH nexus, which is, however, a challenging optimization problem. First, this large-scale mixed-integer nonlinear programming problem is relaxed into a mixed-integer convex program (MICP) by leveraging the effective convex-hull relaxation technique. Second, a fast data-driven solution method based on active constraint and integer variable prediction is presented; it solves the MICP problem quickly because it uses historical optimization data to predict binary variable values and a limited set of active constraints.
Submitted 20 November, 2023;
originally announced November 2023.
-
Economic Viability of the Energy-Water-Hydrogen Nexus for Power System Decarbonization
Authors:
Mostafa Goodarzi,
Qifeng Li
Abstract:
This paper aims to evaluate the economic viability of the energy-water-hydrogen (EWH) nexus as a new solution for reducing carbon emissions from power systems. The urgency around climate change emphasizes the pressing need to mitigate carbon emissions, especially from the electricity sector, which accounts for a significant portion of total emissions in the US. In response, incorporating more renewable energy sources (RESs) and green hydrogen, produced through water electrolysis powered by RESs, stands out as a crucial strategy to combat climate challenges. We examine various aspects of the EWH nexus, including carbon emissions from different power plants, capturing these emissions, and potential options for their reuse or storage. This paper models different sections of the EWH nexus and conducts an economic analysis across power plant scenarios to determine optimal water supply methods, suitable chemical products for carbon reuse, and an appropriate carbon emission penalty to encourage emission reduction through the EWH nexus. The results indicate that reusing captured carbon emissions emerges as the most beneficial option across all power plant types. This finding underscores the potential of carbon reuse as a pivotal strategy within the EWH nexus framework for addressing carbon emissions.
Submitted 18 November, 2023;
originally announced November 2023.
-
Deep Learning Assisted Multiuser MIMO Load Modulated Systems for Enhanced Downlink mmWave Communications
Authors:
Ercong Yu,
**le Zhu,
Qiang Li,
Zilong Liu,
Hongyang Chen,
Shlomo Shamai,
H. Vincent Poor
Abstract:
This paper is focused on multiuser load modulation arrays (MU-LMAs) which are attractive due to their low system complexity and reduced cost for millimeter wave (mmWave) multi-input multi-output (MIMO) systems. The existing precoding algorithm for downlink MU-LMA relies on a sub-array structured (SAS) transmitter which may suffer from decreased degrees of freedom and complex system configuration. Furthermore, a conventional LMA codebook with codewords uniformly distributed on a hypersphere may not be channel-adaptive and may lead to increased signal detection complexity. In this paper, we conceive an MU-LMA system employing a full-array structured (FAS) transmitter and propose two algorithms accordingly. The proposed FAS-based system addresses the SAS structural problems and can support larger numbers of users. For LMA-imposed constant-power downlink precoding, we propose an FAS-based normalized block diagonalization (FAS-NBD) algorithm. However, the forced normalization may result in performance degradation. This degradation, together with the aforementioned codebook design problems, is difficult to solve analytically. This motivates us to propose a Deep Learning-enhanced (FAS-DL-NBD) algorithm for adaptive codebook design and codebook-independent decoding. It is shown that the proposed algorithms are robust to imperfect knowledge of channel state information and yield excellent error performance. Moreover, the FAS-DL-NBD algorithm enables signal detection with low complexity as the number of bits per codeword increases.
Submitted 8 November, 2023;
originally announced November 2023.
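As background for the FAS-NBD algorithm, classical block diagonalization (BD) precoding places each user's precoder in the null space of the other users' stacked channels, which removes inter-user interference. The sketch shows plain BD with a unit-Frobenius-norm scaling per user as a hypothetical stand-in for the paper's normalization; the LMA constant-power constraint and the deep-learning codebook are not modeled, and it assumes at least as many transmit antennas as total receive antennas.

```python
import numpy as np


def block_diagonalization(channels):
    """channels: list of (n_rx_k, n_tx) complex matrices, one per user.
    Returns one (n_tx, n_rx_k) precoder per user lying in the null space
    of every other user's channel (plain BD, not FAS-NBD)."""
    precoders = []
    for k, H_k in enumerate(channels):
        others = np.vstack([H for j, H in enumerate(channels) if j != k])
        _, s, Vh = np.linalg.svd(others)
        rank = int(np.sum(s > 1e-10 * s.max()))
        null_basis = Vh[rank:].conj().T               # columns span null(others)
        W_k = null_basis[:, : H_k.shape[0]]           # one stream per receive antenna
        precoders.append(W_k / np.linalg.norm(W_k))   # assumed unit Frobenius norm
    return precoders


rng = np.random.default_rng(0)
n_tx, users, n_rx = 8, 3, 2
H = [(rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)
     for _ in range(users)]
W = block_diagonalization(H)
leakage = max(np.linalg.norm(H[j] @ W[k])
              for k in range(users) for j in range(users) if j != k)
```

The near-zero `leakage` confirms the block-diagonal structure; the forced normalization that the abstract says can degrade performance would act on top of such null-space precoders.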
-
Long-term Dependency for 3D Reconstruction of Freehand Ultrasound Without External Tracker
Authors:
Qi Li,
Ziyi Shen,
Qian Li,
Dean C. Barratt,
Thomas Dowrick,
Matthew J. Clarkson,
Tom Vercauteren,
Yipeng Hu
Abstract:
Objective: Reconstructing freehand ultrasound in 3D without any external tracker has been a long-standing challenge in ultrasound-assisted procedures. We aim to define new ways of parameterising long-term dependencies and to evaluate their performance. Methods: First, long-term dependency is encoded by transformation positions within a frame sequence. This is achieved by combining a sequence model with a multi-transformation prediction. Second, two dependency factors, anatomical image content and scanning protocol, are proposed as contributors to accurate reconstruction. Each factor is quantified experimentally by reducing the respective training variance. Results: 1) The added long-term dependency of up to 400 frames at 20 frames per second (fps) indeed improved reconstruction, lowering the accumulated error by up to 82.4% compared with the baseline performance. The improvement was found to depend on sequence length, transformation interval and scanning protocol and, unexpectedly, not on the use of recurrent networks with long-short term modules; 2) Decreasing either anatomical or protocol variance in training led to poorer reconstruction accuracy. Interestingly, greater performance was gained from representative protocol patterns than from representative anatomical features. Conclusion: The proposed algorithm uses hyperparameter tuning to effectively utilise long-term dependency. The proposed dependency factors are of practical significance in collecting diverse training data, regulating scanning protocols and developing efficient networks. Significance: The proposed methodology, with publicly available volunteer data and code for parameterising long-term dependency, was experimentally shown to be a valid source of performance improvement and could potentially lead to better model development and practical optimisation of the reconstruction application.
Submitted 16 October, 2023;
originally announced October 2023.
-
One-Bit Channel Estimation for IRS-aided Millimeter-Wave Massive MU-MISO System
Authors:
Silei Wang,
Qiang Li,
**gran Lin
Abstract:
Recently, intelligent reflecting surface (IRS)-assisted communication has gained considerable attention due to its advantages in extending coverage and compensating for path loss with a low-cost passive metasurface. This paper considers uplink channel estimation for IRS-aided multiuser massive MISO communications with one-bit ADCs at the base station (BS). The use of one-bit ADCs is motivated by the need for low-cost and power-efficient implementation of massive antenna techniques. However, the passiveness of the IRS and the lack of signal-level information after one-bit quantization make IRS channel estimation challenging. To tackle this problem, we exploit the structured sparsity of the user-IRS-BS cascaded channels and develop three channel estimators, each of which utilizes the structured sparsity at a different level. Specifically, the first estimator exploits the elementwise sparsity of the cascaded channel and employs sparse Bayesian learning (SBL) to infer the channel responses via type-II maximum likelihood (ML) estimation. However, due to the one-bit quantization, the type-II ML is in general intractable. As such, a variational expectation-maximization (EM) algorithm is custom-derived to iteratively compute an ML solution. The second estimator utilizes the common row-structured sparsity induced by the IRS-to-BS channel shared among the users, and develops another type-II ML solution via block SBL (BSBL) and variational EM. To further improve the performance of BSBL, a third, two-stage estimator is proposed, which can utilize both the common row-structured sparsity and the column-structured sparsity arising from the limited scattering around the users. Simulation results show that the more diverse the structured sparsity exploited, the better the estimation performance, and that the proposed estimators are superior to state-of-the-art one-bit estimators.
Submitted 29 September, 2023;
originally announced October 2023.
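The difficulty the abstract attributes to one-bit quantization, the loss of signal-level information, is easy to see in a sketch of a complex one-bit ADC that keeps only the signs of the in-phase and quadrature components. The 1/√2 scaling and the mapping of exact zeros to +1 are conventions assumed here, not details taken from the paper.

```python
import numpy as np


def one_bit_adc(r):
    """Complex one-bit quantizer: keep only the sign of the real and
    imaginary parts (zeros mapped to +1 by an assumed convention)."""
    r = np.asarray(r)
    re = np.where(r.real >= 0, 1.0, -1.0)
    im = np.where(r.imag >= 0, 1.0, -1.0)
    return (re + 1j * im) / np.sqrt(2)


rng = np.random.default_rng(0)
received = rng.normal(size=16) + 1j * rng.normal(size=16)
# Scaling the received signal leaves the quantized output unchanged,
# so amplitude (signal-level) information is lost.
same = np.array_equal(one_bit_adc(received), one_bit_adc(5.0 * received))
print(same)  # True
```

This scale invariance is why the estimators above must infer channel responses statistically rather than from received amplitudes.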
-
MediViSTA-SAM: Zero-shot Medical Video Analysis with Spatio-temporal SAM Adaptation for Echocardiography
Authors:
Sekeun Kim,
Kyungsang Kim,
Jiang Hu,
Cheng Chen,
Zhiliang Lyu,
Ren Hui,
Sunghwan Kim,
Zhengliang Liu,
Aoxiao Zhong,
Xiang Li,
Tianming Liu,
Quanzheng Li
Abstract:
The Segment Anything Model (SAM) has gained significant attention for its robust generalization capabilities across diverse downstream tasks. However, SAM's performance is noticeably diminished on medical images due to the substantial disparity between the natural and medical image domains. In this paper, we present a zero-shot generalization model specifically designed for echocardiography analysis, called MediViSTA-SAM. Our key components include (i) frame-level self-attention, which leverages cross-frame attention between each frame and its neighboring frames to ensure consistent segmentation outcomes, and (ii) a CNN backbone for feature embedding ahead of the subsequent Transformer, enabling efficient fine-tuning while keeping most of SAM's parameters reusable. Experiments were conducted using zero-shot segmentation on multi-vendor in-house echocardiography datasets, i.e., the model was evaluated without prior exposure to the in-house datasets during training. MediViSTA-SAM effectively overcomes SAM's limitations and can be deployed across various hospital settings without the need to re-train models on their respective datasets. Our code is open sourced at: https://github.com/kimsekeun/MediViSTA-SAM
Submitted 6 April, 2024; v1 submitted 23 September, 2023;
originally announced September 2023.
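The frame-level self-attention described above restricts cross-frame attention to each frame and its neighbors. A much-simplified sketch follows, assuming one feature vector per frame, identity query/key/value projections, and a hypothetical window of ±1 frame; the actual module operates on SAM's spatial tokens with learned projections, so this only illustrates the neighbor-window masking.

```python
import numpy as np


def neighbor_frame_attention(x, window=1):
    """x: (T, D), one feature vector per frame (a simplification). Each frame
    attends only to frames within +/- `window` of itself; identity
    projections are assumed instead of learned Q/K/V."""
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)                       # (T, T) scaled dot products
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)            # block non-neighbor frames
    scores = scores - scores.max(axis=1, keepdims=True) # diagonal keeps each max finite
    weights = np.exp(scores)                            # exp(-inf) = 0 outside the window
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 4))                        # 5 frames, 4-dim toy features
out, weights = neighbor_frame_attention(frames, window=1)
```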
-
Massive End-to-end Models for Short Search Queries
Authors:
Weiran Wang,
Rohit Prabhavalkar,
Dongseong Hwang,
Qiujia Li,
Khe Chai Sim,
Bo Li,
James Qin,
Xingyu Cai,
Adam Stooke,
Zhong Meng,
CJ Zheng,
Yanzhang He,
Tara Sainath,
Pedro Moreno Mengibar
Abstract:
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters. The encoders of our models use the neural architecture of Google's universal speech model (USM), with additional funnel pooling layers to significantly reduce the frame rate and speed up training and inference. We perform extensive studies on vocabulary size, time reduction strategy, and generalization performance on long-form test sets. Despite speculation that, as model size increases, CTC can be as good as RNN-T, which builds label dependency into the prediction, we observe that a 900M RNN-T clearly outperforms a 1.8B CTC and is more tolerant to severe time reduction, although the WER gap can be largely removed by LM shallow fusion.
Submitted 22 September, 2023;
originally announced September 2023.
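One well-known reason CTC is more sensitive than RNN-T to aggressive time reduction (not necessarily the explanation offered by the authors) is that CTC emits at most one label per encoder frame and needs a blank between identical consecutive labels, whereas RNN-T can emit several labels per frame. The sketch pairs plain average pooling, a hypothetical stand-in for the funnel pooling layers, with that CTC length check; character labels are used purely for illustration.

```python
import numpy as np


def reduce_frames(features, factor):
    """Average-pool (T, D) features along time by `factor`, padding the tail
    by repeating the last frame (hypothetical stand-in for funnel pooling)."""
    T, D = features.shape
    pad = (-T) % factor
    if pad:
        features = np.vstack([features, np.repeat(features[-1:], pad, axis=0)])
    return features.reshape(-1, factor, D).mean(axis=1)


def ctc_min_frames(labels):
    """Fewest encoder frames a CTC alignment needs: one per label plus a
    blank between each pair of identical consecutive labels."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


encoder_out = np.zeros((60, 8))                # 60 frames of 8-dim toy features
pooled = reduce_frames(encoder_out, 8)         # 60 -> padded to 64 -> 8 frames
needed = ctc_min_frames(list("hello world"))   # 11 labels + 1 repeat ("ll") = 12
print(pooled.shape[0] >= needed)               # False: CTC cannot align this query
```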
-
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Authors:
Zhichao Wu,
Qiulin Li,
Sixing Liu,
Qun Yang
Abstract:
In the text-to-speech (TTS) task, the latent diffusion model has excellent fidelity and generalization, but its expensive resource consumption and slow inference speed have always been challenging. This paper proposes the Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS). DCTTS makes the following contributions: 1) the TTS diffusion model based on discrete space significantly lowers the computational consumption of the diffusion model and improves sampling speed; 2) the contrastive learning method based on discrete space is used to enhance the alignment between speech and text and improve sampling quality; and 3) it uses an efficient text encoder to simplify the model's parameters and increase computational efficiency. The experimental results demonstrate that the proposed approach achieves outstanding speech synthesis quality and sampling speed while significantly reducing the resource consumption of the diffusion model. The synthesized samples are available at https://github.com/lawtherWu/DCTTS.
Submitted 13 September, 2023;
originally announced September 2023.
-
Integrated Robotics Networks with Co-optimization of Drone Placement and Air-Ground Communications
Authors:
Menghao Hu,
Tong Zhang,
Shuai Wang,
Guoliang Li,
Yingyang Chen,
Qiang Li,
Gaojie Chen
Abstract:
Terrestrial robots, i.e., unmanned ground vehicles (UGVs), and aerial robots, i.e., unmanned aerial vehicles (UAVs), operate in separate spaces. To exploit their complementary features (e.g., fields of view, communication links, computing capabilities), a promising paradigm termed the integrated robotics network has emerged, which provides communications for cooperative UAV-UGV applications. However, efficiently deploying UAVs and scheduling UAV-UGV connections according to different UGV tasks is challenging. In this paper, we formulate a sum-rate maximization problem in which UGVs plan their trajectories autonomously and are dynamically associated with UAVs according to their planned trajectories. Although the problem is an NP-hard mixed integer program, a fast polynomial-time algorithm using alternating gradient descent and penalty-based binary relaxation is devised. Simulation results demonstrate the effectiveness of the proposed algorithm.
Submitted 3 December, 2023; v1 submitted 9 September, 2023;
originally announced September 2023.
-
Variational Tracking and Redetection for Closely-spaced Objects in Heavy Clutter: Supplementary Materials
Authors:
Runze Gan,
Qing Li,
Simon Godsill
Abstract:
The non-homogeneous Poisson process (NHPP) is a widely used measurement model that allows an object to generate multiple measurements over time. However, it can be difficult to track multiple objects efficiently and reliably under this NHPP model in scenarios with a high density of closely-spaced objects and heavy clutter. Therefore, based on the general coordinate ascent variational filtering framework, this paper presents a variational Bayes association-based NHPP tracker (VB-AbNHPP) that can efficiently perform tracking, data association, and learning of target and clutter rates with a parallelisable implementation. In addition, a variational localisation strategy is proposed, which enables rapid rediscovery of missed targets across a large surveillance area under extremely heavy clutter. This strategy is integrated into the VB-AbNHPP tracker, resulting in a robust methodology that can automatically detect and recover from track loss. In challenging scenarios, this tracker demonstrates improved tracking performance over existing trackers in terms of both accuracy and efficiency.
Submitted 23 April, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
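To make the NHPP measurement model and the association step concrete, here is a minimal sketch, assuming 2-D positions, Gaussian measurement noise with standard deviation sigma around each target, and clutter that is Poisson-distributed in number and uniform over a rectangular surveillance area. The association probabilities use point estimates of the target positions; the VB-AbNHPP update described in the paper works with expectations under variational posteriors, so this is only a simplified illustration. All names are hypothetical.

```python
import numpy as np

def simulate_nhpp_scan(targets, target_rates, clutter_rate, area, sigma, rng):
    """One scan: target k emits Poisson(target_rates[k]) points ~ N(targets[k], sigma^2 I);
    clutter emits Poisson(clutter_rate) points uniformly over area = ((x0, x1), (y0, y1))."""
    zs, labels = [], []
    for k, (x, lam) in enumerate(zip(targets, target_rates)):
        n = int(rng.poisson(lam))
        zs.append(x + sigma * rng.standard_normal((n, 2)))
        labels += [k + 1] * n                  # label 0 is reserved for clutter
    n_c = int(rng.poisson(clutter_rate))
    (x0, x1), (y0, y1) = area
    zs.append(np.column_stack([rng.uniform(x0, x1, n_c), rng.uniform(y0, y1, n_c)]))
    labels += [0] * n_c
    return np.vstack(zs), np.array(labels)

def association_probs(z, targets, target_rates, clutter_rate, area_volume, sigma):
    """P[j, k] = probability that measurement j came from source k (k = 0 is clutter).
    Weights: clutter_rate / V for clutter, rate_k * N(z_j; x_k, sigma^2 I) for target k."""
    d2 = ((z[:, None, :] - targets[None, :, :]) ** 2).sum(axis=-1)   # (M, K)
    lik = np.exp(-0.5 * d2 / sigma ** 2) / (2.0 * np.pi * sigma ** 2)
    w = np.hstack([np.full((len(z), 1), clutter_rate / area_volume),
                   lik * target_rates[None, :]])
    return w / w.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = np.array([[0.0, 0.0], [1.5, 0.0]])    # two closely-spaced targets
    target_rates = np.array([5.0, 5.0])
    area = ((-50.0, 50.0), (-50.0, 50.0))
    z, labels = simulate_nhpp_scan(targets, target_rates, 200.0, area, 0.5, rng)
    P = association_probs(z, targets, target_rates, 200.0, 100.0 * 100.0, 0.5)
    print(labels[:10], P[:10].round(3))
```

Because each row of P depends only on its own measurement, rows can be computed independently, which is one reason association-based updates of this kind lend themselves to parallel implementation.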
-
Consensus-based Distributed Variational Multi-object Tracker in Multi-Sensor Network
Authors:
Qing Li,
Runze Gan,
Simon Godsill
Abstract:
The growing need for accurate and reliable tracking systems has driven significant progress in sensor fusion and object tracking techniques. In this paper, we design two variational Bayesian trackers that effectively track multiple targets in cluttered environments within a sensor network. We first present a centralised sensor fusion scheme, which involves transmitting sensor data to a fusion centre. Then, we develop a distributed version leveraging the average consensus algorithm, which is theoretically equivalent to the centralised sensor fusion tracker and requires only local message passing with neighbouring sensors. In addition, we empirically verify that our proposed distributed variational tracker achieves the same tracking accuracy as the centralised version. Simulation results show that our distributed multi-target tracker outperforms the suboptimal distributed sensor fusion strategy that fuses each sensor's posterior using arithmetic sensor fusion and an average consensus strategy.
Submitted 1 September, 2023;
originally announced September 2023.
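The distributed tracker relies on the average consensus algorithm. A minimal sketch of plain average consensus, assuming an undirected, connected sensor graph given as a neighbour dictionary and a fixed step size eps below 1 / (maximum degree): each sensor repeatedly nudges its local value toward its neighbours' values, and all sensors converge to the network-wide average. How the paper plugs this into its variational updates is not stated in the abstract; one plausible reading is that a centralised update which sums per-sensor terms can be recovered locally as N times the consensus average, which is what the usage example shows. Function and variable names are hypothetical.

```python
import numpy as np

def average_consensus(values, neighbours, eps, iters):
    """Synchronous update x_i <- x_i + eps * sum_{j in N(i)} (x_j - x_i).
    For a connected undirected graph and 0 < eps < 1 / max_degree, every x_i
    converges to the average of the initial values. Rows may be vectors."""
    x = np.array(values, dtype=float)
    for _ in range(iters):
        x_new = x.copy()
        for i, nbrs in neighbours.items():
            # only values from direct neighbours are used (local message passing)
            x_new[i] = x[i] + eps * sum(x[j] - x[i] for j in nbrs)
        x = x_new
    return x

if __name__ == "__main__":
    # four sensors on a line: 0 - 1 - 2 - 3 (maximum degree 2, so eps < 0.5)
    neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    local_terms = [1.0, 2.0, 3.0, 10.0]            # hypothetical per-sensor statistics
    avg = average_consensus(local_terms, neighbours, eps=0.3, iters=500)
    print(avg)                                     # every sensor holds ~ the average
    print(len(local_terms) * avg)                  # each sensor's estimate of the global sum
```

Because the update matrix is symmetric with rows summing to one, the network-wide sum is preserved at every iteration, which is why the consensus value equals the true average rather than some other weighted mean.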