-
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR
Authors:
Shuaishuai Ye,
Shunfei Chen,
Xinhui Hu,
Xinkang Xu
Abstract:
In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Cl…
▽ More
In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Classification (CTC) loss as a router in the encoder of SC-MoE to achieve a real-time streaming CS ASR system. To further utilize the language information embedded in text, we also incorporate MoE layers into the decoder of SC-MoE. In addition, we introduce routers into every MoE layer of the encoder and the decoder and achieve better recognition performance. Experimental results show that the SC-MoE significantly improves CS ASR performances over baseline with comparable computational efficiency.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…
▽ More
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks.
△ Less
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
A New Solution for MU-MISO Symbol-Level Precoding: Extrapolation and Deep Unfolding
Authors:
Mu Liang,
Ang Li,
Xiaoyan Hu,
Christos Masouros
Abstract:
Constructive interference (CI) precoding, which converts the harmful multi-user interference into beneficial signals, is a promising and efficient interference management scheme in multi-antenna communication systems. However, CI-based symbol-level precoding (SLP) experiences high computational complexity as the number of symbol slots increases within a transmission block, rendering it unaffordabl…
▽ More
Constructive interference (CI) precoding, which converts the harmful multi-user interference into beneficial signals, is a promising and efficient interference management scheme in multi-antenna communication systems. However, CI-based symbol-level precoding (SLP) experiences high computational complexity as the number of symbol slots increases within a transmission block, rendering it unaffordable in practical communication systems. In this paper, we propose a symbol-level extrapolation (SLE) strategy to extrapolate the precoding matrix by leveraging the relationship between different symbol slots within in a transmission block, during which the channel state information (CSI) remains constant, where we design a closed-form iterative algorithm based on SLE for both PSK and QAM modulation. In order to further reduce the computational complexity, a sub-optimal closed-form solution based on SLE is further developed for PSK and QAM, respectively. Moreover, we design an unsupervised SLE-based neural network (SLE-Net) to unfold the proposed iterative algorithm, which helps enhance the interpretability of the neural network. By carefully designing the loss function of the SLE-Net, the time-complexity of the network can be reduced effectively. Extensive simulation results illustrate that the proposed algorithms can dramatically reduce the computational complexity and time complexity with only marginal performance loss, compared with the conventional SLP design methods.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
FairLENS: Assessing Fairness in Law Enforcement Speech Recognition
Authors:
Yicheng Wang,
Mark Cusick,
Mohamed Laila,
Kate Puech,
Zheng** Ji,
Xia Hu,
Michael Wilson,
Noah Spitzer-Williams,
Bryan Wheeler,
Yasser Ibrahim
Abstract:
Automatic speech recognition (ASR) techniques have become powerful tools, enhancing efficiency in law enforcement scenarios. To ensure fairness for demographic groups in different acoustic environments, ASR engines must be tested across a variety of speakers in realistic settings. However, describing the fairness discrepancies between models with confidence remains a challenge. Meanwhile, most pub…
▽ More
Automatic speech recognition (ASR) techniques have become powerful tools, enhancing efficiency in law enforcement scenarios. To ensure fairness for demographic groups in different acoustic environments, ASR engines must be tested across a variety of speakers in realistic settings. However, describing the fairness discrepancies between models with confidence remains a challenge. Meanwhile, most public ASR datasets are insufficient to perform a satisfying fairness evaluation. To address the limitations, we built FairLENS - a systematic fairness evaluation framework. We propose a novel and adaptable evaluation method to examine the fairness disparity between different models. We also collected a fairness evaluation dataset covering multiple scenarios and demographic dimensions. Leveraging this framework, we conducted fairness assessments on 1 open-source and 11 commercially available state-of-the-art ASR models. Our results reveal that certain models exhibit more biases than others, serving as a fairness guideline for users to make informed choices when selecting ASR models for a given real-world scenario. We further explored model biases towards specific demographic groups and observed that shifts in the acoustic domain can lead to the emergence of new biases.
△ Less
Submitted 28 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge
Authors:
**gguang Tian,
Shuaishuai Ye,
Shunfei Chen,
Yang Xiang,
Zhaohui Yin,
Xinhui Hu,
Xinkang Xu
Abstract:
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58\% compared to the official baseline on t…
▽ More
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58\% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93\% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88\% on the track 2 evaluation set.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
Authors:
Xiao Hu,
Tianshu Wang,
Min Gong,
Shaoshi Yang
Abstract:
Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates g…
▽ More
Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large-scale is formulated, where the time instant when the optimal solution can be attained is uncertain and the optimum solution depends on all the intermediate guidance commands generated before. For solving this problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, despite the fact that the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, we propose to invoke the evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space. Simulation results demonstrate that the proposed guidance design method based on the PPO algorithm is capable of achieving a residual velocity of 67.24 m/s, higher than the residual velocities achieved by the benchmark soft actor-critic and deep deterministic policy gradient algorithms. Furthermore, the proposed ES-enhanced PPO algorithm outperforms the PPO algorithm by 2.7\%, achieving a residual velocity of 69.04 m/s.
△ Less
Submitted 4 May, 2024;
originally announced May 2024.
-
POPDG: Popular 3D Dance Generation with PopDanceSet
Authors:
Zhenye Luo,
Min Ren,
Xuecai Hu,
Yongzhen Huang,
Li Yao
Abstract:
Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. And it surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. M…
▽ More
Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. And it surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. Moreover, the proposed POPDG model within the iDDPM framework enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on two datasets. Furthermore, the paper also expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals
Authors:
Cheng Ding,
Zhicheng Guo,
Zhaoliang Chen,
Randall J Lee,
Cynthia Rudin,
Xiao Hu
Abstract:
Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for develo** foundation models for phys…
▽ More
Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for develo** foundation models for physiological data; such data are often noisy, incomplete, or inconsistent. The present work aims to provide a toolset for develo** foundation models on physiological data. We leverage a large dataset of photoplethysmography (PPG) signals from hospitalized intensive care patients. For this data, we propose SimQuality, a novel self-supervised learning task based on convolutional neural networks (CNNs) as the backbone to enforce representations to be similar for good and poor quality signals that are from similar physiological states. We pre-trained the SimQuality on over 36 million 30-second PPG pairs and then fine-tuned and tested on six downstream tasks using external datasets. The results demonstrate the superiority of the proposed approach on all the downstream tasks, which are extremely important for heart monitoring on wearable devices. Our method indicates that CNNs can be an effective backbone for foundation models that are robust to training data quality.
△ Less
Submitted 26 April, 2024;
originally announced April 2024.
-
SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals
Authors:
Runze Yan,
Cheng Ding,
Ran Xiao,
Aleksandr Fedorov,
Randall J Lee,
Fadi Nahab,
Xiao Hu
Abstract:
Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…
▽ More
Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings. Conventional approaches typically discard corrupted segments or attempt to reconstruct original signals, allowing for the use of standard machine learning techniques. However, this reduces dataset size and introduces biases, compromising prediction accuracy and the effectiveness of continuous monitoring. We propose a novel deep learning model, Signal Quality Weighted Fusion of Attentional Convolution and Recurrent Neural Network (SQUWA), designed to learn how to retain accurate predictions from partially corrupted PPG. Specifically, SQUWA innovatively integrates an attention mechanism that directly considers signal quality during the learning process, dynamically adjusting the weights of time series segments based on their quality. This approach enhances the influence of higher-quality segments while reducing that of lower-quality ones, effectively utilizing partially corrupted segments. This approach represents a departure from the conventional methods that exclude such segments, enabling the utilization of a broader range of data, which has great implications for less disruption when monitoring of AF risks and more accurate estimation of AF burdens. Our extensive experiments show that SQUWA outperform existing PPG-based models, achieving the highest AUCPR of 0.89 with label noise mitigation. This also exceeds the 0.86 AUCPR of models trained with using both electrocardiogram (ECG) and PPG data.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
P-Count: Persistence-based Counting of White Matter Hyperintensities in Brain MRI
Authors:
Xiaoling Hu,
Annabel Sorby-Adams,
Frederik Barkhof,
W Taylor Kimberly,
Oula Puonti,
Juan Eugenio Iglesias
Abstract:
White matter hyperintensities (WMH) are a hallmark of cerebrovascular disease and multiple sclerosis. Automated WMH segmentation methods enable quantitative analysis via estimation of total lesion load, spatial distribution of lesions, and number of lesions (i.e., number of connected components after thresholding), all of which are correlated with patient outcomes. While the two former measures ca…
▽ More
White matter hyperintensities (WMH) are a hallmark of cerebrovascular disease and multiple sclerosis. Automated WMH segmentation methods enable quantitative analysis via estimation of total lesion load, spatial distribution of lesions, and number of lesions (i.e., number of connected components after thresholding), all of which are correlated with patient outcomes. While the two former measures can generally be estimated robustly, the number of lesions is highly sensitive to noise and segmentation mistakes -- even when small connected components are eroded or disregarded. In this article, we present P-Count, an algebraic WMH counting tool based on persistent homology that accounts for the topological features of WM lesions in a robust manner. Using computational geometry, P-Count takes the persistence of connected components into consideration, effectively filtering out the noisy WMH positives, resulting in a more accurate count of true lesions. We validated P-Count on the ISBI2015 longitudinal lesion segmentation dataset, where it produces significantly more accurate results than direct thresholding.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Maximizing Energy Charging for UAV-assisted MEC Systems with SWIPT
Authors:
Xiaoyan Hu,
Pengle Wen,
Han Xiao,
Wenjie Wang,
Kai-Kit Wong
Abstract:
A Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) scheme with simultaneous wireless information and power transfer (SWIPT) is proposed in this paper. Unlike existing MEC-WPT schemes that disregard the downlink period for returning computing results to the ground equipment (GEs), our proposed scheme actively considers and capitalizes on this period. By leveraging the SWIPT techni…
▽ More
A Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) scheme with simultaneous wireless information and power transfer (SWIPT) is proposed in this paper. Unlike existing MEC-WPT schemes that disregard the downlink period for returning computing results to the ground equipment (GEs), our proposed scheme actively considers and capitalizes on this period. By leveraging the SWIPT technique, the UAV can simultaneously transmit energy and the computing results during the downlink period. In this scheme, our objective is to maximize the remaining energy among all GEs by jointly optimizing computing task scheduling, UAV transmit and receive beamforming, BS receive beamforming, GEs' transmit power and power splitting ratio for information decoding, time scheduling, and UAV trajectory. We propose an alternating optimization algorithm that utilizes the semidefinite relaxation (SDR), singular value decomposition (SVD), and fractional programming (FP) methods to effectively solve the nonconvex problem. Numerous experiments validate the effectiveness of the proposed scheme.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
Simulation-based Analysis of a Novel Loop-based Road Topology for Autonomous Vehicles
Authors:
Stefan Ramdhan,
Winnie Trandinh,
Sathurshan Arulmohan,
Xiayong Hu,
Spencer Deevy,
Victor Bandur,
Vera Pantelic,
Mark Lawford,
Alan Wassyng
Abstract:
The challenges in implementing SAE Level 4/5 autonomous vehicles are manifold, with intersection navigation being a pervasive one. We analyze a novel road topology invented by a co-author of this paper, Xiayong Hu. The topology eliminates the need for traditional traffic control and cross-traffic at intersections, potentially improving the safety of autonomous driving systems. The topology, herein…
▽ More
The challenges in implementing SAE Level 4/5 autonomous vehicles are manifold, with intersection navigation being a pervasive one. We analyze a novel road topology invented by a co-author of this paper, Xiayong Hu. The topology eliminates the need for traditional traffic control and cross-traffic at intersections, potentially improving the safety of autonomous driving systems. The topology, herein called the Zonal Road Topology, consists of unidirectional loops of road with traffic flowing either clockwise or counter-clockwise. Adjacent loops are directionally aligned with one another, allowing vehicles to transfer from one loop to another through a simple lane change. To evaluate the Zonal Road Topology, a one km2 pilot-track near Changshu, China is currently being set aside for testing. In parallel, traffic simulations are being performed. To this end, we conduct a simulation-based comparison between the Zonal Road Topology and a traditional road topology for a generic Electric Vehicle (EV) using the Simulation for Urban MObility (SUMO) platform and MATLAB/Simulink. We analyze the topologies in terms of their travel efficiency, safety, energy usage, and capacity. Drive time, number of halts, progress rate, and other metrics are analyzed across varied traffic levels to investigate the advantages and disadvantages of the Zonal Road Topology. Our results indicate that vehicles on the Zonal Road Topology have a lower, more consistent drive time with greater traffic throughput, while using less energy on average. These results become more prominent at higher traffic densities.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
Authors:
Samuel Pegg,
Kai Li,
Xiaolin Hu
Abstract:
Audio-visual speech separation has gained significant traction in recent years due to its potential applications in various fields such as speech recognition, diarization, scene analysis and assistive technologies. Designing a lightweight audio-visual speech separation network is important for low-latency applications, but existing methods often require higher computational costs and more paramete…
▽ More
Audio-visual speech separation has gained significant traction in recent years due to its potential applications in various fields such as speech recognition, diarization, scene analysis and assistive technologies. Designing a lightweight audio-visual speech separation network is important for low-latency applications, but existing methods often require higher computational costs and more parameters to achieve better separation performance. In this paper, we present an audio-visual speech separation model called Top-Down-Fusion Net (TDFNet), a state-of-the-art (SOTA) model for audio-visual speech separation, which builds upon the architecture of TDANet, an audio-only speech separation method. TDANet serves as the architectural foundation for the auditory and visual networks within TDFNet, offering an efficient model with fewer parameters. On the LRS2-2Mix dataset, TDFNet achieves a performance increase of up to 10\% across all performance metrics compared with the previous SOTA method CTCNet. Remarkably, these results are achieved using fewer parameters and only 28\% of the multiply-accumulate operations (MACs) of CTCNet. In essence, our method presents a highly effective and efficient solution to the challenges of speech separation within the audio-visual domain, making significant strides in harnessing visual information optimally.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
Bias-Compensated State of Charge and State of Health Joint Estimation for Lithium Iron Phosphate Batteries
Authors:
Baozhao Yi,
Xinhao Du,
Jiawei Zhang,
Xiaogang Wu,
Qiuhao Hu,
Weiran Jiang,
Xiaosong Hu,
Ziyou Song
Abstract:
Accurate estimation of the state of charge (SOC) and state of health (SOH) is crucial for the safe and reliable operation of batteries. Voltage measurement bias highly affects state estimation accuracy, especially in Lithium Iron Phosphate (LFP) batteries, which are susceptible due to their flat open-circuit voltage (OCV) curves. This work introduces a bias-compensated algorithm to reliably estima…
▽ More
Accurate estimation of the state of charge (SOC) and state of health (SOH) is crucial for the safe and reliable operation of batteries. Voltage measurement bias highly affects state estimation accuracy, especially in Lithium Iron Phosphate (LFP) batteries, which are susceptible due to their flat open-circuit voltage (OCV) curves. This work introduces a bias-compensated algorithm to reliably estimate the SOC and SOH of LFP batteries under the influence of voltage measurement bias. Specifically, SOC and SOH are estimated using the Dual Extended Kalman Filter (DEKF) in the high-slope SOC range, where voltage measurement bias effects are weak. Besides, the voltage measurement biases estimated in the low-slope SOC regions are compensated in the following joint estimation of SOC and SOH to enhance the state estimation accuracy further. Experimental results indicate that the proposed algorithm significantly outperforms the traditional method, which does not consider biases under different temperatures and aging conditions. Additionally, the bias-compensated algorithm can achieve low estimation errors of below 1.5% for SOC and 2% for SOH, even with a 30mV voltage measurement bias. Finally, even if the voltage measurement biases change in operation, the proposed algorithm can remain robust and keep the estimated errors of states around 2%.
△ Less
Submitted 12 March, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Energy-Efficient STAR-RIS Enhanced UAV-Enabled MEC Networks with Bi-Directional Task Offloading
Authors:
Han Xiao,
Xiaoyan Hu,
Weile Zhang,
Wenjie Wang,
Kai-Kit Wong,
Kun Yang
Abstract:
This paper introduces a novel multi-user mobile edge computing (MEC) scheme facilitated by the simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) and the unmanned aerial vehicle (UAV). Unlike existing MEC approaches, the proposed scheme enables bidirectional offloading, allowing users to concurrently offload tasks to the MEC servers located at the ground base…
▽ More
This paper introduces a novel multi-user mobile edge computing (MEC) scheme facilitated by the simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) and the unmanned aerial vehicle (UAV). Unlike existing MEC approaches, the proposed scheme enables bidirectional offloading, allowing users to concurrently offload tasks to the MEC servers located at the ground base station (BS) and UAV with STAR-RIS support. Specifically, we formulate an optimization problem aiming at maximizing the energy efficiency of the system while ensuring the quality of service (QoS) constraints by jointly optimizing the resource allocation, user scheduling, passive beamforming of the STAR-RIS, and the UAV trajectory. A block coordinate descent (BCD) iterative algorithm designed with the Dinkelbach's algorithm and the successive convex approximation (SCA) technique is proposed to effectively handle the formulated non-convex optimization problem with significant coupling among variables. Simulation results indicate that the proposed STAR-RIS enhanced UAV-enabled MEC scheme possesses significant advantages in enhancing the system energy efficiency over other baseline schemes including the conventional RIS-aided scheme.
△ Less
Submitted 9 June, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
SAR Despeckling via Regional Denoising Diffusion Probabilistic Model
Authors:
Xuran Hu,
Ziqiang Xu,
Zhihan Chen,
Zhengpeng Feng,
Mingzhe Zhu,
LJubisa Stankovic
Abstract:
Speckle noise poses a significant challenge in maintaining the quality of synthetic aperture radar (SAR) images, so SAR despeckling techniques have drawn increasing attention. Despite the tremendous advancements of deep learning in fixed-scale SAR image despeckling, these methods still struggle to deal with large-scale SAR images. To address this problem, this paper introduces a novel despeckling…
▽ More
Speckle noise poses a significant challenge in maintaining the quality of synthetic aperture radar (SAR) images, so SAR despeckling techniques have drawn increasing attention. Despite the tremendous advancements of deep learning in fixed-scale SAR image despeckling, these methods still struggle to deal with large-scale SAR images. To address this problem, this paper introduces a novel despeckling approach termed Region Denoising Diffusion Probabilistic Model (R-DDPM) based on generative models. R-DDPM enables versatile despeckling of SAR images across various scales, accomplished within a single training session. Moreover, The artifacts in the fused SAR images can be avoided effectively with the utilization of region-guided inverse sampling. Experiments of our proposed R-DDPM on Sentinel-1 data demonstrates superior performance to existing methods.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Computational Spectral Imaging with Unified Encoding Model: A Comparative Study and Beyond
Authors:
Xinyuan Liu,
Lizhi Wang,
Lingen Li,
Chang Chen,
Xue Hu,
Fenglong Song,
Youliang Yan
Abstract:
Computational spectral imaging is drawing increasing attention owing to the snapshot advantage, and amplitude, phase, and wavelength encoding systems are three types of representative implementations. Fairly comparing and understanding the performance of these systems is essential, but challenging due to the heterogeneity in encoding design. To overcome this limitation, we propose the unified enco…
▽ More
Computational spectral imaging is drawing increasing attention owing to the snapshot advantage, and amplitude, phase, and wavelength encoding systems are three types of representative implementations. Fairly comparing and understanding the performance of these systems is essential, but challenging due to the heterogeneity in encoding design. To overcome this limitation, we propose the unified encoding model (UEM) that covers all physical systems using the three encoding types. Specifically, the UEM comprises physical amplitude, physical phase, and physical wavelength encoding models that can be combined with a digital decoding model in a joint encoder-decoder optimization framework to compare the three systems under a unified experimental setup fairly. Furthermore, we extend the UEMs to ideal versions, namely, ideal amplitude, ideal phase, and ideal wavelength encoding models, which are free from physical constraints, to explore the full potential of the three types of computational spectral imaging systems. Finally, we conduct a holistic comparison of the three types of computational spectral imaging systems and provide valuable insights for designing and exploiting these systems in the future.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence
Authors:
Hongyuan Wang,
Lizhi Wang,
Jiang Xu,
Chang Chen,
Xue Hu,
Fenglong Song,
Youliang Yan
Abstract:
Spectral super-resolution that aims to recover hyperspectral image (HSI) from easily obtainable RGB image has drawn increasing interest in the field of computational photography. The crucial aspect of spectral super-resolution lies in exploiting the correlation within HSIs. However, two types of bottlenecks in existing Transformers limit performance improvement and practical applications. First, e…
▽ More
Spectral super-resolution that aims to recover hyperspectral image (HSI) from easily obtainable RGB image has drawn increasing interest in the field of computational photography. The crucial aspect of spectral super-resolution lies in exploiting the correlation within HSIs. However, two types of bottlenecks in existing Transformers limit performance improvement and practical applications. First, existing Transformers often separately emphasize either spatial-wise or spectral-wise correlation, disrupting the 3D features of HSI and hindering the exploitation of unified spatial-spectral correlation. Second, existing self-attention mechanism always establishes full-rank correlation matrix by learning the correlation between pairs of tokens, leading to its inability to describe linear dependence widely existing in HSI among multiple tokens. To address these issues, we propose a novel Exhaustive Correlation Transformer (ECT) for spectral super-resolution. First, we propose a Spectral-wise Discontinuous 3D (SD3D) splitting strategy, which models unified spatial-spectral correlation by integrating spatial-wise continuous splitting strategy and spectral-wise discontinuous splitting strategy. Second, we propose a Dynamic Low-Rank Map** (DLRM) model, which captures linear dependence among multiple tokens through a dynamically calculated low-rank dependence map. By integrating unified spatial-spectral attention and linear dependence, our ECT can model exhaustive correlation within HSI. The experimental results on both simulated and real data indicate that our method achieves state-of-the-art performance. Codes and pretrained models will be available later.
△ Less
Submitted 18 March, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder
Authors:
Yang Xiang,
**gguang Tian,
Xinhui Hu,
Xinkang Xu,
ZhaoHui Yin
Abstract:
Generally, the performance of deep neural networks (DNNs) heavily depends on the quality of data representation learning. Our preliminary work has emphasized the significance of deep representation learning (DRL) in the context of speech enhancement (SE) applications. Specifically, our initial SE algorithm employed a gated recurrent unit variational autoencoder (VAE) with a Gaussian distribution t…
▽ More
Generally, the performance of deep neural networks (DNNs) heavily depends on the quality of data representation learning. Our preliminary work has emphasized the significance of deep representation learning (DRL) in the context of speech enhancement (SE) applications. Specifically, our initial SE algorithm employed a gated recurrent unit variational autoencoder (VAE) with a Gaussian distribution to enhance the performance of certain existing SE systems. Building upon our preliminary framework, this paper introduces a novel approach for SE using deep complex convolutional recurrent networks with a VAE (DCCRN-VAE). DCCRN-VAE assumes that the latent variables of signals follow complex Gaussian distributions that are modeled by DCCRN, as these distributions can better capture the behaviors of complex signals. Additionally, we propose the application of a residual loss in DCCRN-VAE to further improve the quality of the enhanced speech. {Compared to our preliminary work, DCCRN-VAE introduces a more sophisticated DCCRN structure and probability distribution for DRL. Furthermore, in comparison to DCCRN, DCCRN-VAE employs a more advanced DRL strategy. The experimental results demonstrate that the proposed SE algorithm outperforms both our preliminary SE framework and the state-of-the-art DCCRN SE method in terms of scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.
△ Less
Submitted 15 December, 2023;
originally announced December 2023.
-
Holistic Evaluation of GPT-4V for Biomedical Imaging
Authors:
Zhengliang Liu,
Hanqi Jiang,
Tianyang Zhong,
Zihao Wu,
Chong Ma,
Yiwei Li,
Xiaowei Yu,
Yutong Zhang,
Yi Pan,
Peng Shu,
Yanjun Lyu,
Lu Zhang,
Junjie Yao,
Peixin Dong,
Chao Cao,
Zhenxiang Xiao,
Jiaqi Wang,
Huan Zhao,
Shaochen Xu,
Yaonai Wei,
**gyuan Chen,
Haixing Dai,
Peilong Wang,
Hao He,
Zewei Wang
, et al. (25 additional authors not shown)
Abstract:
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…
▽ More
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
△ Less
Submitted 10 November, 2023;
originally announced December 2023.
-
Reconsideration on evaluation of machine learning models in continuous monitoring using wearables
Authors:
Cheng Ding,
Zhicheng Guo,
Cynthia Rudin,
Ran Xiao,
Fadi B Nahab,
Xiao Hu
Abstract:
This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart stu…
▽ More
This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart studies, the paper offers a comprehensive guideline for robust ML model evaluation on continuous health monitoring.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
TopoSemiSeg: Enforcing Topological Consistency for Semi-Supervised Segmentation of Histopathology Images
Authors:
Meilong Xu,
Xiaoling Hu,
Saumya Gupta,
Shahira Abousamra,
Chao Chen
Abstract:
In computational pathology, segmenting densely distributed objects like glands and nuclei is crucial for downstream analysis. To alleviate the burden of obtaining pixel-wise annotations, semi-supervised learning methods learn from large amounts of unlabeled data. Nevertheless, existing semi-supervised methods overlook the topological information hidden in the unlabeled images and are thus prone to…
▽ More
In computational pathology, segmenting densely distributed objects like glands and nuclei is crucial for downstream analysis. To alleviate the burden of obtaining pixel-wise annotations, semi-supervised learning methods learn from large amounts of unlabeled data. Nevertheless, existing semi-supervised methods overlook the topological information hidden in the unlabeled images and are thus prone to topological errors, e.g., missing or incorrectly merged/separated glands or nuclei. To address this issue, we propose TopoSemiSeg, the first semi-supervised method that learns the topological representation from unlabeled data. In particular, we propose a topology-aware teacher-student approach in which the teacher and student networks learn shared topological representations. To achieve this, we introduce topological consistency loss, which contains signal consistency and noise removal losses to ensure the learned representation is robust and focuses on true topological signals. Extensive experiments on public pathology image datasets show the superiority of our method, especially on topology-wise evaluation metrics. Code is available at https://github.com/Melon-Xu/TopoSemiSeg.
△ Less
Submitted 6 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement
Authors:
Zili Qi,
Xinhui Hu,
Wang** Zhou,
Sheng Li,
Hao Wu,
Jian Lu,
Xinkang Xu
Abstract:
Recently, researchers have shown an increasing interest in automatically predicting the subjective evaluation for speech synthesis systems. This prediction is a challenging task, especially on the out-of-domain test set. In this paper, we proposed a novel fusion model for MOS prediction that combines supervised and unsupervised approaches. In the supervised aspect, we developed an SSL-based predic…
▽ More
Recently, researchers have shown an increasing interest in automatically predicting the subjective evaluation for speech synthesis systems. This prediction is a challenging task, especially on the out-of-domain test set. In this paper, we proposed a novel fusion model for MOS prediction that combines supervised and unsupervised approaches. In the supervised aspect, we developed an SSL-based predictor called LE-SSL-MOS. The LE-SSL-MOS utilizes pre-trained self-supervised learning models and further improves prediction accuracy by utilizing the opinion scores of each utterance in the listener enhancement branch. In the unsupervised aspect, two steps are contained: we fine-tuned the unit language model (ULM) using highly intelligible domain data to improve the correlation of an unsupervised metric - SpeechLMScore. Another is that we utilized ASR confidence as a new metric with the help of ensemble learning. To our knowledge, this is the first architecture that fuses supervised and unsupervised methods for MOS prediction. With these approaches, our experimental results on the VoiceMOS Challenge 2023 show that LE-SSL-MOS performs better than the baseline. Our fusion system achieved an absolute improvement of 13% over LE-SSL-MOS on the noisy and enhanced speech track. Our system ranked 1st and 2nd, respectively, in the French speech synthesis track and the challenge's noisy and enhanced speech track.
△ Less
Submitted 17 November, 2023;
originally announced November 2023.
-
TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction
Authors:
Ruiquan Ge,
Xiangyang Hu,
Rungen Huang,
Gangyong Jia,
Yaqi Wang,
Renshu Gu,
Changmiao Wang,
Elazab Ahmed,
Linyan Wang,
Juan Ye,
Ye Li
Abstract:
Survival prediction plays a crucial role in assisting clinicians with the development of cancer treatment protocols. Recent evidence shows that multimodal data can help in the diagnosis of cancer disease and improve survival prediction. Currently, deep learning-based approaches have experienced increasing success in survival prediction by integrating pathological images and gene expression data. H…
▽ More
Survival prediction plays a crucial role in assisting clinicians with the development of cancer treatment protocols. Recent evidence shows that multimodal data can help in the diagnosis of cancer disease and improve survival prediction. Currently, deep learning-based approaches have experienced increasing success in survival prediction by integrating pathological images and gene expression data. However, most existing approaches overlook the intra-modality latent information and the complex inter-modality correlations. Furthermore, existing modalities do not fully exploit the immense representational capabilities of neural networks for feature aggregation and disregard the importance of relationships between features. Therefore, it is highly recommended to address these issues in order to enhance the prediction performance by proposing a novel deep learning-based method. We propose a novel framework named Two-stream Transformer-based Multimodal Fusion Network for survival prediction (TTMFN), which integrates pathological images and gene expression data. In TTMFN, we present a two-stream multimodal co-attention transformer module to take full advantage of the complex relationships between different modalities and the potential connections within the modalities. Additionally, we develop a multi-head attention pooling approach to effectively aggregate the feature representations of the two modalities. The experiment results on four datasets from The Cancer Genome Atlas demonstrate that TTMFN can achieve the best performance or competitive results compared to the state-of-the-art methods in predicting the overall survival of patients.
△ Less
Submitted 12 November, 2023;
originally announced November 2023.
-
C-Silicon-based metasurfaces for aperture-robust spectrometer/imaging with angle integration
Authors:
Weizhu Xu,
Qingbin Fan,
Peicheng Lin,
Jiarong Wang,
Hao Hu,
Tao Yue,
Xuemei Hu,
Ting Xu
Abstract:
Compared with conventional grating-based spectrometers, reconstructive spectrometers based on spectrally engineered filtering have the advantage of miniaturization because of the less demand for dispersive optics and free propagation space. However, available reconstructive spectrometers fail to balance the performance on operational bandwidth, spectral diversity and angular stability. In this wor…
▽ More
Compared with conventional grating-based spectrometers, reconstructive spectrometers based on spectrally engineered filtering have the advantage of miniaturization because of the less demand for dispersive optics and free propagation space. However, available reconstructive spectrometers fail to balance the performance on operational bandwidth, spectral diversity and angular stability. In this work, we proposed a compact silicon metasurfaces based spectrometer/camera. After angle integration, the spectral response of the system is robust to angle/aperture within a wide working bandwidth from 400nm to 800nm. It is experimentally demonstrated that the proposed method could maintain the spectral consistency from F/1.8 to F/4 (The corresponding angle of incident light ranges from 7° to 16°) and the incident hyperspectral signal could be accurately reconstructed with a fidelity exceeding 99%. Additionally, a spectral imaging system with 400x400 pixels is also established in this work. The accurate reconstructed hyperspectral image indicates that the proposed aperture-robust spectrometer has the potential to be extended as a high-resolution broadband hyperspectral camera.
△ Less
Submitted 31 October, 2023;
originally announced October 2023.
-
FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound
Authors:
Chaoyu Chen,
Xin Yang,
Yuhao Huang,
Wenlong Shi,
Yan Cao,
Mingyuan Luo,
Xindi Hu,
Lei Zhue,
Lequan Yu,
Kejuan Yue,
Yuanji Zhang,
Yi Xiong,
Dong Ni,
Weijun Huang
Abstract:
Fetal pose estimation in 3D ultrasound (US) involves identifying a set of associated fetal anatomical landmarks. Its primary objective is to provide comprehensive information about the fetus through landmark connections, thus benefiting various critical applications, such as biometric measurements, plane localization, and fetal movement monitoring. However, accurately estimating the 3D fetal pose…
▽ More
Fetal pose estimation in 3D ultrasound (US) involves identifying a set of associated fetal anatomical landmarks. Its primary objective is to provide comprehensive information about the fetus through landmark connections, thus benefiting various critical applications, such as biometric measurements, plane localization, and fetal movement monitoring. However, accurately estimating the 3D fetal pose in US volume has several challenges, including poor image quality, limited GPU memory for tackling high dimensional data, symmetrical or ambiguous anatomical structures, and considerable variations in fetal poses. In this study, we propose a novel 3D fetal pose estimation framework (called FetusMapV2) to overcome the above challenges. Our contribution is three-fold. First, we propose a heuristic scheme that explores the complementary network structure-unconstrained and activation-unreserved GPU memory management approaches, which can enlarge the input image resolution for better results under limited GPU memory. Second, we design a novel Pair Loss to mitigate confusion caused by symmetrical and similar anatomical structures. It separates the hidden classification task from the landmark localization task and thus progressively eases model learning. Last, we propose a shape priors-based self-supervised learning by selecting the relatively stable landmarks to refine the pose online. Extensive experiments and diverse applications on a large-scale fetal US dataset including 1000 volumes with 22 landmarks per volume demonstrate that our method outperforms other strong competitors.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
Controllability of networked multiagent systems based on linearized Turing's model
Authors:
Tianhao Li,
Ruichang Zhang,
Zhixin Liu,
Zhuo Zou,
Xiaoming Hu
Abstract:
Turing's model has been widely used to explain how simple, uniform structures can give rise to complex, patterned structures during the development of organisms. However, it is very hard to establish rigorous theoretical results for the dynamic evolution behavior of Turing's model since it is described by nonlinear partial differential equations. We focus on controllability of Turing's model by li…
▽ More
Turing's model has been widely used to explain how simple, uniform structures can give rise to complex, patterned structures during the development of organisms. However, it is very hard to establish rigorous theoretical results for the dynamic evolution behavior of Turing's model since it is described by nonlinear partial differential equations. We focus on controllability of Turing's model by linearization and spatial discretization. This linearized model is a networked system whose agents are second order linear systems and these agents interact with each other by Laplacian dynamics on a graph. A control signal can be added to agents of choice. Under mild conditions on the parameters of the linearized Turing's model, we prove the equivalence between controllability of the linearized Turing's model and controllability of a Laplace dynamic system with agents of first order dynamics. When the graph is a grid graph or a cylinder grid graph, we then give precisely the minimal number of control nodes and a corresponding control node set such that the Laplace dynamic systems on these graphs with agents of first order dynamics are controllable.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
Photoplethysmography based atrial fibrillation detection: an updated review from July 2019
Authors:
Cheng Ding,
Ran Xiao,
Weijia Wang,
Elizabeth Holdsworth,
Xiao Hu
Abstract:
Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring for its cost-effectiveness and widespread integration into wearable devices. Our team previously co…
▽ More
Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring for its cost-effectiveness and widespread integration into wearable devices. Our team previously conducted an exhaustive review on PPG-based AF detection before June 2019. However, since then, more advanced technologies have emerged in this field. This paper offers a comprehensive review of the latest advancements in PPG-based AF detection, utilizing digital health and artificial intelligence (AI) solutions, within the timeframe spanning from July 2019 to December 2022. Through extensive exploration of scientific databases, we have identified 59 pertinent studies. Our comprehensive review encompasses an in-depth assessment of the statistical methodologies, traditional machine learning techniques, and deep learning approaches employed in these studies. In addition, we address the challenges encountered in the domain of PPG-based AF detection. Furthermore, we maintain a dedicated website to curate the latest research in this area, with regular updates on a regular basis.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
Robust Anti-jamming Communications with DMA-Based Reconfigurable Heterogeneous Array
Authors:
Kaizhi Huang,
Wenyu Jiang,
Yajun Chen,
Liang **,
Qingqing Wu,
Xiaoling Hu
Abstract:
In the future commercial and military communication systems, anti-jamming remains a critical issue. Existing homogeneous or heterogeneous arrays with a limited degrees of freedom (DoF) and high consumption are unable to meet the requirements of communication in rapidly changing and intense jamming environments. To address these challenges, we propose a reconfigurable heterogeneous array (RHA) arch…
▽ More
In the future commercial and military communication systems, anti-jamming remains a critical issue. Existing homogeneous or heterogeneous arrays with a limited degrees of freedom (DoF) and high consumption are unable to meet the requirements of communication in rapidly changing and intense jamming environments. To address these challenges, we propose a reconfigurable heterogeneous array (RHA) architecture based on dynamic metasurface antenna (DMA), which will increase the DoF and further improve anti-jamming capabilities. We propose a two-step anti-jamming scheme based on RHA, where the multipaths are estimated by an atomic norm minimization (ANM) based scheme, and then the received signal-to-interference-plus-noise ratio (SINR) is maximized by jointly designing the phase shift of each DMA element and the weights of the array elements. To solve the challenging non-convex discrete fractional problem along with the estimation error in the direction of arrival (DoA) and channel state information (CSI), we propose a robust alternative algorithm based on the S-procedure to solve the lower-bound SINR maximization problem. Simulation results demonstrate that the proposed RHA architecture and corresponding schemes have superior performance in terms of jamming immunity and robustness.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation
Authors:
Samuel Pegg,
Kai Li,
Xiaolin Hu
Abstract:
Audio-visual speech separation methods aim to integrate different modalities to generate high-quality separated speech, thereby enhancing the performance of downstream tasks such as speech recognition. Most existing state-of-the-art (SOTA) models operate in the time domain. However, their overly simplistic approach to modeling acoustic features often necessitates larger and more computationally in…
▽ More
Audio-visual speech separation methods aim to integrate different modalities to generate high-quality separated speech, thereby enhancing the performance of downstream tasks such as speech recognition. Most existing state-of-the-art (SOTA) models operate in the time domain. However, their overly simplistic approach to modeling acoustic features often necessitates larger and more computationally intensive models in order to achieve SOTA performance. In this paper, we present a novel time-frequency domain audio-visual speech separation method: Recurrent Time-Frequency Separation Network (RTFS-Net), which applies its algorithms on the complex time-frequency bins yielded by the Short-Time Fourier Transform. We model and capture the time and frequency dimensions of the audio independently using a multi-layered RNN along each dimension. Furthermore, we introduce a unique attention-based fusion technique for the efficient integration of audio and visual information, and a new mask separation approach that takes advantage of the intrinsic spectral nature of the acoustic features for a clearer separation. RTFS-Net outperforms the prior SOTA method in both inference speed and separation quality while reducing the number of parameters by 90% and MACs by 83%. This is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts.
△ Less
Submitted 21 March, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Identification of Ghost Targets for Automotive Radar in the Presence of Multipath
Authors:
Le Zheng,
Jiamin Long,
Marco Lops,
Fan Liu,
Xueyao Hu
Abstract:
Colocated multiple-input multiple-output (MIMO) technology has been widely used in automotive radars as it provides accurate angular estimation of the objects with relatively small number of transmitting and receiving antennas. Since the Direction Of Departure (DOD) and the Direction Of Arrival (DOA) of line-of-sight targets coincide, MIMO signal processing allows forming a larger virtual array fo…
▽ More
Colocated multiple-input multiple-output (MIMO) technology has been widely used in automotive radars as it provides accurate angular estimation of the objects with relatively small number of transmitting and receiving antennas. Since the Direction Of Departure (DOD) and the Direction Of Arrival (DOA) of line-of-sight targets coincide, MIMO signal processing allows forming a larger virtual array for angle finding. However, multiple paths im**ing the receiver is a major limiting factor, in that radar signals may bounce off obstacles, creating echoes for which the DOD does not equal the DOA. Thus, in complex scenarios with multiple scatterers, the direct paths of the intended targets may be corrupted by indirect paths from other objects, which leads to inaccurate angle estimation or ghost targets. In this paper, we focus on detecting the presence of ghosts due to multipath by regarding it as the problem of deciding between a composite hypothesis, ${\cal H}_0$ say, that the observations only contain an unknown number of direct paths sharing the same (unknown) DOD's and DOA's, and a composite alternative, ${\cal H}_1$ say, that the observations also contain an unknown number of indirect paths, for which DOD's and DOA's do not coincide. We exploit the Generalized Likelihood Ratio Test (GLRT) philosophy to determine the detector structure, wherein the unknown parameters are replaced by carefully designed estimators. The angles of both the active direct paths and of the multi-paths are indeed estimated through a sparsity-enforced Compressed Sensing (CS) approach with Levenberg-Marquardt (LM) optimization to estimate the angular parameters in the continuous domain. An extensive performance analysis is finally offered in order to validate the proposed solution.
△ Less
Submitted 26 September, 2023; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Expert Uncertainty and Severity Aware Chest X-Ray Classification by Multi-Relationship Graph Learning
Authors:
Mengliang Zhang,
Xinyue Hu,
Lin Gu,
Liangchen Liu,
Kazuma Kobayashi,
Tatsuya Harada,
Ronald M. Summers,
Yingying Zhu
Abstract:
Patients undergoing chest X-rays (CXR) often endure multiple lung diseases. When evaluating a patient's condition, due to the complex pathologies, subtle texture changes of different lung lesions in images, and patient condition differences, radiologists may make uncertain even when they have experienced long-term clinical training and professional guidance, which makes much noise in extracting di…
▽ More
Patients undergoing chest X-rays (CXR) often endure multiple lung diseases. When evaluating a patient's condition, due to the complex pathologies, subtle texture changes of different lung lesions in images, and patient condition differences, radiologists may make uncertain even when they have experienced long-term clinical training and professional guidance, which makes much noise in extracting disease labels based on CXR reports. In this paper, we re-extract disease labels from CXR reports to make them more realistic by considering disease severity and uncertainty in classification. Our contributions are as follows: 1. We re-extracted the disease labels with severity and uncertainty by a rule-based approach with keywords discussed with clinical experts. 2. To further improve the explainability of chest X-ray diagnosis, we designed a multi-relationship graph learning method with an expert uncertainty-aware loss function. 3. Our multi-relationship graph learning method can also interpret the disease classification results. Our experimental results show that models considering disease severity and uncertainty outperform previous state-of-the-art methods.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow
Authors:
Haojia Hui,
Jiangyuan Gu,
Xunbo Hu,
Yang Hu,
Leibo Liu,
Shaojun Wei,
Shouyi Yin
Abstract:
With the cross-fertilization of applications and the ever-increasing scale of models, the efficiency and productivity of hardware computing architectures have become inadequate. This inadequacy further exacerbates issues in design flexibility, design complexity, development cycle, and development costs (4-d problems) in divergent scenarios. To address these challenges, this paper proposed a flexib…
▽ More
With the cross-fertilization of applications and the ever-increasing scale of models, the efficiency and productivity of hardware computing architectures have become inadequate. This inadequacy further exacerbates issues in design flexibility, design complexity, development cycle, and development costs (4-d problems) in divergent scenarios. To address these challenges, this paper proposed a flexible design flow called DIAG based on plugin techniques. The proposed flow guides hardware development through four layers: definition(D), implementation(I), application(A), and generation(G). Furthermore, a versatile CGRA generator called WindMill is implemented, allowing for agile generation of customized hardware accelerators based on specific application demands. Applications and algorithm tasks from three aspects is experimented. In the case of reinforcement learning algorithm, a significant performance improvement of $2.3\times$ compared to GPU is achieved.
△ Less
Submitted 3 September, 2023;
originally announced September 2023.
-
FFPN: Fourier Feature Pyramid Network for Ultrasound Image Segmentation
Authors:
Chaoyu Chen,
Xin Yang,
Rusi Chen,
Junxuan Yu,
Liwei Du,
Jian Wang,
Xindi Hu,
Yan Cao,
Yingying Liu,
Dong Ni
Abstract:
Ultrasound (US) image segmentation is an active research area that requires real-time and highly accurate analysis in many scenarios. The detect-to-segment (DTS) frameworks have been recently proposed to balance accuracy and efficiency. However, existing approaches may suffer from inadequate contour encoding or fail to effectively leverage the encoded results. In this paper, we introduce a novel F…
▽ More
Ultrasound (US) image segmentation is an active research area that requires real-time and highly accurate analysis in many scenarios. The detect-to-segment (DTS) frameworks have been recently proposed to balance accuracy and efficiency. However, existing approaches may suffer from inadequate contour encoding or fail to effectively leverage the encoded results. In this paper, we introduce a novel Fourier-anchor-based DTS framework called Fourier Feature Pyramid Network (FFPN) to address the aforementioned issues. The contributions of this paper are two fold. First, the FFPN utilizes Fourier Descriptors to adequately encode contours. Specifically, it maps Fourier series with similar amplitudes and frequencies into the same layer of the feature map, thereby effectively utilizing the encoded Fourier information. Second, we propose a Contour Sampling Refinement (CSR) module based on the contour proposals and refined features produced by the FFPN. This module extracts rich features around the predicted contours to further capture detailed information and refine the contours. Extensive experimental results on three large and challenging datasets demonstrate that our method outperforms other DTS methods in terms of accuracy and efficiency. Furthermore, our framework can generalize well to other detection or segmentation tasks.
△ Less
Submitted 26 August, 2023;
originally announced August 2023.
-
Block-Level Interference Exploitation Precoding for MU-MISO: An ADMM Approach
Authors:
Yiran Wang,
Yunsi Wen,
Ang Li,
Xiaoyan Hu,
Christos Masouros
Abstract:
We study constructive interference based block-level precoding (CI-BLP) in the downlink of multi-user multiple-input single-output (MU-MISO) systems. Specifically, our aim is to extend the analysis on CI-BLP to the case where the considered number of symbol slots is smaller than that of the users. To this end, we mathematically prove the feasibility of using the pseudo-inverse to obtain the optima…
▽ More
We study constructive interference based block-level precoding (CI-BLP) in the downlink of multi-user multiple-input single-output (MU-MISO) systems. Specifically, our aim is to extend the analysis on CI-BLP to the case where the considered number of symbol slots is smaller than that of the users. To this end, we mathematically prove the feasibility of using the pseudo-inverse to obtain the optimal CI-BLP precoding matrix in a closed form. Similar to the case when the number of users is small, we show that a quadratic programming (QP) optimization on simplex can be constructed. We also design a low-complexity algorithm based on the alternating direction method of multipliers (ADMM) framework, which can efficiently solve large-scale QP problems. We further analyze the convergence and complexity of the proposed algorithm. Numerical results validate our analysis and the optimality of the proposed algorithm, and further show that the proposed algorithm offers a flexible performance-complexity tradeoff by limiting the maximum number of iterations, which motivates the use of CI-BLP in practical wireless systems.
△ Less
Submitted 30 August, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation
Authors:
Kai Li,
Runxuan Yang,
Fuchun Sun,
Xiaolin Hu
Abstract:
Recent research has made significant progress in designing fusion modules for audio-visual speech separation. However, they predominantly focus on multi-modal fusion at a single temporal scale of auditory and visual features without employing selective attention mechanisms, which is in sharp contrast with the brain. To address this issue, We propose a novel model called Intra- and Inter-Attention…
▽ More
Recent research has made significant progress in designing fusion modules for audio-visual speech separation. However, they predominantly focus on multi-modal fusion at a single temporal scale of auditory and visual features without employing selective attention mechanisms, which is in sharp contrast with the brain. To address this issue, We propose a novel model called Intra- and Inter-Attention Network (IIANet), which leverages the attention mechanism for efficient audio-visual feature fusion. IIANet consists of two types of attention blocks: intra-attention (IntraA) and inter-attention (InterA) blocks, where the InterA blocks are distributed at the top, middle and bottom of IIANet. Heavily inspired by the way how human brain selectively focuses on relevant content at various temporal scales, these blocks maintain the ability to learn modality-specific features and enable the extraction of different semantics from audio-visual features. Comprehensive experiments on three standard audio-visual separation benchmarks (LRS2, LRS3, and VoxCeleb2) demonstrate the effectiveness of IIANet, outperforming previous state-of-the-art methods while maintaining comparable inference time. In particular, the fast version of IIANet (IIANet-fast) has only 7% of CTCNet's MACs and is 40% faster than CTCNet on CPUs while achieving better separation quality, showing the great potential of attention mechanism for efficient and effective multimodal fusion.
△ Less
Submitted 2 February, 2024; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Challenges and Opportunities for Second-life Batteries: A Review of Key Technologies and Economy
Authors:
Xubo Gu,
Hanyu Bai,
Xiaofan Cui,
Juner Zhu,
Weichao Zhuang,
Zhaojian Li,
Xiaosong Hu,
Ziyou Song
Abstract:
Due to the increasing volume of Electric Vehicles in automotive markets and the limited lifetime of onboard lithium-ion batteries (LIBs), the large-scale retirement of LIBs is imminent. The battery packs retired from Electric Vehicles still own 70%-80% of the initial capacity, thus having the potential to be utilized in scenarios with lower energy and power requirements to maximize the value of LI…
▽ More
Due to the increasing volume of Electric Vehicles in automotive markets and the limited lifetime of onboard lithium-ion batteries (LIBs), the large-scale retirement of LIBs is imminent. The battery packs retired from Electric Vehicles still own 70%-80% of the initial capacity, thus having the potential to be utilized in scenarios with lower energy and power requirements to maximize the value of LIBs. However, spent batteries are commonly less reliable than fresh batteries due to their degraded performance, thereby necessitating a comprehensive assessment from safety and economic perspectives before further utilization. To this end, this paper reviews the key technological and economic aspects of second-life batteries (SLBs). Firstly, we introduce various degradation models for first-life batteries and identify an opportunity to combine physics-based theories with data-driven methods to establish explainable models with physical laws that can be generalized. However, degradation models specifically tailored to SLBs are currently absent. Therefore, we analyze the applicability of existing battery degradation models developed for first-life batteries in SLB applications. Secondly, we investigate fast screening and regrou** techniques and discuss the regrou** standards for the first time to guide the classification procedure and enhance the performance and safety of SLBs. Thirdly, we scrutinize the economic analysis of SLBs and summarize the potentially profitable applications. Finally, we comprehensively examine and compare power electronics technologies that can substantially improve the performance of SLBs, including high-efficiency energy transformation technologies, active equalization technologies, and technologies to improve reliability and safety.
△ Less
Submitted 13 August, 2023;
originally announced August 2023.
-
The Color Clifford Hardy Signal: Application to Color Edge Detection and Optical Flow
Authors:
Xiaoxiao Hu,
Kit Ian Kou,
Cuiming Zou,
Dong Cheng
Abstract:
This paper introduces the idea of the color Clifford Hardy signal, which can be used to process color images. As a complex analytic function's high-dimensional analogue, the color Clifford Hardy signal inherits many desirable qualities of analyticity. A crucial tool for getting the color and structural data is the local feature representation of a color image in the color Clifford Hardy signal. By…
▽ More
This paper introduces the idea of the color Clifford Hardy signal, which can be used to process color images. As a complex analytic function's high-dimensional analogue, the color Clifford Hardy signal inherits many desirable qualities of analyticity. A crucial tool for getting the color and structural data is the local feature representation of a color image in the color Clifford Hardy signal. By looking at the extended Cauchy-Riemann equations in the high-dimensional space, it is possible to see the connection between the different parts of the color Clifford Hardy signal. Based on the distinctive and important local amplitude and local phase generated by the color Clifford Hardy signal, we propose five methods to identify the edges of color images with relation to a certain color. To prove the superiority of the offered methodologies, numerous comparative studies employing image quality assessment criteria are used. Specifically by using the multi-scale structure of the color Clifford Hardy signal, the proposed approaches are resistant to a variety of noises. In addition, a color optical flow detection method with anti-noise ability is provided as an example of application.
△ Less
Submitted 12 August, 2023;
originally announced August 2023.
-
Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System
Authors:
Zhaohui Yin,
**gguang Tian,
Xinhui Hu,
Xinkang Xu,
Yang Xiang
Abstract:
Overlapped Speech Detection (OSD) is an important part of speech applications involving analysis of multi-party conversations. However, most of existing OSD systems are trained and evaluated on small datasets with limited application domains, which led to the robustness of them lacks benchmark for evaluation and the accuracy of them remains inadequate in realistic acoustic environments. To solve t…
▽ More
Overlapped Speech Detection (OSD) is an important part of speech applications involving analysis of multi-party conversations. However, most of existing OSD systems are trained and evaluated on small datasets with limited application domains, which led to the robustness of them lacks benchmark for evaluation and the accuracy of them remains inadequate in realistic acoustic environments. To solve these problem, we conduct a study of large-scale learning (LSL) in OSD tasks and propose a new general OSD system named CF-OSD with LSL based on Conformer network and LSL. In our study, a large-scale test set consisting of 151h labeled speech of different styles, languages and sound-source distances is produced and used as a new benchmark for evaluating the generality of OSD systems. Rigorous comparative experiments are designed and used to evaluate the effectiveness of LSL in OSD tasks and define the OSD model of our general OSD system. The experiment results show that LSL can significantly improve the accuracy and robustness of OSD systems, and the CF-OSD with LSL system significantly outperforms other OSD systems on our proposed benchmark. Moreover, our system has also achieved state-of-the-art performance on existing small dataset benchmarks, reaching 81.6\% and 53.8\% in the Alimeeting testset and DIHARD II evaluation set, respectively.
△ Less
Submitted 7 September, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
-
Deployment of Leader-Follower Automated Vehicle Systems for Smart Work Zone Applications with a Queuing-based Traffic Assignment Approach
Authors:
Qing Tang,
Xianbiao Hu
Abstract:
The emerging technology of the Autonomous Truck Mounted Attenuator (ATMA), a leader-follower style vehicle system, utilizes connected and automated vehicle capabilities to enhance safety during transportation infrastructure maintenance in work zones. However, the speed difference between ATMA vehicles and general vehicles creates a moving bottleneck that reduces capacity and increases queue length…
▽ More
The emerging technology of the Autonomous Truck Mounted Attenuator (ATMA), a leader-follower style vehicle system, utilizes connected and automated vehicle capabilities to enhance safety during transportation infrastructure maintenance in work zones. However, the speed difference between ATMA vehicles and general vehicles creates a moving bottleneck that reduces capacity and increases queue length, resulting in additional delays. The different routes taken by ATMA cause diverse patterns of time-varying capacity drops, which may affect the user equilibrium traffic assignment and lead to different system costs. This manuscript focuses on optimizing the routing for ATMA vehicles in a network to minimize the system cost associated with the slow-moving operation.
To achieve this, a queuing-based traffic assignment approach is proposed to identify the system cost caused by the ATMA system. A queuing-based time-dependent (QBTD) travel time function, considering capacity drop, is introduced and applied in the static user equilibrium traffic assignment problem, with a result of adding dynamic characteristics. Subsequently, we formulate the queuing-based traffic assignment problem and solve it using a modified path-based algorithm. The methodology is validated using a small-size and a large-size network and compared with two benchmark models to analyze the benefit of capacity drop modeling and QBTD travel time function. Furthermore, the approach is applied to quantify the impact of different routes on the traffic system and identify an optimal route for ATMA vehicles performing maintenance work. Finally, sensitivity analysis is conducted to explore how the impact changes with variations in traffic demand and capacity reduction.
△ Less
Submitted 23 July, 2023;
originally announced August 2023.
-
DRAW: Defending Camera-shooted RAW against Image Manipulation
Authors:
Xiaoxiao Hu,
Qichao Ying,
Zhenxing Qian,
Sheng Li,
Xinpeng Zhang
Abstract:
RAW files are the initial measurement of scene radiance widely used in most cameras, and the ubiquitously-used RGB images are converted from RAW data through Image Signal Processing (ISP) pipelines. Nowadays, digital images are risky of being nefariously manipulated. Inspired by the fact that innate immunity is the first line of body defense, we propose DRAW, a novel scheme of defending images aga…
▽ More
RAW files are the initial measurement of scene radiance widely used in most cameras, and the ubiquitously-used RGB images are converted from RAW data through Image Signal Processing (ISP) pipelines. Nowadays, digital images are risky of being nefariously manipulated. Inspired by the fact that innate immunity is the first line of body defense, we propose DRAW, a novel scheme of defending images against manipulation by protecting their sources, i.e., camera-shooted RAWs. Specifically, we design a lightweight Multi-frequency Partial Fusion Network (MPF-Net) friendly to devices with limited computing resources by frequency learning and partial feature fusion. It introduces invisible watermarks as protective signal into the RAW data. The protection capability can not only be transferred into the rendered RGB images regardless of the applied ISP pipeline, but also is resilient to post-processing operations such as blurring or compression. Once the image is manipulated, we can accurately identify the forged areas with a localization network. Extensive experiments on several famous RAW datasets, e.g., RAISE, FiveK and SIDD, indicate the effectiveness of our method. We hope that this technique can be used in future cameras as an option for image protection, which could effectively restrict image manipulation at the source.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
Learned Kernels for Sparse, Interpretable, and Efficient Medical Time Series Processing
Authors:
Sully F. Chen,
Zhicheng Guo,
Cheng Ding,
Xiao Hu,
Cynthia Rudin
Abstract:
Background: Rapid, reliable, and accurate interpretation of medical signals is crucial for high-stakes clinical decision-making. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance in medical time series processing but at a cost: deep learning models are often compute-intensive and lack interpretability.
Methods: We propose Sparse Mixture of…
▽ More
Background: Rapid, reliable, and accurate interpretation of medical signals is crucial for high-stakes clinical decision-making. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance in medical time series processing but at a cost: deep learning models are often compute-intensive and lack interpretability.
Methods: We propose Sparse Mixture of Learned Kernels (SMoLK), an interpretable architecture for medical time series processing. The method learns a set of lightweight flexible kernels to construct a single-layer neural network, providing not only interpretability, but also efficiency and robustness. We introduce novel parameter reduction techniques to further reduce the size of our network. We demonstrate the power of our architecture on two important tasks: photoplethysmography (PPG) artifact detection and atrial fibrillation detection from single-lead electrocardiograms (ECGs). Our approach has performance similar to the state-of-the-art deep neural networks with several orders of magnitude fewer parameters, allowing for deep neural network level performance with extremely low-power wearable devices.
Results: Our interpretable method achieves greater than 99% of the performance of the state-of-the-art methods on the PPG artifact detection task, and even outperforms the state-of-the-art on a challenging out-of-distribution test set, while using dramatically fewer parameters (2% of the parameters of Segade, and about half of the parameters of Tiny-PPG). On single lead atrial fibrillation detection, our method matches the performance of a 1D-residual convolutional network, at less than 1% the parameter count, while exhibiting considerably better performance in the low-data regime, even when compared to a parameter-matched control deep network.
△ Less
Submitted 2 April, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables
Authors:
Pranay Jain,
Cheng Ding,
Cynthia Rudin,
Xiao Hu
Abstract:
Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that elim…
▽ More
Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices.
△ Less
Submitted 7 July, 2023;
originally announced July 2023.
-
Multi-IMU with Online Self-Consistency for Freehand 3D Ultrasound Reconstruction
Authors:
Mingyuan Luo,
Xin Yang,
Zhongnuo Yan,
Junyu Li,
Yuanji Zhang,
Jiongquan Chen,
Xindi Hu,
Jikuan Qian,
Jun Cheng,
Dong Ni
Abstract:
Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone.…
▽ More
Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone. The addition of external lightweight sensors has been proposed to enhance reconstruction performance without adding complexity, which has been shown to be beneficial. We propose a novel online self-consistency network (OSCNet) using multiple inertial measurement units (IMUs) to improve reconstruction performance. OSCNet utilizes a modal-level self-supervised strategy to fuse multiple IMU information and reduce differences between reconstruction results obtained from each IMU data. Additionally, a sequence-level self-consistency strategy is proposed to improve the hierarchical consistency of prediction results among the scanning sequence and its sub-sequences. Experiments on large-scale arm and carotid datasets with multiple scanning tactics demonstrate that our OSCNet outperforms previous methods, achieving state-of-the-art reconstruction performance.
△ Less
Submitted 18 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Synchronous dynamic game on system observability considering one or two steps optimality
Authors:
Yueyue Xu,
Xiaoming Hu,
Lin Wang
Abstract:
This paper studies a system security problem in the context of observability based on a two-party non-cooperative asynchronous dynamic game. A system is assumed to be secure if it is not observable. Both the defender and the attacker have means to modify dimension of the unobservable subspace, which is set as the value function. Utilizing tools from geometric control, we construct the best respons…
▽ More
This paper studies a system security problem in the context of observability based on a two-party non-cooperative asynchronous dynamic game. A system is assumed to be secure if it is not observable. Both the defender and the attacker have means to modify dimension of the unobservable subspace, which is set as the value function. Utilizing tools from geometric control, we construct the best response set under one-step or two-step optimality to minimize or maximize the value function. We find that the best response sets under one-step optimality are not single-valued maps, resulting in a variety of game outcomes. In the dynamic game considering two-step optimality, definition and existence conditions of lock and oscillation game modes are given. Finally, the best response under two-step optimality and the Stackelberg game equilibrium are compared.
△ Less
Submitted 23 June, 2023;
originally announced June 2023.
-
On the Generalization and Advancement of Half-Sine-Based Pulse Sha** Filters for Constant Envelope OQPSK Modulation
Authors:
Pengcheng Mu,
Yan Liu,
Zihao Guo,
Xiaoyan Hu,
Kai-Kit Wong
Abstract:
The offset quadrature phase-shift keying (OQPSK) modulation is a key factor for the technique of ZigBee, which has been adopted in IEEE 802.15.4 for wireless communications of Internet of Things (IoT) and Internet of Vehicles (IoV), etc. In this paper, we propose the general conditions of pulse sha** filters (PSFs) with constant envelope (CE) property for OQPSK modulation, which can be easily le…
▽ More
The offset quadrature phase-shift keying (OQPSK) modulation is a key factor for the technique of ZigBee, which has been adopted in IEEE 802.15.4 for wireless communications of Internet of Things (IoT) and Internet of Vehicles (IoV), etc. In this paper, we propose the general conditions of pulse sha** filters (PSFs) with constant envelope (CE) property for OQPSK modulation, which can be easily leveraged to design the PSFs with CE property. Based on these conditions, we further design an advanced PSF called $α$-half-sine PSF. It is verified that the newly designed $α$-half-sine PSF can not only keep the CE property for OQPSK but also achieve better performance than the traditional PSFs in certain scenarios. Moreover, the $α$-half-sine PSF can be simply adjusted to achieve a flexible performance tradeoff between the transition roll-off speed and out-of-band leakage.
△ Less
Submitted 14 June, 2023;
originally announced June 2023.
-
STAR-RIS Assisted Covert Communications in NOMA Systems
Authors:
Han Xiao,
Xiaoyan Hu,
Tong-Xing Zheng,
Kai-Kit Wong
Abstract:
Covert communications assisted by simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) in non-orthogonal multiple access (NOMA) systems have been explored in this paper. In particular, the access point (AP) transmitter adopts NOMA to serve a downlink covert user and a public user. The minimum detection error probability (DEP) at the warden is derived considering…
▽ More
Covert communications assisted by simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) in non-orthogonal multiple access (NOMA) systems have been explored in this paper. In particular, the access point (AP) transmitter adopts NOMA to serve a downlink covert user and a public user. The minimum detection error probability (DEP) at the warden is derived considering the uncertainty of its background noise, which is used as a covertness constraint. We aim at maximizing the covert rate of the system by jointly optimizing APs transmit power and passive beamforming of STAR-RIS, under the covertness and quality of service (QoS) constraints. An iterative algorithm is proposed to effectively solve the non-convex optimization problem. Simulation results show that the proposed scheme significantly outperforms the conventional RIS-based scheme in ensuring system covert performance.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
Sensing-based Beamforming Design for Joint Performance Enhancement of RIS-Aided ISAC Systems
Authors:
Xiaowei Qian,
Xiaoling Hu,
Chenxi Liu,
Mugen Peng,
Caijun Zhong
Abstract:
Reconfigurable intelligent surface (RIS) has shown its great potential in facilitating device-based integrated sensing and communication (ISAC), where sensing and communication tasks are mostly conducted on different time-frequency resources. While the more challenging scenarios of simultaneous sensing and communication (SSC) have so far drawn little attention. In this paper, we propose a novel RI…
▽ More
Reconfigurable intelligent surface (RIS) has shown its great potential in facilitating device-based integrated sensing and communication (ISAC), where sensing and communication tasks are mostly conducted on different time-frequency resources. While the more challenging scenarios of simultaneous sensing and communication (SSC) have so far drawn little attention. In this paper, we propose a novel RIS-aided ISAC framework where the inherent location information in the received communication signals from a blind-zone user equipment is exploited to enable SSC. We first design a two-phase ISAC transmission protocol. In the first phase, communication and coarse-grained location sensing are performed concurrently by exploiting the very limited channel state information, while in the second phase, by using the coarse-grained sensing information obtained from the first phase, simple-yet-efficient sensing-based beamforming designs are proposed to realize both higher-rate communication and fine-grained location sensing. We demonstrate that our proposed framework can achieve almost the same performance as the communication-only frameworks, while providing up to millimeter-level positioning accuracy. In addition, we show how the communication and sensing performance can be simultaneously boosted through our proposed sensing-based beamforming designs. The results presented in this work provide valuable insights into the design and implementation of other ISAC systems considering SSC.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
DataAI-6G: A System Parameters Configurable Channel Dataset for AI-6G Research
Authors:
Zibing Shen,
Jianhua Zhang,
Li Yu,
Yuxiang Zhang,
Zhen Zhang,
Xidong Hu
Abstract:
With the acceleration of the commercialization of fifth generation (5G) mobile communication technology and the research for 6G communication systems, the communication system has the characteristics of high frequency, multi-band, high speed movement of users and large antenna array. These bring many difficulties to obtain accurate channel state information (CSI), which makes the performance of tr…
▽ More
With the acceleration of the commercialization of fifth generation (5G) mobile communication technology and the research for 6G communication systems, the communication system has the characteristics of high frequency, multi-band, high speed movement of users and large antenna array. These bring many difficulties to obtain accurate channel state information (CSI), which makes the performance of traditional communication methods be greatly restricted. Therefore, there has been a lot of interest in using artificial intelligence (AI) instead of traditional methods to improve performance. A common and accurate dataset is essential for the research of AI communication. However, the common datasets nowadays still lack some important features, such as mobile features, spatial non-stationary features etc. To address these issues, we give a dataset for future 6G communication. In this dataset, we address these issues with specific simulation methods and accompanying code processing.
△ Less
Submitted 3 June, 2023;
originally announced June 2023.
-
Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model
Authors:
Héctor Martel,
Julius Richter,
Kai Li,
Xiaolin Hu,
Timo Gerkmann
Abstract:
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments. To this end, we adopt the Asynchronous Fully Recurrent Convolutional Neural Network (A-FRCNN), which has shown successful results in audio-only speech separation. Our architecture consists of an…
▽ More
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments. To this end, we adopt the Asynchronous Fully Recurrent Convolutional Neural Network (A-FRCNN), which has shown successful results in audio-only speech separation. Our architecture consists of an audio branch and a video branch, with iterative A-FRCNN blocks sharing weights for each modality. We evaluated our model in a controlled environment using the NTCD-TIMIT dataset and in-the-wild using a synthetic dataset that combines LRS3 and WHAM!. The experiments demonstrate the superiority of our model in both settings with respect to various audio-only and audio-visual baselines. Furthermore, the reduced footprint of our model makes it suitable for low resource applications.
△ Less
Submitted 31 May, 2023;
originally announced June 2023.