Search | arXiv e-print repository

Exploiting Dependency-Aware Priority Adjustment for Mixed-Criticality TSN Flow Scheduling

Authors: Miao Guo, Yifei Sun, Chaojie Gu, Shibo He, Zhiguo Shi

Abstract: Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads… ▽ More Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads to varying delays due to different sha** mechanisms applied to different flow types. Leveraging this insight, we introduce a new scheduling method in mixed-criticality TSN that incorporates a priority adjustment scheme among diverse flow types to mitigate queuing delays and enhance schedulability. Specifically, we propose dependency-aware priority adjustment algorithms tailored to different link-overlap** conditions. Experiments in various settings validate the effectiveness of the proposed method, which enhances the schedulability by 20.57% compared with the SOTA method. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: This paper has been accepted by IWQoS'24

arXiv:2406.18548 [pdf]

Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is a… ▽ More The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is added to the network for processing. The brain glioma MRI image dataset provided by cancer imaging archives was experimentally verified. A multi-scale segmentation method based on a weighted least squares filter was used to complete the 3D reconstruction of brain tumors. Thus, the accuracy of three-dimensional reconstruction is further improved. Experiments show that the local texture features obtained by the proposed algorithm are similar to those obtained by laser scanning. The algorithm is improved by using the U-Net method and an accuracy of 0.9851 is obtained. This approach significantly enhances the precision of image segmentation and boosts the efficiency of image classification. △ Less

Submitted 23 May, 2024; originally announced June 2024.

arXiv:2406.00492 [pdf, other]

SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation

Authors: Xueying Zeng, Baixiang Huang, Yu Luo, Guangyu Wei, Songyan He, Yushuang Shao

Abstract: Coronary artery disease (CAD) is one of the most prevalent diseases in the cardiovascular field and one of the major contributors to death worldwide. Computed Tomography Angiography (CTA) images are regarded as the authoritative standard for the diagnosis of coronary artery disease, and by performing vessel segmentation and stenosis detection on CTA images, physicians are able to diagnose coronary… ▽ More Coronary artery disease (CAD) is one of the most prevalent diseases in the cardiovascular field and one of the major contributors to death worldwide. Computed Tomography Angiography (CTA) images are regarded as the authoritative standard for the diagnosis of coronary artery disease, and by performing vessel segmentation and stenosis detection on CTA images, physicians are able to diagnose coronary artery disease more accurately. In order to combine the advantages of both the base model and the domain-specific model, and to achieve high-precision and fully-automatic segmentation and detection with a limited number of training samples, we propose a novel architecture, SAM-VMNet, which combines the powerful feature extraction capability of MedSAM with the advantage of the linear complexity of the visual state-space model of VM-UNet, giving it faster inferences than Vision Transformer with faster inference speed and stronger data processing capability, achieving higher segmentation accuracy and stability for CTA images. Experimental results show that the SAM-VMNet architecture performs excellently in the CTA image segmentation task, with a segmentation accuracy of up to 98.32% and a sensitivity of up to 99.33%, which is significantly better than other existing models and has stronger domain adaptability. Comprehensive evaluation of the CTA image segmentation task shows that SAM-VMNet accurately extracts the vascular trunks and capillaries, demonstrating its great potential and wide range of application scenarios for the vascular segmentation task, and also laying a solid foundation for further stenosis detection. △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2404.16611 [pdf, ps, other]

Towards Symbiotic SAGIN Through Inter-operator Resource and Service Sharing: Joint Orchestration of User Association and Radio Resources

Authors: Shizhao He, Jungang Ge, Ying-Chang Liang, Dusit Niyato

Abstract: The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on bo… ▽ More The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on both the overall system performance and individual benefits of operators. Motivated by emerging symbiotic communication facilitating mutual benefits across different radio systems, we investigate the resource and service sharing in SAGIN from a symbiotic communication perspective in this paper. In particular, we consider a SAGIN consisting of a ground network operator (GNO) and a satellite network operator (SNO). Specifically, we aim to maximize the weighted sum rate (WSR) of the whole SAGIN by jointly optimizing the user association, resource allocation, and beamforming. Besides, we introduce a sharing coefficient to characterize the revenue of operators. Operators may suffer revenue loss when only focusing on maximizing the WSR. In pursuit of mutual benefits, we propose a mutual benefit constraint (MBC) to ensure that each operator obtains revenue gains. Then, we develop a centralized algorithm based on the successive convex approximation (SCA) method. Considering that the centralized algorithm is difficult to implement, we propose a distributed algorithm based on Lagrangian dual decomposition and the consensus alternating direction method of multipliers (ADMM). Finally, we provide extensive numerical simulations to demonstrate the effectiveness of the two proposed algorithms, and the distributed optimization algorithm can approach the performance of the centralized one. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.14700 [pdf, other]

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large… ▽ More Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5\% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found in https://flashspeech.github.io/. △ Less

Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Efficient zero-shot speech synthesis

arXiv:2404.12170 [pdf, other]

Secure Semantic Communication for Image Transmission in the Presence of Eavesdroppers

Authors: Shunpu Tang, Chen Liu, Qianqian Yang, Shibo He, Dusit Niyato

Abstract: Semantic communication (SemCom) has emerged as a key technology for the forthcoming sixth-generation (6G) network, attributed to its enhanced communication efficiency and robustness against channel noise. However, the open nature of wireless channels renders them vulnerable to eavesdrop**, posing a serious threat to privacy. To address this issue, we propose a novel secure semantic communication… ▽ More Semantic communication (SemCom) has emerged as a key technology for the forthcoming sixth-generation (6G) network, attributed to its enhanced communication efficiency and robustness against channel noise. However, the open nature of wireless channels renders them vulnerable to eavesdrop**, posing a serious threat to privacy. To address this issue, we propose a novel secure semantic communication (SemCom) approach for image transmission, which integrates steganography technology to conceal private information within non-private images (host images). Specifically, we propose an invertible neural network (INN)-based signal steganography approach, which embeds channel input signals of a private image into those of a host image before transmission. This ensures that the original private image can be reconstructed from the received signals at the legitimate receiver, while the eavesdropper can only decode the information of the host image. Simulation results demonstrate that the proposed approach maintains comparable reconstruction quality of both host and private images at the legitimate receiver, compared to scenarios without any secure mechanisms. Experiments also show that the eavesdropper is only able to reconstruct host images, showcasing the enhanced security provided by our approach. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11313 [pdf, other]

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2404.10365 [pdf, other]

Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments

Authors: Yongming Huang, Xiaohu You, Hang Zhan, Shiwen He, Ningning Fu, Wei Xu

Abstract: Intelligent communications have played a pivotal role in sha** the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only… ▽ More Intelligent communications have played a pivotal role in sha** the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only a fraction of them imposes significant impact on the network AI models. Therefore, real-time intelligence of communication systems heavily relies on a small but critical set of the data that profoundly influences the performance of network AI models. These challenges underscore the need for innovative architectures and solutions. In this paper, we propose a solution, termed the pervasive multi-level (PML) native AI architecture, which integrates the concept of knowledge graph (KG) into the intelligent operational manipulations of mobile networks, resulting in the establishment of a wireless data KG. Leveraging the wireless data KG, we characterize the massive and complex data collected from wireless communication networks and analyze the relationships among various data fields. The obtained graph of data field relations enables the on-demand generation of minimal and effective datasets, referred to as feature datasets, tailored to specific application requirements. Consequently, this architecture not only enhances AI training, inference, and validation processes but also significantly reduces resource wastage and overhead for communication networks. To implement this architecture, we have developed a specific solution comprising a spatio-temporal heterogeneous graph attention neural network model (STREAM) as well as a feature dataset generation algorithm. Experiments are conducted to validate the effectiveness of the proposed architecture. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 12 pages,11 figures

arXiv:2404.08943 [pdf, other]

A Novel State-Centric Necessary Condition for Time-Optimal Control of Controllable Linear Systems Based on Augmented Switching Laws

Authors: Yunan Wang, Chuxiong Hu, Yujie Lin, Zeyang Li, Shize Lin, Suqin He

Abstract: Most existing necessary conditions for optimal control based on adjoining methods require both state information and costate information, yet the lack of costates for a given feasible trajectory in practice impedes the determination of optimality. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems, proposing the augmented switching law that… ▽ More Most existing necessary conditions for optimal control based on adjoining methods require both state information and costate information, yet the lack of costates for a given feasible trajectory in practice impedes the determination of optimality. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems, proposing the augmented switching law that represents the input control and the feasibility in a compact form. Given a feasible trajectory, the disturbed trajectory under the constraints of augmented switching law is guaranteed to be feasible, resulting in a novel state-centric necessary condition without dependence on costate information. A first order necessary condition is proposed that the Jacobian matrix of the augmented switching law is not full row rank, which also results in an approach to optimizing a given feasible trajectory further. The proposed necessary condition is applied to the chain-of-integrators systems with full box constraints, contributing to some conclusions challenging to reason by traditional costate-based necessary conditions. △ Less

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2403.17675 [pdf, other]

Chattering Phenomena in Time-Optimal Control for High-Order Chain-of-Integrators Systems with Full State Constraints

Authors: Yunan Wang, Chuxiong Hu, Zeyang Li, Yujie Lin, Shize Lin, Suqin He

Abstract: Time-optimal control for high-order chain-of-integrators systems with full state constraints remains an open and challenging problem in the optimal control theory domain. The behaviors of optimal control in high-order problems lack precision characterization, even where the existence of the chattering phenomenon remains unknown and overlooked. This paper establishes a theoretical framework for cha… ▽ More Time-optimal control for high-order chain-of-integrators systems with full state constraints remains an open and challenging problem in the optimal control theory domain. The behaviors of optimal control in high-order problems lack precision characterization, even where the existence of the chattering phenomenon remains unknown and overlooked. This paper establishes a theoretical framework for chattering phenomena in the considered problem, providing novel findings on the uniqueness of state constraints inducing chattering, the upper bound on switching times in an unconstrained arc during chattering, and the convergence of states and costates to the chattering limit point. For the first time, this paper proves the existence of the chattering phenomenon in the considered problem. The chattering optimal control for 4th order problems with velocity constraints is precisely solved, providing an approach to plan strictly time-optimal snap-limited trajectories. Other cases of order $n\leq4$ are proved not to allow chattering. The conclusions correct the longstanding misconception in the industry regarding the time-optimality of S-shaped trajectories with minimal switching times. △ Less

Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2402.14225 [pdf, other]

SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques

Authors: Changjiang Zhao, Shulin He, Xueliang Zhang

Abstract: Speech enhancement aims to improve speech quality and intelligibility, especially in noisy environments where background noise degrades speech signals. Currently, deep learning methods achieve great success in speech enhancement, e.g. the representative convolutional recurrent neural network (CRN) and its variants. However, CRN typically employs consecutive downsampling and upsampling convolution… ▽ More Speech enhancement aims to improve speech quality and intelligibility, especially in noisy environments where background noise degrades speech signals. Currently, deep learning methods achieve great success in speech enhancement, e.g. the representative convolutional recurrent neural network (CRN) and its variants. However, CRN typically employs consecutive downsampling and upsampling convolution for frequency modeling, which destroys the inherent structure of the signal over frequency. Additionally, convolutional layers lacks of temporal modelling abilities. To address these issues, we propose an innovative module combing a State space model and Inplace Convolution (SIC), and to replace the conventional convolution in CRN, called SICRN. Specifically, a dual-path multidimensional State space model captures the global frequencies dependency and long-term temporal dependencies. Meanwhile, the 2D-inplace convolution is used to capture the local structure, which abandons the downsampling and upsampling. Systematic evaluations on the public INTERSPEECH 2020 DNS challenge dataset demonstrate SICRN's efficacy. Compared to strong baselines, SICRN achieves performance close to state-of-the-art while having advantages in model parameters, computations, and algorithmic delay. The proposed SICRN shows great promise for improved speech enhancement. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2312.15633 [pdf, other]

MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility

Authors: Ahsan Baidar Bakht, Zikai Jia, Muhayy ud Din, Waseem Akram, Lyes Saad Soud, Lakmal Seneviratne, Defu Lin, Shaoming He, Irfan Hussain

Abstract: The underwater environment presents unique challenges, including color distortions, reduced contrast, and blurriness, hindering accurate analysis. In this work, we introduce MuLA-GAN, a novel approach that leverages the synergistic power of Generative Adversarial Networks (GANs) and Multi-Level Attention mechanisms for comprehensive underwater image enhancement. The integration of Multi-Level Atte… ▽ More The underwater environment presents unique challenges, including color distortions, reduced contrast, and blurriness, hindering accurate analysis. In this work, we introduce MuLA-GAN, a novel approach that leverages the synergistic power of Generative Adversarial Networks (GANs) and Multi-Level Attention mechanisms for comprehensive underwater image enhancement. The integration of Multi-Level Attention within the GAN architecture significantly enhances the model's capacity to learn discriminative features crucial for precise image restoration. By selectively focusing on relevant spatial and multi-level features, our model excels in capturing and preserving intricate details in underwater imagery, essential for various applications. Extensive qualitative and quantitative analyses on diverse datasets, including UIEB test dataset, UIEB challenge dataset, U45, and UCCS dataset, highlight the superior performance of MuLA-GAN compared to existing state-of-the-art methods. Experimental evaluations on a specialized dataset tailored for bio-fouling and aquaculture applications demonstrate the model's robustness in challenging environmental conditions. On the UIEB test dataset, MuLA-GAN achieves exceptional PSNR (25.59) and SSIM (0.893) scores, surpassing Water-Net, the second-best model, with scores of 24.36 and 0.885, respectively. This work not only addresses a significant research gap in underwater image enhancement but also underscores the pivotal role of Multi-Level Attention in enhancing GANs, providing a novel and comprehensive framework for restoring underwater image quality. △ Less

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.10979 [pdf, ps, other]

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

Authors: Shulin He, **jiang liu, Hao Li, Yang Yang, Fei Chen, Xueliang Zhang

Abstract: Target speaker extraction (TSE) aims to isolate a specific voice from multiple mixed speakers relying on a registerd sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large model parameters which are computational intensive and impractical for real-time applications, espetially on resource-constrained platforms. In this paper, we address the TSE tas… ▽ More Target speaker extraction (TSE) aims to isolate a specific voice from multiple mixed speakers relying on a registerd sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large model parameters which are computational intensive and impractical for real-time applications, espetially on resource-constrained platforms. In this paper, we address the TSE task using microphone array and introduce a novel three-stage solution that systematically decouples the process: First, a neural network is trained to estimate the direction of the target speaker. Second, with the direction determined, the Generalized Sidelobe Canceller (GSC) is used to extract the target speech. Third, an Inplace Convolutional Recurrent Neural Network (ICRN) acts as a denoising post-processor, refining the GSC output to yield the final separated speech. Our approach delivers superior performance while drastically reducing computational load, setting a new standard for efficient real-time target speaker extraction. △ Less

Submitted 4 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024

arXiv:2312.05062 [pdf, ps, other]

Deep Learning Enabled Semantic Communication Systems for Video Transmission

Authors: Zhenguo Zhang, Qianqian Yang, Shibo He, Jiming Chen

Abstract: Semantic communication has emerged as a promising approach for improving efficient transmission in the next generation of wireless networks. Inspired by the success of semantic communication in different areas, we aim to provide a new semantic communication scheme from the semantic level. In this paper, we propose a novel DL-based semantic communication system for video transmission, which compact… ▽ More Semantic communication has emerged as a promising approach for improving efficient transmission in the next generation of wireless networks. Inspired by the success of semantic communication in different areas, we aim to provide a new semantic communication scheme from the semantic level. In this paper, we propose a novel DL-based semantic communication system for video transmission, which compacts semantic-related information to improve transmission efficiency. In particular, we utilize the Bi-optical flow to estimate residual information of inter-frame details. We also propose a feature choice module and a feature fusion module to drop semantically redundant features while paying more attention to the important semantic-related content. We employ a frame prediction module to reconstruct semantic features of the prediction frame from the received signal at the receiver. To enhance the system's robustness, we propose a noise attention module that assigns different importance weights to the extracted features. Simulation results indicate that our proposed method outperforms existing approaches in terms of transmission efficiency, achieving about 33.3\% reduction in the number of transmitted symbols while improving the peak signal-to-noise ratio (PSNR) performance by an average of 0.56dB. △ Less

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.07039 [pdf, other]

Time-Optimal Control for High-Order Chain-of-Integrators Systems with Full State Constraints and Arbitrary Terminal States (Extended Version)

Authors: Yunan Wang, Chuxiong Hu, Zeyang Li, Shize Lin, Suqin He, Yu Zhu

Abstract: Time-optimal control for high-order chain-of-integrators systems with full state constraints and arbitrarily given terminal states remains a challenging problem in the optimal control theory domain, yet to be resolved. To enhance further comprehension of the problem, this paper establishes a novel notation system and theoretical framework, providing the switching manifold for high-order problems i… ▽ More Time-optimal control for high-order chain-of-integrators systems with full state constraints and arbitrarily given terminal states remains a challenging problem in the optimal control theory domain, yet to be resolved. To enhance further comprehension of the problem, this paper establishes a novel notation system and theoretical framework, providing the switching manifold for high-order problems in the form of switching laws. Through deriving properties of switching laws regarding signs and dimension, this paper proposes a definite condition for time-optimal control. Guided by the developed theory, a trajectory planning method named the manifold-intercept method (MIM) is developed. The proposed MIM can plan time-optimal jerk-limited trajectories with full state constraints, and can also plan near-optimal non-chattering higher-order trajectories with negligible extra motion time compared to optimal profiles. Numerical results indicate that the proposed MIM outperforms all baselines in computational time, computational accuracy, and trajectory quality by a large gap. △ Less

Submitted 28 March, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

arXiv:2309.10393 [pdf, ps, other]

Hierarchical Modeling of Spatial Cues via Spherical Harmonics for Multi-Channel Speech Enhancement

Authors: Jiahui Pan, Shulin He, Hui Zhang, Xueliang Zhang

Abstract: Multi-channel speech enhancement utilizes spatial information from multiple microphones to extract the target speech. However, most existing methods do not explicitly model spatial cues, instead relying on implicit learning from multi-channel spectra. To better leverage spatial information, we propose explicitly incorporating spatial modeling by applying spherical harmonic transforms (SHT) to the… ▽ More Multi-channel speech enhancement utilizes spatial information from multiple microphones to extract the target speech. However, most existing methods do not explicitly model spatial cues, instead relying on implicit learning from multi-channel spectra. To better leverage spatial information, we propose explicitly incorporating spatial modeling by applying spherical harmonic transforms (SHT) to the multi-channel input. In detail, a hierarchical framework is introduced whereby lower order harmonics capturing broader spatial patterns are estimated first, then combined with higher orders to recursively predict finer spatial details. Experiments on TIMIT demonstrate the proposed method can effectively recover target spatial patterns and achieve improved performance over baseline models, using fewer parameters and computations. Explicitly modeling spatial information hierarchically enables more effective multi-channel speech enhancement. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.10379 [pdf, ps, other]

PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement

Authors: Jiahui Pan, Shulin He, Tianci Wu, Hui Zhang, Xueliang Zhang

Abstract: Multi-channel speech enhancement seeks to utilize spatial information to distinguish target speech from interfering signals. While deep learning approaches like the dual-path convolutional recurrent network (DPCRN) have made strides, challenges persist in effectively modeling inter-channel correlations and amalgamating multi-level information. In response, we introduce the Parallel Dual-Path Convo… ▽ More Multi-channel speech enhancement seeks to utilize spatial information to distinguish target speech from interfering signals. While deep learning approaches like the dual-path convolutional recurrent network (DPCRN) have made strides, challenges persist in effectively modeling inter-channel correlations and amalgamating multi-level information. In response, we introduce the Parallel Dual-Path Convolutional Recurrent Network (PDPCRN). This acoustic modeling architecture has two key innovations. First, a parallel design with separate branches extracts complementary features. Second, bi-directional modules enable cross-branch communication. Together, these facilitate diverse representation fusion and enhanced modeling. Experimental validation on TIMIT datasets underscores the prowess of PDPCRN. Notably, against baseline models like the standard DPCRN, PDPCRN not only outperforms in PESQ and STOI metrics but also boasts a leaner computational footprint with reduced parameters. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2307.16228 [pdf, other]

Robust Electric Vehicle Balancing of Autonomous Mobility-On-Demand System: A Multi-Agent Reinforcement Learning Approach

Authors: Sihong He, Shuo Han, Fei Miao

Abstract: Electric autonomous vehicles (EAVs) are getting attention in future autonomous mobility-on-demand (AMoD) systems due to their economic and societal benefits. However, EAVs' unique charging patterns (long charging time, high charging frequency, unpredictable charging behaviors, etc.) make it challenging to accurately predict the EAVs supply in E-AMoD systems. Furthermore, the mobility demand's pred… ▽ More Electric autonomous vehicles (EAVs) are getting attention in future autonomous mobility-on-demand (AMoD) systems due to their economic and societal benefits. However, EAVs' unique charging patterns (long charging time, high charging frequency, unpredictable charging behaviors, etc.) make it challenging to accurately predict the EAVs supply in E-AMoD systems. Furthermore, the mobility demand's prediction uncertainty makes it an urgent and challenging task to design an integrated vehicle balancing solution under supply and demand uncertainties. Despite the success of reinforcement learning-based E-AMoD balancing algorithms, state uncertainties under the EV supply or mobility demand remain unexplored. In this work, we design a multi-agent reinforcement learning (MARL)-based framework for EAVs balancing in E-AMoD systems, with adversarial agents to model both the EAVs supply and mobility demand uncertainties that may undermine the vehicle balancing solutions. We then propose a robust E-AMoD Balancing MARL (REBAMA) algorithm to train a robust EAVs balancing policy to balance both the supply-demand ratio and charging utilization rate across the whole city. Experiments show that our proposed robust method performs better compared with a non-robust MARL method that does not consider state uncertainties; it improves the reward, charging utilization fairness, and supply-demand fairness by 19.28%, 28.18%, and 3.97%, respectively. Compared with a robust optimization-based method, the proposed MARL algorithm can improve the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively. △ Less

Submitted 30 July, 2023; originally announced July 2023.

Comments: accepted to International Conference on Intelligent Robots and Systems (IROS2023)

arXiv:2307.16212 [pdf, other]

Robust Multi-Agent Reinforcement Learning with State Uncertainty

Authors: Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao

Abstract: In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Though robustness is getting important in MARL deployment, little prior work has studied state uncertainties in MARL, neither in problem formulation nor algorithm design.… ▽ More In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Though robustness is getting important in MARL deployment, little prior work has studied state uncertainties in MARL, neither in problem formulation nor algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty in this work. We provide the first attempt to the theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis regarding MG-SPA such as giving conditions under which such a robust equilibrium exists. Then we propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action space, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function; our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is public on \url{https://github.com/sihongho/robust_marl_with_state_uncertainty}. △ Less

Submitted 30 July, 2023; originally announced July 2023.

Comments: 50 pages, Published in TMLR, Transactions on Machine Learning Research (06/2023)

arXiv:2306.16250 [pdf, other]

MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

Authors: Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu

Abstract: The previous SpEx+ has yielded outstanding performance in speaker extraction and attracted much attention. However, it still encounters inadequate utilization of multi-scale information and speaker embedding. To this end, this paper proposes a new effective speaker extraction system with multi-scale interfusion and conditional speaker modulation (ConSM), which is called MC-SpEx. First of all, we d… ▽ More The previous SpEx+ has yielded outstanding performance in speaker extraction and attracted much attention. However, it still encounters inadequate utilization of multi-scale information and speaker embedding. To this end, this paper proposes a new effective speaker extraction system with multi-scale interfusion and conditional speaker modulation (ConSM), which is called MC-SpEx. First of all, we design the weight-share multi-scale fusers (ScaleFusers) for efficiently leveraging multi-scale information as well as ensuring consistency of the model's feature space. Then, to consider different scale information while generating masks, the multi-scale interactive mask generator (ScaleInterMG) is presented. Moreover, we introduce ConSM module to fully exploit speaker embedding in the speech extractor. Experimental results on the Libri2Mix dataset demonstrate the effectiveness of our improvements and the state-of-the-art performance of our proposed MC-SpEx. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: Accepted by InterSpeech 2023

arXiv:2306.08454 [pdf, other]

Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction

Authors: Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu

Abstract: This paper describes a real-time General Speech Reconstruction (Gesper) system submitted to the ICASSP 2023 Speech Signal Improvement (SSI) Challenge. This novel proposed system is a two-stage architecture, in which the speech restoration is performed, and then cascaded by speech enhancement. We propose a complex spectral map**-based generative adversarial network (CSM-GAN) as the speech restora… ▽ More This paper describes a real-time General Speech Reconstruction (Gesper) system submitted to the ICASSP 2023 Speech Signal Improvement (SSI) Challenge. This novel proposed system is a two-stage architecture, in which the speech restoration is performed, and then cascaded by speech enhancement. We propose a complex spectral map**-based generative adversarial network (CSM-GAN) as the speech restoration module for the first time. For noise suppression and dereverberation, the enhancement module is performed with fullband-wideband parallel processing. On the blind test set of ICASSP 2023 SSI Challenge, the proposed Gesper system, which satisfies the real-time condition, achieves 3.27 P.804 overall mean opinion score (MOS) and 3.35 P.835 overall MOS, ranked 1st in both track 1 and track 2. △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted by InterSpeech 2023

arXiv:2304.09324 [pdf, other]

Computer-Vision Benchmark Segment-Anything Model (SAM) in Medical Images: Accuracy in 12 Datasets

Authors: Sheng He, Rina Bao, **gpeng Li, Jeffrey Stout, Atle Bjornerud, P. Ellen Grant, Yangming Ou

Abstract: Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It comes without previously-required re-training or fine-tuning specific to each new dataset. Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accurac… ▽ More Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It comes without previously-required re-training or fine-tuning specific to each new dataset. Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accuracy in medical images. Methods: SAM was tested on 12 public medical image segmentation datasets involving 7,451 subjects. The accuracy was measured by the Dice overlap between the algorithm-segmented and ground-truth masks. SAM was compared with five state-of-the-art algorithms specifically designed for medical image segmentation tasks. Associations of SAM's accuracy with six factors were computed, independently and jointly, including segmentation difficulties as measured by segmentation ability score and by Dice overlap in U-Net, image dimension, size of the target region, image modality, and contrast. Results: The Dice overlaps from SAM were significantly lower than the five medical-image-based algorithms in all 12 medical image segmentation datasets, by a margin of 0.1-0.5 and even 0.6-0.7 Dice. SAM-Semantic was significantly associated with medical image segmentation difficulty and the image modality, and SAM-Point and SAM-Box were significantly associated with image segmentation difficulty, image dimension, target region size, and target-vs-background contrast. All these 3 variations of SAM were more accurate in 2D medical images, larger target region sizes, easier cases with a higher Segmentation Ability score and higher U-Net Dice, and higher foreground-background contrast. △ Less

Submitted 5 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: Technical Report

arXiv:2304.09156 [pdf, other]

Using simulation to design an MPC policy for field navigation using GPS sensing

Authors: Harry Zhang, Stefan Caldararu, Ishaan Mahajan, Shouvik Chatterjee, Thomas Hansen, Abhiraj Dashora, Sriram Ashokkumar, Luning Fang, Xiangru Xu, Shen He, Dan Negrut

Abstract: Modeling a robust control system with a precise GPS-based state estimation capability in simulation can be useful in field navigation applications as it allows for testing and validation in a controlled environment. This testing process would enable navigation systems to be developed and optimized in simulation with direct transferability to real-world scenarios. The multi-physics simulation engin… ▽ More Modeling a robust control system with a precise GPS-based state estimation capability in simulation can be useful in field navigation applications as it allows for testing and validation in a controlled environment. This testing process would enable navigation systems to be developed and optimized in simulation with direct transferability to real-world scenarios. The multi-physics simulation engine Chrono allows for the creation of scenarios that may be difficult or dangerous to replicate in the field, such as extreme weather or terrain conditions. Autonomy Research Testbed (ART), a specialized robotics algorithm testbed, is operated in conjunction with Chrono to develop an MPC control policy as well as an EKF state estimator. This platform enables users to easily integrate custom algorithms in the autonomy stack. This model is initially developed and used in simulation and then tested on a twin vehicle model in reality, to demonstrate the transferability between simulation and reality (also known as Sim2Real). △ Less

Submitted 18 April, 2023; originally announced April 2023.

Comments: 10 pages,5 figures,submitted to ECCOMAS Thematic Conference on Multibody Dynamics

arXiv:2304.07036 [pdf, other]

Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video

Authors: Si**g Liu, Qilong Ying, Shuangchi He, Xin Yang, Dong Ni, Ruobing Huang

Abstract: Ultrasound is the primary modality to examine fetal growth during pregnancy, while the image quality could be affected by various factors. Quality assessment is essential for controlling the quality of ultrasound images to guarantee both the perceptual and diagnostic values. Existing automated approaches often require heavy structural annotations and the predictions may not necessarily be consiste… ▽ More Ultrasound is the primary modality to examine fetal growth during pregnancy, while the image quality could be affected by various factors. Quality assessment is essential for controlling the quality of ultrasound images to guarantee both the perceptual and diagnostic values. Existing automated approaches often require heavy structural annotations and the predictions may not necessarily be consistent with the assessment results by human experts. Furthermore, the overall quality of a scan and the correlation between the quality of frames should not be overlooked. In this work, we propose a reinforcement learning framework powered by two hierarchical agents that collaboratively learn to perform both frame-level and video-level quality assessments. It is equipped with a specially-designed reward mechanism that considers temporal dependency among frame quality and only requires sparse binary annotations to train. Experimental results on a challenging fetal brain dataset verify that the proposed framework could perform dual-level quality assessment and its predictions correlate well with the subjective assessment results. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.01401 [pdf, other]

U-Netmer: U-Net meets Transformer for medical image segmentation

Authors: Sheng He, Rina Bao, P. Ellen Grant, Yangming Ou

Abstract: The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D… ▽ More The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D tokens which losses the interaction among pixels within local patches) and ``scale-sensitivity" problem (uses a fixed scale to split the input image into local patches). Compared to directly combining U-Net and Transformer, we propose a new global-local fashion combination of U-Net and Transformer, named U-Netmer, to solve the two problems. The proposed U-Netmer splits an input image into local patches. The global-context information among local patches is learnt by the self-attention mechanism in Transformer and U-Net segments each local patch instead of flattening into tokens to solve the `token-flatten" problem. The U-Netmer can segment the input image with different patch sizes with the identical structure and the same parameter. Thus, the U-Netmer can be trained with different patch sizes to solve the ``scale-sensitivity" problem. We conduct extensive experiments in 7 public datasets on 7 organs (brain, heart, breast, lung, polyp, pancreas and prostate) and 4 imaging modalities (MRI, CT, ultrasound, and endoscopy) to show that the proposed U-Netmer can be generally applied to improve accuracy of medical image segmentation. These experimental results show that U-Netmer provides state-of-the-art performance compared to baselines and other models. In addition, the discrepancy among the outputs of U-Netmer with different scales is linearly correlated to the segmentation accuracy which can be considered as a confidence score to rank test images by difficulty without ground-truth. △ Less

Submitted 3 April, 2023; originally announced April 2023.

Comments: 10 pages, 5 figures, under review

arXiv:2303.07704 [pdf, other]

TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge

Authors: Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang

Abstract: This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version -- TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) struct… ▽ More This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version -- TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) structure is introduced to boost speaker information extraction, and multi-STFT resolution loss is used to effectively capture the time-frequency characteristics of the speech signals. Moreover, retraining methods are employed based on the freeze training strategy to fine-tune the system. According to the official results, TEA-PSE 3.0 ranks 1st in both ICASSP 2023 DNS-Challenge track 1 and track 2. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2303.06593 [pdf, other]

Domain-Knowledge-Aided Airborne Ground Moving Targets Tracking

Authors: Jianduo Chai, Shaoming He, Hyo-Sang Shin

Abstract: This paper investigates the problem of traffic surveillance using an unmanned aerial vehicle (UAV) and proposes a domain-knowledge-aided airborne ground moving targets tracking algorithm. To improve the accuracy of multiple targets tracking, the proposed algorithm incorporates domain knowledge into the joint probabilistic data association (JPDA) filter as state constraints. The domain knowledge co… ▽ More This paper investigates the problem of traffic surveillance using an unmanned aerial vehicle (UAV) and proposes a domain-knowledge-aided airborne ground moving targets tracking algorithm. To improve the accuracy of multiple targets tracking, the proposed algorithm incorporates domain knowledge into the joint probabilistic data association (JPDA) filter as state constraints. The domain knowledge considered in this paper includes both road information extracted from a given map and local traffic regulations. Conventional track update method is modified to enhance the capability of recognition of temporarily track loss. A variable structure multiple model (VS-MM) method is developed to assign the road segment to a given target. The effectiveness of proposed algorithm is demonstrated through extensive numerical simulations. △ Less

Submitted 12 March, 2023; originally announced March 2023.

arXiv:2302.14246 [pdf, other]

i2LQR: Iterative LQR for Iterative Tasks in Dynamic Environments

Authors: Yifan Zeng, Suiyi He, Han Hoang Nguyen, Yihan Li, Zhongyu Li, Koushil Sreenath, Jun Zeng

Abstract: This work introduces a novel control strategy called Iterative Linear Quadratic Regulator for Iterative Tasks (i2LQR), which aims to improve closed-loop performance with local trajectory optimization for iterative tasks in a dynamic environment. The proposed algorithm is reference-free and utilizes historical data from previous iterations to enhance the performance of the autonomous system. Unlike… ▽ More This work introduces a novel control strategy called Iterative Linear Quadratic Regulator for Iterative Tasks (i2LQR), which aims to improve closed-loop performance with local trajectory optimization for iterative tasks in a dynamic environment. The proposed algorithm is reference-free and utilizes historical data from previous iterations to enhance the performance of the autonomous system. Unlike existing algorithms, the i2LQR computes the optimal solution in an iterative manner at each timestamp, rendering it well-suited for iterative tasks with changing constraints at different iterations. To evaluate the performance of the proposed algorithm, we conduct numerical simulations for an iterative task aimed at minimizing completion time. The results show that i2LQR achieves an optimized performance with respect to learning-based MPC (LMPC) as the benchmark in static environments, and outperforms LMPC in dynamic environments with both static and dynamics obstacles. △ Less

Submitted 6 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted by 2023 62nd IEEE Conference on Decision and Control (CDC)

arXiv:2302.13770 [pdf, other]

Mask Reference Image Quality Assessment

Authors: Pengxiang Xiao, Shuai He, Limin Liu, Anlong Ming

Abstract: Understanding semantic information is an essential step in knowing what is being learned in both full-reference (FR) and no-reference (NR) image quality assessment (IQA) methods. However, especially for many severely distorted images, even if there is an undistorted image as a reference (FR-IQA), it is difficult to perceive the lost semantic and texture information of distorted images directly. In… ▽ More Understanding semantic information is an essential step in knowing what is being learned in both full-reference (FR) and no-reference (NR) image quality assessment (IQA) methods. However, especially for many severely distorted images, even if there is an undistorted image as a reference (FR-IQA), it is difficult to perceive the lost semantic and texture information of distorted images directly. In this paper, we propose a Mask Reference IQA (MR-IQA) method that masks specific patches of a distorted image and supplements missing patches with the reference image patches. In this way, our model only needs to input the reconstructed image for quality assessment. First, we design a mask generator to select the best candidate patches from reference images and supplement the lost semantic information in distorted images, thus providing more reference for quality assessment; in addition, the different masked patches imply different data augmentations, which favors model training and reduces overfitting. Second, we provide a Mask Reference Network (MRNet): the dedicated modules can prevent disturbances due to masked patches and help eliminate the patch discontinuity in the reconstructed image. Our method achieves state-of-the-art performances on the benchmark KADID-10k, LIVE and CSIQ datasets and has better generalization performance across datasets. The code and results are available in the supplementary material. △ Less

Submitted 19 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 10 pages, 6 figures

arXiv:2301.00641 [pdf, other]

doi 10.1109/TNNLS.2022.3232630

Federated Multi-Agent Deep Reinforcement Learning Approach via Physics-Informed Reward for Multi-Microgrid Energy Management

Authors: Yuanzheng Li, Shangyang He, Yang Li, Yang Shi, Zhigang Zeng

Abstract: The utilization of large-scale distributed renewable energy promotes the development of the multi-microgrid (MMG), which raises the need of develo** an effective energy management method to minimize economic costs and keep self energy-sufficiency. The multi-agent deep reinforcement learning (MADRL) has been widely used for the energy management problem because of its real-time scheduling ability… ▽ More The utilization of large-scale distributed renewable energy promotes the development of the multi-microgrid (MMG), which raises the need of develo** an effective energy management method to minimize economic costs and keep self energy-sufficiency. The multi-agent deep reinforcement learning (MADRL) has been widely used for the energy management problem because of its real-time scheduling ability. However, its training requires massive energy operation data of microgrids (MGs), while gathering these data from different MGs would threaten their privacy and data security. Therefore, this paper tackles this practical yet challenging issue by proposing a federated multi-agent deep reinforcement learning (F-MADRL) algorithm via the physics-informed reward. In this algorithm, the federated learning (FL) mechanism is introduced to train the F-MADRL algorithm thus ensures the privacy and the security of data. In addition, a decentralized MMG model is built, and the energy of each participated MG is managed by an agent, which aims to minimize economic costs and keep self energy-sufficiency according to the physics-informed reward. At first, MGs individually execute the self-training based on local energy operation data to train their local agent models. Then, these local models are periodically uploaded to a server and their parameters are aggregated to build a global agent, which will be broadcasted to MGs and replace their local agents. In this way, the experience of each MG agent can be shared and the energy operation data is not explicitly transmitted, thus protecting the privacy and ensuring data security. Finally, experiments are conducted on Oak Ridge national laboratory distributed energy control communication lab microgrid (ORNL-MG) test system, and the comparisons are carried out to verify the effectiveness of introducing the FL mechanism and the outperformance of our proposed F-MADRL. △ Less

Submitted 29 December, 2022; originally announced January 2023.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

Journal ref: IEEE Transactions on Neural Networks and Learning Systems 35 (2024) 5902-5914

arXiv:2212.09206 [pdf, other]

Segmentation Ability Map: Interpret deep features for medical image segmentation

Authors: Sheng He, Yanfang Feng, P. Ellen Grant, Yangming Ou

Abstract: Deep convolutional neural networks (CNNs) have been widely used for medical image segmentation. In most studies, only the output layer is exploited to compute the final segmentation results and the hidden representations of the deep learned features have not been well understood. In this paper, we propose a prototype segmentation (ProtoSeg) method to compute a binary segmentation map based on deep… ▽ More Deep convolutional neural networks (CNNs) have been widely used for medical image segmentation. In most studies, only the output layer is exploited to compute the final segmentation results and the hidden representations of the deep learned features have not been well understood. In this paper, we propose a prototype segmentation (ProtoSeg) method to compute a binary segmentation map based on deep features. We measure the segmentation abilities of the features by computing the Dice between the feature segmentation map and ground-truth, named as the segmentation ability score (SA score for short). The corresponding SA score can quantify the segmentation abilities of deep features in different layers and units to understand the deep neural networks for segmentation. In addition, our method can provide a mean SA score which can give a performance estimation of the output on the test images without ground-truth. Finally, we use the proposed ProtoSeg method to compute the segmentation map directly on input images to further understand the segmentation ability of each input image. Results are presented on segmenting tumors in brain MRI, lesions in skin images, COVID-related abnormality in CT images, prostate segmentation in abdominal MRI, and pancreatic mass segmentation in CT images. Our method can provide new insights for interpreting and explainable AI systems for medical image segmentation. Our code is available on: \url{https://github.com/shengfly/ProtoSeg}. △ Less

Submitted 18 December, 2022; originally announced December 2022.

Journal ref: Medical Image Analysis, 2023

arXiv:2212.02764 [pdf, other]

A Trustworthy Framework for Medical Image Analysis with Deep Learning

Authors: Kai Ma, Siyuan He, Pengcheng Xi, Ashkan Ebadi, Stéphane Tremblay, Alexander Wong

Abstract: Computer vision and machine learning are playing an increasingly important role in computer-assisted diagnosis; however, the application of deep learning to medical imaging has challenges in data availability and data imbalance, and it is especially important that models for medical imaging are built to be trustworthy. Therefore, we propose TRUDLMIA, a trustworthy deep learning framework for medic… ▽ More Computer vision and machine learning are playing an increasingly important role in computer-assisted diagnosis; however, the application of deep learning to medical imaging has challenges in data availability and data imbalance, and it is especially important that models for medical imaging are built to be trustworthy. Therefore, we propose TRUDLMIA, a trustworthy deep learning framework for medical image analysis, which adopts a modular design, leverages self-supervised pre-training, and utilizes a novel surrogate loss function. Experimental evaluations indicate that models generated from the framework are both trustworthy and high-performing. It is anticipated that the framework will support researchers and clinicians in advancing the use of deep learning for dealing with public health crises including COVID-19. △ Less

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2212.01106

ExARN: self-attending RNN for target speaker extraction

Authors: Pengjie Shen, Shulin He, Xueliang Zhang

Abstract: Target speaker extraction is to extract the target speaker, specified by enrollment utterance, in an environment with other competing speakers. Therefore, the task needs to solve two problems, speaker identification and separation, at the same time. In this paper, we combine self-attention and Recurrent Neural Networks (RNN). Further, we exploit various ways to combining different auxiliary inform… ▽ More Target speaker extraction is to extract the target speaker, specified by enrollment utterance, in an environment with other competing speakers. Therefore, the task needs to solve two problems, speaker identification and separation, at the same time. In this paper, we combine self-attention and Recurrent Neural Networks (RNN). Further, we exploit various ways to combining different auxiliary information with mixed representations. Experimental results show that our proposed model achieves excellent performance on the task of target speaker extraction. △ Less

Submitted 12 March, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: The overall quality of the article is not good enough

arXiv:2211.13797 [pdf, other]

Data-Driven Distributionally Robust Electric Vehicle Balancing for Autonomous Mobility-on-Demand Systems under Demand and Supply Uncertainties

Authors: Sihong He, Zhili Zhang, Shuo Han, Lynn Pepin, Guang Wang, Desheng Zhang, John Stankovic, Fei Miao

Abstract: Electric vehicles (EVs) are being rapidly adopted due to their economic and societal benefits. Autonomous mobility-on-demand (AMoD) systems also embrace this trend. However, the long charging time and high recharging frequency of EVs pose challenges to efficiently managing EV AMoD systems. The complicated dynamic charging and mobility process of EV AMoD systems makes the demand and supply uncertai… ▽ More Electric vehicles (EVs) are being rapidly adopted due to their economic and societal benefits. Autonomous mobility-on-demand (AMoD) systems also embrace this trend. However, the long charging time and high recharging frequency of EVs pose challenges to efficiently managing EV AMoD systems. The complicated dynamic charging and mobility process of EV AMoD systems makes the demand and supply uncertainties significant when designing vehicle balancing algorithms. In this work, we design a data-driven distributionally robust optimization (DRO) approach to balance EVs for both the mobility service and the charging process. The optimization goal is to minimize the worst-case expected cost under both passenger mobility demand uncertainties and EV supply uncertainties. We then propose a novel distributional uncertainty sets construction algorithm that guarantees the produced parameters are contained in desired confidence regions with a given probability. To solve the proposed DRO AMoD EV balancing problem, we derive an equivalent computationally tractable convex optimization problem. Based on real-world EV data of a taxi system, we show that with our solution the average total balancing cost is reduced by 14.49%, and the average mobility fairness and charging fairness are improved by 15.78% and 34.51%, respectively, compared to solutions that do not consider uncertainties. △ Less

Submitted 24 November, 2022; originally announced November 2022.

Comments: 16 pages

arXiv:2211.12340 [pdf, other]

DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction

Authors: Jiaming Liu, Rushil Anirudh, Jayaraman J. Thiagarajan, Stewart He, K. Aditya Mohan, Ulugbek S. Kamilov, Hyo** Kim

Abstract: Limited-Angle Computed Tomography (LACT) is a non-destructive evaluation technique used in a variety of applications ranging from security to medicine. The limited angle coverage in LACT is often a dominant source of severe artifacts in the reconstructed images, making it a challenging inverse problem. We present DOLCE, a new deep model-based framework for LACT that uses a conditional diffusion mo… ▽ More Limited-Angle Computed Tomography (LACT) is a non-destructive evaluation technique used in a variety of applications ranging from security to medicine. The limited angle coverage in LACT is often a dominant source of severe artifacts in the reconstructed images, making it a challenging inverse problem. We present DOLCE, a new deep model-based framework for LACT that uses a conditional diffusion model as an image prior. Diffusion models are a recent class of deep generative models that are relatively easy to train due to their implementation as image denoisers. DOLCE can form high-quality images from severely under-sampled data by integrating data-consistency updates with the sampling updates of a diffusion model, which is conditioned on the transformed limited-angle data. We show through extensive experimentation on several challenging real LACT datasets that, the same pre-trained DOLCE model achieves the SOTA performance on drastically different types of images. Additionally, we show that, unlike standard LACT reconstruction methods, DOLCE naturally enables the quantification of the reconstruction uncertainty by generating multiple samples consistent with the measured data. △ Less

Submitted 22 November, 2022; originally announced November 2022.

Comments: 29 pages, 21 figures

arXiv:2210.15853 [pdf, other]

Speech Enhancement with Intelligent Neural Homomorphic Synthesis

Authors: Shulin He, Wei Rao, **jiang Liu, Jun Chen, Yukai Ju, Xueliang Zhang, Yannan Wang, Shidong Shang

Abstract: Most neural network speech enhancement models ignore speech production mathematical models by directly map** Fourier transform spectrums or waveforms. In this work, we propose a neural source filter network for speech enhancement. Specifically, we use homomorphic signal processing and cepstral analysis to obtain noisy speech's excitation and vocal tract. Unlike traditional signal processing, we… ▽ More Most neural network speech enhancement models ignore speech production mathematical models by directly map** Fourier transform spectrums or waveforms. In this work, we propose a neural source filter network for speech enhancement. Specifically, we use homomorphic signal processing and cepstral analysis to obtain noisy speech's excitation and vocal tract. Unlike traditional signal processing, we use an attentive recurrent network (ARN) model predicted ratio mask to replace the liftering separation function. Then two convolutional attentive recurrent network (CARN) networks are used to predict the excitation and vocal tract of clean speech, respectively. The system's output is synthesized from the estimated excitation and vocal. Experiments prove that our proposed method performs better, with SI-SNR improving by 1.363dB compared to FullSubNet. △ Less

Submitted 27 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2210.15849 [pdf, ps, other]

Hierarchical speaker representation for target speaker extraction

Authors: Shulin He, Huaiwen Zhang, Wei Rao, Kanghao Zhang, Yukai Ju, Yang Yang, Xueliang Zhang

Abstract: Target speaker extraction aims to isolate a specific speaker's voice from a composite of multiple sound sources, guided by an enrollment utterance or called anchor. Current methods predominantly derive speaker embeddings from the anchor and integrate them into the separation network to separate the voice of the target speaker. However, the representation of the speaker embedding is too simplistic,… ▽ More Target speaker extraction aims to isolate a specific speaker's voice from a composite of multiple sound sources, guided by an enrollment utterance or called anchor. Current methods predominantly derive speaker embeddings from the anchor and integrate them into the separation network to separate the voice of the target speaker. However, the representation of the speaker embedding is too simplistic, often being merely a 1*1024 vector. This dense information makes it difficult for the separation network to harness effectively. To address this limitation, we introduce a pioneering methodology called Hierarchical Representation (HR) that seamlessly fuses anchor data across granular and overarching 5 layers of the separation network, enhancing the precision of target extraction. HR amplifies the efficacy of anchors to improve target speaker isolation. On the Libri-2talker dataset, HR substantially outperforms state-of-the-art time-frequency domain techniques. Further demonstrating HR's capabilities, we achieved first place in the prestigious ICASSP 2023 Deep Noise Suppression Challenge. The proposed HR methodology shows great promise for advancing target speaker extraction through enhanced anchor utilization. △ Less

Submitted 4 January, 2024; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted to ICASSP 2024

arXiv:2210.04747 [pdf, other]

An NLoS-based Enhanced Sensing Method for MmWave Communication System

Authors: Shiwen He, Kangli Cai, Shiyue Huang, Zhenyu Anz, Wei Huang, Ning Gao

Abstract: The millimeter-wave (mmWave)-based Wi-Fi sensing technology has recently attracted extensive attention since it provides a possibility to realize higher sensing accuracy. However, current works mainly concentrate on sensing scenarios where the line-of-sight (LoS) path exists, which significantly limits their applications. To address the problem, we propose an enhanced mmWave sensing algorithm in t… ▽ More The millimeter-wave (mmWave)-based Wi-Fi sensing technology has recently attracted extensive attention since it provides a possibility to realize higher sensing accuracy. However, current works mainly concentrate on sensing scenarios where the line-of-sight (LoS) path exists, which significantly limits their applications. To address the problem, we propose an enhanced mmWave sensing algorithm in the 3D non-line-of-sight environment (mm3NLoS), aiming to sense the direction and distance of the target when the LoS path is weak or blocked. Specifically, we first adopt the directional beam to estimate the azimuth/elevation angle of arrival (AoA) and angle of departure (AoD) of the reflection path. Then, the distance of the related path is measured by the fine timing measurement protocol. Finally, we transform the AoA and AoD of the multiple non-line-of-sight (NLoS) paths into the direction vector and then obtain the information of targets based on the geometric relationship. The simulation results demonstrate that mm3NLoS can achieve a centimeter-level error with a 2m spacing. Compared to the prior work, it can significantly reduce the performance degradation under the NLoS condition. △ Less

Submitted 10 October, 2022; originally announced October 2022.

arXiv:2209.08230 [pdf, other]

A Robust and Constrained Multi-Agent Reinforcement Learning Electric Vehicle Rebalancing Method in AMoD Systems

Authors: Sihong He, Yue Wang, Shuo Han, Shaofeng Zou, Fei Miao

Abstract: Electric vehicles (EVs) play critical roles in autonomous mobility-on-demand (AMoD) systems, but their unique charging patterns increase the model uncertainties in AMoD systems (e.g. state transition probability). Since there usually exists a mismatch between the training and test/true environments, incorporating model uncertainty into system design is of critical importance in real-world applicat… ▽ More Electric vehicles (EVs) play critical roles in autonomous mobility-on-demand (AMoD) systems, but their unique charging patterns increase the model uncertainties in AMoD systems (e.g. state transition probability). Since there usually exists a mismatch between the training and test/true environments, incorporating model uncertainty into system design is of critical importance in real-world applications. However, model uncertainties have not been considered explicitly in EV AMoD system rebalancing by existing literature yet, and the coexistence of model uncertainties and constraints that the decision should satisfy makes the problem even more challenging. In this work, we design a robust and constrained multi-agent reinforcement learning (MARL) framework with state transition kernel uncertainty for EV AMoD systems. We then propose a robust and constrained MARL algorithm (ROCOMA) with robust natural policy gradients (RNPG) that trains a robust EV rebalancing policy to balance the supply-demand ratio and the charging utilization rate across the city under model uncertainty. Experiments show that the ROCOMA can learn an effective and robust rebalancing policy. It outperforms non-robust MARL methods in the presence of model uncertainties. It increases the system fairness by 19.6% and decreases the rebalancing costs by 75.8%. △ Less

Submitted 27 September, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

Comments: 8 pages, accepted to IROS2023

arXiv:2208.13113 [pdf, other]

Accurate and Robust Lesion RECIST Diameter Prediction and Segmentation with Transformers

Authors: Youbao Tang, Ning Zhang, Yirui Wang, Shenghua He, Mei Han, **g Xiao, Ruei-Sung Lin

Abstract: Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although it has been studied in recent years, there is still space to improve its accuracy and robustness, such as (1) enhancing features by incorporating rich contextual information while kee** a high spatial resolution and (2… ▽ More Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although it has been studied in recent years, there is still space to improve its accuracy and robustness, such as (1) enhancing features by incorporating rich contextual information while kee** a high spatial resolution and (2) involving new tasks and losses for joint optimization. To reach this goal, this paper proposes a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS). It is formulated as three correlative and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. To the best of our knowledge, it is the first time to use keypoint regression for RECIST diameter prediction. MeaFormer can enhance high-resolution features by employing transformers to capture their long-range dependencies. Two consistency losses are introduced to explicitly build relationships among these tasks for better optimization. Experiments show that MeaFormer achieves the state-of-the-art performance of LRDPS on the large-scale DeepLesion dataset and produces promising results of two downstream clinic-relevant tasks, i.e., 3D lesion segmentation and RECIST assessment in longitudinal studies. △ Less

Submitted 27 August, 2022; originally announced August 2022.

Comments: One of a series of works about lesion RECIST diameter prediction and weakly-supervised lesion segmentation (MICCAI 2022)

arXiv:2205.12727 [pdf, other]

Semantic-preserved Communication System for Highly Efficient Speech Transmission

Authors: Tianxiao Han, Qianqian Yang, Zhiguo Shi, Shibo He, Zhaoyang Zhang

Abstract: Deep learning (DL) based semantic communication methods have been explored for the efficient transmission of images, text, and speech in recent years. In contrast to traditional wireless communication methods that focus on the transmission of abstract symbols, semantic communication approaches attempt to achieve better transmission efficiency by only sending the semantic-related information of the… ▽ More Deep learning (DL) based semantic communication methods have been explored for the efficient transmission of images, text, and speech in recent years. In contrast to traditional wireless communication methods that focus on the transmission of abstract symbols, semantic communication approaches attempt to achieve better transmission efficiency by only sending the semantic-related information of the source data. In this paper, we consider semantic-oriented speech transmission which transmits only the semantic-relevant information over the channel for the speech recognition task, and a compact additional set of semantic-irrelevant information for the speech reconstruction task. We propose a novel end-to-end DL-based transceiver which extracts and encodes the semantic information from the input speech spectrums at the transmitter and outputs the corresponding transcriptions from the decoded semantic information at the receiver. For the speech to speech transmission, we further include a CTC alignment module that extracts a small number of additional semantic-irrelevant but speech-related information for the better reconstruction of the original speech signals at the receiver. The simulation results confirm that our proposed method outperforms current methods in terms of the accuracy of the predicted text for the speech to text transmission and the quality of the recovered speech signals for the speech to speech transmission, and significantly improves transmission efficiency. More specifically, the proposed method only sends 16% of the amount of the transmitted symbols required by the existing methods while achieving about 10% reduction in WER for the speech to text transmission. For the speech to speech transmission, it results in an even more remarkable improvement in terms of transmission efficiency with only 0.2% of the amount of the transmitted symbols required by the existing method. △ Less

Submitted 25 May, 2022; originally announced May 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.03211

arXiv:2204.11538 [pdf, other]

doi 10.1109/MVT.2023.3237004

Leveraging RIS-Enabled Smart Signal Propagation for Solving Infeasible Localization Problems

Authors: Kamran Keykhosravi, Benoit Denis, George C. Alexandropoulos, Zhongxia Simon He, Antonio Albanese, Vincenzo Sciancalepore, Henk Wymeersch

Abstract: Reconfigurable intelligent surfaces (RISs) have tremendous potential for both communication and localization. While communication benefits are now well-understood, the breakthrough nature of the technology may well lie in its capability to provide location estimates when conventional approaches fail, (e.g., due to insufficient available infrastructure). A limited number of example scenarios have b… ▽ More Reconfigurable intelligent surfaces (RISs) have tremendous potential for both communication and localization. While communication benefits are now well-understood, the breakthrough nature of the technology may well lie in its capability to provide location estimates when conventional approaches fail, (e.g., due to insufficient available infrastructure). A limited number of example scenarios have been identified, but an overview of possible RIS-enabled localization scenarios is still missing from the literature. In this article, we present such an overview and extend localization to include even user orientation or velocity. In particular, we consider localization scenarios with various numbers of RISs, single- or multi-antenna base stations, narrowband or wideband transmissions, and near- and farfield operation. Furthermore, we provide a short description of the general RIS operation together with radio localization fundamentals, experimental validation of a localization scheme with two RISs, as well as key research directions and open challenges specific to RIS-enabled localization and sensing. △ Less

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2204.07905 [pdf, other]

doi 10.1109/TIV.2022.3168577

Probabilistic Charging Power Forecast of EVCS: Reinforcement Learning Assisted Deep Learning Approach

Authors: Yuanzheng Li, Shangyang He, Yang Li, Leijiao Ge, Suhua Lou, Zhigang Zeng

Abstract: The electric vehicle (EV) and electric vehicle charging station (EVCS) have been widely deployed with the development of large-scale transportation electrifications. However, since charging behaviors of EVs show large uncertainties, the forecasting of EVCS charging power is non-trivial. This paper tackles this issue by proposing a reinforcement learning assisted deep learning framework for the pro… ▽ More The electric vehicle (EV) and electric vehicle charging station (EVCS) have been widely deployed with the development of large-scale transportation electrifications. However, since charging behaviors of EVs show large uncertainties, the forecasting of EVCS charging power is non-trivial. This paper tackles this issue by proposing a reinforcement learning assisted deep learning framework for the probabilistic EVCS charging power forecasting to capture its uncertainties. Since the EVCS charging power data are not standard time-series data like electricity load, they are first converted to the time-series format. On this basis, one of the most popular deep learning models, the long short-term memory (LSTM) is used and trained to obtain the point forecast of EVCS charging power. To further capture the forecast uncertainty, a Markov decision process (MDP) is employed to model the change of LSTM cell states, which is solved by our proposed adaptive exploration proximal policy optimization (AePPO) algorithm based on reinforcement learning. Finally, experiments are carried out on the real EVCSs charging data from Caltech, and Jet Propulsion Laboratory, USA, respectively. The results and comparative analysis verify the effectiveness and outperformance of our proposed framework. △ Less

Submitted 16 April, 2022; originally announced April 2022.

Comments: Accepted by IEEE Transactions on Intelligent Vehicles

Journal ref: IEEE Transactions on Intelligent Vehicles 8 (2023) 344-357

arXiv:2204.06929 [pdf, other]

Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis

Authors: Jiamin Liang, Xin Yang, Yuhao Huang, Haoming Li, Shuangchi He, Xindi Hu, Zejian Chen, Wufeng Xue, Jun Cheng, Dong Ni

Abstract: Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. The training of new sonographers and deep learning based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data are not easy tasks, especially for diseases with low incidence. Realistic US image synthesis can alleviate this… ▽ More Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. The training of new sonographers and deep learning based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data are not easy tasks, especially for diseases with low incidence. Realistic US image synthesis can alleviate this problem to a great extent. In this paper, we propose a generative adversarial network (GAN) based image synthesis framework. Our main contributions include: 1) we present the first work that can synthesize realistic B-mode US images with high-resolution and customized texture editing features; 2) to enhance structural details of generated images, we propose to introduce auxiliary sketch guidance into a conditional GAN. We superpose the edge sketch onto the object mask and use the composite mask as the network input; 3) to generate high-resolution US images, we adopt a progressive training strategy to gradually generate high-resolution images from low-resolution images. In addition, a feature loss is proposed to minimize the difference of high-level features between the generated and real images, which further improves the quality of generated images; 4) the proposed US image synthesis method is quite universal and can also be generalized to the US images of other anatomical structures besides the three ones tested in our study (lung, hip joint, and ovary); 5) extensive experiments on three large US image datasets are conducted to validate our method. Ablation studies, customized texture editing, user studies, and segmentation tests demonstrate promising results of our method in synthesizing realistic US images. △ Less

Submitted 25 May, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted by Medical Image Analysis (13 figures, 4 tabels)

arXiv:2202.04754 [pdf, other]

Wireless Transmission of Images With The Assistance of Multi-level Semantic Information

Authors: Zhenguo Zhang, Qianqian Yang, Shibo He, Mingyang Sun, Jiming Chen

Abstract: Semantic-oriented communication has been considered as a promising to boost the bandwidth efficiency by only transmitting the semantics of the data. In this paper, we propose a multi-level semantic aware communication system for wireless image transmission, named MLSC-image, which is based on the deep learning techniques and trained in an end to end manner. In particular, the proposed model includ… ▽ More Semantic-oriented communication has been considered as a promising to boost the bandwidth efficiency by only transmitting the semantics of the data. In this paper, we propose a multi-level semantic aware communication system for wireless image transmission, named MLSC-image, which is based on the deep learning techniques and trained in an end to end manner. In particular, the proposed model includes a multilevel semantic feature extractor, that extracts both the highlevel semantic information, such as the text semantics and the segmentation semantics, and the low-level semantic information, such as local spatial details of the images. We employ a pretrained image caption to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics. These high-level and low-level semantic features are then combined and encoded by a joint semantic and channel encoder into symbols to transmit over the physical channel. The numerical results validate the effectiveness and efficiency of the proposed semantic communication system, especially under the limited bandwidth condition, which indicates the advantages of the high-level semantics in the compression of images. △ Less

Submitted 8 December, 2023; v1 submitted 8 February, 2022; originally announced February 2022.

arXiv:2202.03211 [pdf, other]

Semantic-aware Speech to Text Transmission with Redundancy Removal

Authors: Tianxiao Han, Qianqian Yang, Zhiguo Shi, Shibo He, Zhaoyang Zhang

Abstract: Deep learning (DL) based semantic communication methods have been explored for the efficient transmission of images, text, and speech in recent years. In contrast to traditional wireless communication methods that focus on the transmission of abstract symbols, semantic communication approaches attempt to achieve better transmission efficiency by only sending the semantic-related information of the… ▽ More Deep learning (DL) based semantic communication methods have been explored for the efficient transmission of images, text, and speech in recent years. In contrast to traditional wireless communication methods that focus on the transmission of abstract symbols, semantic communication approaches attempt to achieve better transmission efficiency by only sending the semantic-related information of the source data. In this paper, we consider semantic-oriented speech to text transmission. We propose a novel end-to-end DL-based transceiver, which includes an attention-based soft alignment module and a redundancy removal module to compress the transmitted data. In particular, the former extracts only the text-related semantic features, and the latter further drops the semantically redundant content, greatly reducing the amount of semantic redundancy compared to existing methods. We also propose a two-stage training scheme, which speeds up the training of the proposed DL model. The simulation results indicate that our proposed method outperforms current methods in terms of the accuracy of the received text and transmission efficiency. Moreover, the proposed method also has a smaller model size and shorter end-to-end runtime. △ Less

Submitted 7 February, 2022; originally announced February 2022.

arXiv:2112.08363 [pdf, other]

Performance or Trust? Why Not Both. Deep AUC Maximization with Self-Supervised Learning for COVID-19 Chest X-ray Classifications

Authors: Siyuan He, Pengcheng Xi, Ashkan Ebadi, Stephane Tremblay, Alexander Wong

Abstract: Effective representation learning is the key in improving model performance for medical image analysis. In training deep learning models, a compromise often must be made between performance and trust, both of which are essential for medical applications. Moreover, models optimized with cross-entropy loss tend to suffer from unwarranted overconfidence in the majority class and over-cautiousness in… ▽ More Effective representation learning is the key in improving model performance for medical image analysis. In training deep learning models, a compromise often must be made between performance and trust, both of which are essential for medical applications. Moreover, models optimized with cross-entropy loss tend to suffer from unwarranted overconfidence in the majority class and over-cautiousness in the minority class. In this work, we integrate a new surrogate loss with self-supervised learning for computer-aided screening of COVID-19 patients using radiography images. In addition, we adopt a new quantification score to measure a model's trustworthiness. Ablation study is conducted for both the performance and the trust on feature learning methods and loss functions. Comparisons show that leveraging the new surrogate loss on self-supervised models can produce label-efficient networks that are both high-performing and trustworthy. △ Less

Submitted 14 December, 2021; originally announced December 2021.

Comments: 3 pages

Journal ref: Published at CVIS 2021: 7th Annual Conference on Vision and Intelligent Systems

arXiv:2112.06435 [pdf, other]

Autonomous Racing with Multiple Vehicles using a Parallelized Optimization with Safety Guarantee using Control Barrier Functions

Authors: Suiyi He, Jun Zeng, Koushil Sreenath

Abstract: This paper presents a novel planning and control strategy for competing with multiple vehicles in a car racing scenario. The proposed racing strategy switches between two modes. When there are no surrounding vehicles, a learning-based model predictive control (MPC) trajectory planner is used to guarantee that the ego vehicle achieves better lap timing performance. When the ego vehicle is competing… ▽ More This paper presents a novel planning and control strategy for competing with multiple vehicles in a car racing scenario. The proposed racing strategy switches between two modes. When there are no surrounding vehicles, a learning-based model predictive control (MPC) trajectory planner is used to guarantee that the ego vehicle achieves better lap timing performance. When the ego vehicle is competing with other surrounding vehicles to overtake, an optimization-based planner generates multiple dynamically-feasible trajectories through parallel computation. Each trajectory is optimized under a MPC formulation with different homotopic Bezier-curve reference paths lying laterally between surrounding vehicles. The time-optimal trajectory among these different homotopic trajectories is selected and a low-level MPC controller with control barrier function constraints for obstacle avoidance is used to guarantee system's safety-critical performance. The proposed algorithm has the capability to generate collision-free trajectories and track them while enhancing the lap timing performance with steady low computational complexity, outperforming existing approaches in both timing and performance for a autonomous racing environment. To demonstrate the performance of our racing strategy, we simulate with multiple randomly generated moving vehicles on the track and test the ego vehicle's overtake maneuvers. △ Less

Submitted 27 March, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA 2022)

arXiv:2112.01738 [pdf, ps, other]

Joint User Scheduling and Beamforming Design for Multiuser MISO Downlink Systems

Authors: S. He, J. Yuan, Z. An, W. Huang, Y. Huang, Y. Zhang

Abstract: In multiuser communication systems, user scheduling and beamforming (US-BF) design are two fundamental problems that are usually studied separately in the existing literature. In this work, we focus on the joint US-BF design with the goal of maximizing the set cardinality of scheduled users, which is computationally challenging due to the non-convex objective function and the coupled constraints w… ▽ More In multiuser communication systems, user scheduling and beamforming (US-BF) design are two fundamental problems that are usually studied separately in the existing literature. In this work, we focus on the joint US-BF design with the goal of maximizing the set cardinality of scheduled users, which is computationally challenging due to the non-convex objective function and the coupled constraints with discrete-continuous variables. To tackle these difficulties, a successive convex approximation based US-BF (SCA-USBF) optimization algorithm is firstly proposed. Then, inspired by wireless intelligent communication, a graph neural network based joint US-BF (J-USBF) learning algorithm is developed by combining the joint US and power allocation network model with the BF analytical solution. The effectiveness of SCA-USBF and J-USBF is verified by various numerical results, the latter achieves close performance and higher computational efficiency. Furthermore, the proposed J-USBF also enjoys the generalizability in dynamic wireless network scenarios. △ Less

Submitted 4 July, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

Comments: 31 pages, 9 figures, submit to IEEE Transactions on Wireless Communications

arXiv:2110.03775 [pdf]

Proposing a System Level Machine Learning Hybrid Architecture and Approach for a Comprehensive Autism Spectrum Disorder Diagnosis

Authors: Ryan Liu, Spencer He

Abstract: Autism Spectrum Disorder (ASD) is a severe neuropsychiatric disorder that affects intellectual development, social behavior, and facial features, and the number of cases is still significantly increasing. Due to the variety of symptoms ASD displays, the diagnosis process remains challenging, with numerous misdiagnoses as well as lengthy and expensive diagnoses. Fortunately, if ASD is diagnosed and… ▽ More Autism Spectrum Disorder (ASD) is a severe neuropsychiatric disorder that affects intellectual development, social behavior, and facial features, and the number of cases is still significantly increasing. Due to the variety of symptoms ASD displays, the diagnosis process remains challenging, with numerous misdiagnoses as well as lengthy and expensive diagnoses. Fortunately, if ASD is diagnosed and treated early, then the patient will have a much higher chance of develo** normally. For an ASD diagnosis, machine learning algorithms can analyze both social behavior and facial features accurately and efficiently, providing an ASD diagnosis in a drastically shorter amount of time than through current clinical diagnosis processes. Therefore, we propose to develop a hybrid architecture fully utilizing both social behavior and facial feature data to improve the accuracy of diagnosing ASD. We first developed a Linear Support Vector Machine for the social behavior based module, which analyzes Autism Diagnostic Observation Schedule (ADOS) social behavior data. For the facial feature based module, a DenseNet model was utilized to analyze facial feature image data. Finally, we implemented our hybrid model by incorporating different features of the Support Vector Machine and the DenseNet into one model. Our results show that the highest accuracy of 87% for ASD diagnosis has been achieved by our proposed hybrid model. The pros and cons of each module will be discussed in this paper. △ Less

Submitted 18 September, 2021; originally announced October 2021.

Showing 1–50 of 68 results for author: He, S