Search | arXiv e-print repository

arXiv:2406.19043 [pdf]

CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover high-quality, clinically interpretable images from undersampled measurements. However, the lack of publicly available cardiac MRI k-space dataset in terms of both quantity and diversity has severely hindered substantial technological progress, particularly for data-driven artificial intelligence. Here, we provide a standardized, diverse, and high-quality CMRxRecon2024 dataset to facilitate the technical development, fair evaluation, and clinical transfer of cardiac MRI reconstruction approaches, towards promoting the universal frameworks that enable fast and robust reconstructions across different cardiac MRI protocols in clinical practice. To the best of our knowledge, the CMRxRecon2024 dataset is the largest and most diverse publicly available cardiac k-space dataset. It is acquired from 330 healthy volunteers, covering commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI workflows. Besides, an open platform with tutorials, benchmarks, and data processing tools is provided to facilitate data usage, advanced method development, and fair performance evaluation. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 19 pages, 3 figures, 2 tables

arXiv:2406.18914 [pdf, other]

Verification and Synthesis of Compatible Control Lyapunov and Control Barrier Functions

Authors: Hongkai Dai, Chuanrui Jiang, Hongchao Zhang, Andrew Clark

Abstract: Safety and stability are essential properties of control systems. Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) have been proposed to ensure safety and stability respectively. However, previous approaches typically verify and synthesize the CBFs and CLFs separately, satisfying their respective constraints, without proving that the CBFs and CLFs are compatible with each oth… ▽ More Safety and stability are essential properties of control systems. Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) have been proposed to ensure safety and stability respectively. However, previous approaches typically verify and synthesize the CBFs and CLFs separately, satisfying their respective constraints, without proving that the CBFs and CLFs are compatible with each other, namely at every state, there exists control actions that satisfy both the CBF and CLF constraints simultaneously. There exists some recent works that synthesized compatible CLF and CBF, but relying on nominal polynomial or rational controllers, which is just a sufficient but not necessary condition for compatibility. In this work, we investigate verification and synthesis of compatible CBF and CLF independent from any nominal controllers. We derive exact necessary and sufficient conditions for compatibility, and further formulate Sum-Of-Squares program for the compatibility verification. Based on our verification framework, we also design an alternating nominal-controller-free synthesis method. We evaluate our method in a linear toy, a non-linear toy, and a power converter example. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18542 [pdf, other]

Generative AI Empowered LiDAR Point Cloud Generation with Multimodal Transformer

Authors: Mohammad Farzanullah, Han Zhang, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci

Abstract: Integrated sensing and communications is a key enabler for the 6G wireless communication systems. The multiple sensing modalities will allow the base station to have a more accurate representation of the environment, leading to context-aware communications. Some widely equipped sensors such as cameras and RADAR sensors can provide some environmental perceptions. However, they are not enough to gen… ▽ More Integrated sensing and communications is a key enabler for the 6G wireless communication systems. The multiple sensing modalities will allow the base station to have a more accurate representation of the environment, leading to context-aware communications. Some widely equipped sensors such as cameras and RADAR sensors can provide some environmental perceptions. However, they are not enough to generate precise environmental representations, especially in adverse weather conditions. On the other hand, the LiDAR sensors provide more accurate representations, however, their widespread adoption is hindered by their high cost. This paper proposes a novel approach to enhance the wireless communication systems by synthesizing LiDAR point clouds from images and RADAR data. Specifically, it uses a multimodal transformer architecture and pre-trained encoding models to enable an accurate LiDAR generation. The proposed framework is evaluated on the DeepSense 6G dataset, which is a real-world dataset curated for context-aware wireless applications. Our results demonstrate the efficacy of the proposed approach in accurately generating LiDAR point clouds. We achieve a modified mean squared error of 10.3931. Visual examination of the images indicates that our model can successfully capture the majority of structures present in the LiDAR point cloud for diverse environments. This will enable the base stations to achieve more precise environmental sensing. By integrating LiDAR synthesis with existing sensing modalities, our method can enhance the performance of various wireless applications, including beam and blockage prediction. △ Less

Submitted 20 May, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures, conference

arXiv:2406.18102 [pdf]

A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wen**g Xu, Huixiang Zhi

Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accuratel… ▽ More Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accurately predicting multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, facilitating a finer categorization of different types of lung diseases so as to offer precise treatment recommendations. To achieve this objective, we curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (nodules are labeled as bounding boxes) from 95 distinct patients. The quality of the dataset was evaluated using a variety of classical classification and detection models, and these promising results demonstrate that the dataset has a feasible application and further facilitate intelligent auxiliary diagnosis. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18018 [pdf, other]

A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

Abstract: Recently, intelligent analysis of lung nodules with the assistant of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im… ▽ More Recently, intelligent analysis of lung nodules with the assistant of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in imaging data from various captured periods of lung cancer. If the evolution patterns of nodules across various periods in the patients' CT sequences can be explored, it will play a crucial role in guiding the precise screening identification of lung cancer. Therefore, a cross spatio-temporal lung nodule dataset with pathological information for nodule identification and diagnosis is constructed, which contains 328 CT sequences and 362 annotated nodules from 109 patients. This comprehensive database is intended to drive research in the field of CAD towards more practical and robust methods, and also contribute to the further exploration of precision medicine related field. To ensure patient confidentiality, we have removed sensitive information from the dataset. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16150 [pdf, other]

Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paied less attention to the issue we term as \textit{Intensity Confusion}, wherein the intensity values of certain background voxels approach those of the foreground voxels within br… ▽ More Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paied less attention to the issue we term as \textit{Intensity Confusion}, wherein the intensity values of certain background voxels approach those of the foreground voxels within bronchi. Conversely, the intensity values of some foreground voxels are nearly identical to those of background voxels. This proximity in intensity values introduces significant challenges to neural network methodologies. To address the issue, we introduce a novel Intensity-Distance Guided loss function, which assigns adaptive weights to different image voxels for mining hard samples that cause the intensity confusion. The proposed loss estimates the voxel-level hardness of samples, on the basis of the following intensity and distance priors. We regard a voxel as a hard sample if it is in: (1) the background and has an intensity value close to the bronchus region; (2) the bronchus region and is of higher intensity than most voxels inside the bronchus; (3) the background region and at a short distance from the bronchus. Extensive experiments not only show the superiority of our method compared with the state-of-the-art methods, but also verify that tackling the intensity confusion issue helps to significantly improve bronchus segmentation. Project page: https://github.com/lhaof/ICM. △ Less

Submitted 23 June, 2024; originally announced June 2024.

Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

arXiv:2406.16026 [pdf]

CEST-KAN: Kolmogorov-Arnold Networks for CEST MRI Data Analysis

Authors: Jiawen Wang, Pei Cai, Ziyan Wang, Huabin Zhang, Jianpan Huang

Abstract: Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same ne… ▽ More Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same network settings were evaluated and compared to the conventional multi-pool Lorentzian fitting (MPLF) method in generating water and multiple CEST contrasts, including amide, relayed nuclear Overhauser effect (rNOE), and magnetization transfer (MT). Results: The water and CEST maps generated by both MLP and KAN were visually comparable to the MPLF results. However, the KAN model demonstrated higher accuracy in extrapolating the CEST fitting metrics, as evidenced by the smaller validation loss during training and smaller absolute error during testing. Voxel-wise correlation analysis showed that all four CEST fitting metrics generated by KAN consistently exhibited higher Pearson coefficients than the MLP results, indicating superior performance. Moreover, the KAN models consistently outperformed the MLP models in varying hidden layer numbers despite longer training time. Conclusion: In this study, we demonstrated for the first time the feasibility of utilizing KAN for CEST MRI data analysis, highlighting its superiority over MLP in this task. The findings suggest that CEST-KAN has the potential to be a robust and reliable post-analysis tool for CEST MRI in clinical settings. △ Less

Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests. △ Less

Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: under peer review

arXiv:2406.14850 [pdf, other]

DExter: Learning and Controlling Performance Expression with Diffusion Models

Authors: Huan Zhang, Shreyan Chowdhury, Carlos Eduardo Cancino-Chacón, **hua Liang, Simon Dixon, Gerhard Widmer

Abstract: In the pursuit of develo** expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, performance parameters are represented in a continuous expression space and a diffusion model is trained to predict these continuous parameters while b… ▽ More In the pursuit of develo** expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, performance parameters are represented in a continuous expression space and a diffusion model is trained to predict these continuous parameters while being conditioned on the musical score. Furthermore, DExter also enables the generation of interpretations (expressive variations of a performance) guided by perceptually meaningful features by conditioning jointly on score and perceptual feature representations. Consequently, we find that our model is useful for learning expressive performance, generating perceptually steered performances, and transferring performance styles. We assess the model through quantitative and qualitative analyses, focusing on specific performance metrics regarding dimensions like asynchrony and articulation, as well as through listening tests comparing generated performances with different human interpretations. Results show that DExter is able to capture the time-varying correlation of the expressive parameters, and compares well to existing rendering models in subjectively evaluated ratings. The perceptual-feature-conditioned generation and transferring capabilities of DExter are verified by a proxy model predicting perceptual characteristics of differently steered performances. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: in submission to appsci special session

arXiv:2406.11175 [pdf, other]

SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

Authors: Zhihang Sun, Andong Li, Rilin Chen, Hao Zhang, Meng Yu, Yi Zhou, Dong Yu

Abstract: The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior arts have been proposed, which yield promising performance. Nevertheless, they rarely consider the deployment generality in different processing scenarios, such as edge devices, and cloud processing. To this end, this paper proposes a general model, t… ▽ More The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior arts have been proposed, which yield promising performance. Nevertheless, they rarely consider the deployment generality in different processing scenarios, such as edge devices, and cloud processing. To this end, this paper proposes a general model, termed SMRU, to cover different application scenarios. The novelty lies in two-fold. First, a multi-scale band split layer and band merge layer are proposed to effectively fuse local frequency bands for lower complexity modeling. Besides, by simulating the multi-resolution feature modeling characteristic of the classical UNet structure, a novel recurrent-dominated UNet is devised. It consists of multiple variable frame rate blocks, each of which involves the causal time down-/up-sampling layer with varying compression ratios and the dual-path structure for inter- and intra-band modeling. The model is configured from 50 M/s to 6.8 G/s in terms of MACs, and the experimental results show that the proposed approach yields competitive or even better performance over existing baselines, and has the full potential to adapt to more general scenarios with varying complexity requirements. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11163 [pdf, other]

Explainable Bayesian Recurrent Neural Smoother to Capture Global State Evolutionary Correlations

Authors: Shi Yan, Yan Liang, Huayu Zhang, Le Zheng, Difan Zou, Binglu Wang

Abstract: Through integrating the evolutionary correlations across global states in the bidirectional recursion, an explainable Bayesian recurrent neural smoother (EBRNS) is proposed for offline data-assisted fixed-interval state smoothing. At first, the proposed model, containing global states in the evolutionary interval, is transformed into an equivalent model with bidirectional memory. This transformati… ▽ More Through integrating the evolutionary correlations across global states in the bidirectional recursion, an explainable Bayesian recurrent neural smoother (EBRNS) is proposed for offline data-assisted fixed-interval state smoothing. At first, the proposed model, containing global states in the evolutionary interval, is transformed into an equivalent model with bidirectional memory. This transformation incorporates crucial global state information with support for bi-directional recursive computation. For the transformed model, the joint state-memory-trend Bayesian filtering and smoothing frameworks are derived by introducing the bidirectional memory iteration mechanism and offline data into Bayesian estimation theory. The derived frameworks are implemented using the Gaussian approximation to ensure analytical properties and computational efficiency. Finally, the neural network modules within EBRNS and its two-stage training scheme are designed. Unlike most existing approaches that artificially combine deep learning and model-based estimation, the bidirectional recursion and internal gated structures of EBRNS are naturally derived from Bayesian estimation theory, explainably integrating prior model knowledge, online measurement, and offline data. Experiments on representative real-world datasets demonstrate that the high smoothing accuracy of EBRNS is accompanied by data efficiency and a lightweight parameter scale. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.09082 [pdf]

Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles

Authors: Hao Zhang, Nuo Lei, Boli Chen, Bingbing Li, Rulong Li, Zhi Wang

Abstract: Learning-based intelligent energy management systems for plug-in hybrid electric vehicles (PHEVs) are crucial for achieving efficient energy utilization. However, their application faces system reliability challenges in the real world, which prevents widespread acceptance by original equipment manufacturers (OEMs). This paper begins by establishing a PHEV model based on physical and data-driven mo… ▽ More Learning-based intelligent energy management systems for plug-in hybrid electric vehicles (PHEVs) are crucial for achieving efficient energy utilization. However, their application faces system reliability challenges in the real world, which prevents widespread acceptance by original equipment manufacturers (OEMs). This paper begins by establishing a PHEV model based on physical and data-driven models, focusing on the high-fidelity training environment. It then proposes a real-vehicle application-oriented control framework, combining horizon-extended reinforcement learning (RL)-based energy management with the equivalent consumption minimization strategy (ECMS) to enhance practical applicability, and improves the flawed method of equivalent factor evaluation based on instantaneous driving cycle and powertrain states found in existing research. Finally, comprehensive simulation and hardware-in-the-loop validation are carried out which demonstrates the advantages of the proposed control framework in fuel economy over adaptive-ECMS and rule-based strategies. Compared to conventional RL architectures that directly control powertrain components, the proposed control method not only achieves similar optimality but also significantly enhances the disturbance resistance of the energy management system, providing an effective control framework for RL-based energy management strategies aimed at real-vehicle applications by OEMs. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07162 [pdf, other]

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Authors: Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain

Abstract: Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making comparing different models and methods difficult. 2) No commonly used benchmark covers nu… ▽ More Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making comparing different models and methods difficult. 2) No commonly used benchmark covers numerous corpus and languages for researchers to refer to, making reproduction a burden. In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings. For intra-corpus settings, we carefully designed the data partitioning for different datasets. For cross-corpus settings, we employ a foundation SER model, emotion2vec, to mitigate annotation errors and obtain a test set that is fully balanced in speakers and emotions distributions. Based on EmoBox, we present the intra-corpus SER results of 10 pre-trained speech models on 32 emotion datasets with 14 languages, and the cross-corpus SER results on 4 datasets with the fully balanced test sets. To the best of our knowledge, this is the largest SER benchmark, across language scopes and quantity scales. We hope that our toolkit and benchmark can facilitate the research of SER in the community. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024. GitHub Repository: https://github.com/emo-box/EmoBox

arXiv:2406.07077 [pdf, other]

Meta-Backscatter: A New ISAC Paradigm for Battery-Free Internet of Things

Authors: Xu Liu, Hongliang Zhang, Kaigui Bian, Xi Weng, Lingyang Song

Abstract: The meta-material sensor has been regarded as a next-generation sensing technology for the battery-free Internet of Things (IoT) due to its battery-free characteristic and improved sensing performance. The meta-material sensors function as backscatter tags that change their reflection coefficients with the conditions of sensing targets such as temperature and gas concentration, allowing transceive… ▽ More The meta-material sensor has been regarded as a next-generation sensing technology for the battery-free Internet of Things (IoT) due to its battery-free characteristic and improved sensing performance. The meta-material sensors function as backscatter tags that change their reflection coefficients with the conditions of sensing targets such as temperature and gas concentration, allowing transceivers to perform sensing by analyzing the reflected signals from the sensors. Simultaneously, the sensors also function as environmental scatterers, creating additional signal paths to enhance communication performance. Therefore, the meta-material sensor potentially provides a new paradigm of Integrated Sensing and Communication (ISAC) for the battery-free IoT system. In this article, we first propose a Meta-Backscatter system that utilizes meta-material sensors to achieve diverse sensing functionalities and improved communication performance. We begin with the introduction of the metamaterial sensor and further elaborate on the Meta-Backscatter system. Subsequently, we present optimization strategies for meta-material sensors, transmitters, and receivers to strike a balance between sensing and communication. Furthermore, this article provides a case study of the system and examines the feasibility and trade-off through the simulation results. Finally, potential extensions of the system and their related research challenges are addressed. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05780 [pdf, ps, other]

Two-Stage Resource Allocation in Reconfigurable Intelligent Surface Assisted Hybrid Networks via Multi-Player Bandits

Authors: **gwen Tong, Hongliang Zhang, Liqun Fu, Amir Leshem, Zhu Han

Abstract: This paper considers a resource allocation problem where several Internet-of-Things (IoT) devices send data to a base station (BS) with or without the help of the reconfigurable intelligent surface (RIS) assisted cellular network. The objective is to maximize the sum rate of all IoT devices by finding the optimal RIS and spreading factor (SF) for each device. Since these IoT devices lack prior inf… ▽ More This paper considers a resource allocation problem where several Internet-of-Things (IoT) devices send data to a base station (BS) with or without the help of the reconfigurable intelligent surface (RIS) assisted cellular network. The objective is to maximize the sum rate of all IoT devices by finding the optimal RIS and spreading factor (SF) for each device. Since these IoT devices lack prior information on the RISs or the channel state information (CSI), a distributed resource allocation framework with low complexity and learning features is required to achieve this goal. Therefore, we model this problem as a two-stage multi-player multi-armed bandit (MPMAB) framework to learn the optimal RIS and SF sequentially. Then, we put forth an exploration and exploitation boosting (E2Boost) algorithm to solve this two-stage MPMAB problem by combining the $ε$-greedy algorithm, Thompson sampling (TS) algorithm, and non-cooperation game method. We derive an upper regret bound for the proposed algorithm, i.e., $\mathcal{O}(\log^{1+δ}_2 T)$, increasing logarithmically with the time horizon $T$. Numerical results show that the E2Boost algorithm has the best performance among the existing methods and exhibits a fast convergence rate. More importantly, the proposed algorithm is not sensitive to the number of combinations of the RISs and SFs thanks to the two-stage allocation mechanism, which can benefit high-density networks. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: This paper was published in IEEE Transcation on Communications

arXiv:2406.05555 [pdf, ps, other]

doi 10.1109/MCOM.001.2100704

OAM-SWIPT for IoE-Driven 6G

Authors: Runyu Lyu, Wenchi Cheng, Bazhong Shen, Zhiyuan Ren, Hailin Zhang

Abstract: Simultaneous wireless information and power transfer (SWIPT), which achieves both wireless energy transfer (WET) and information transfer, is an attractive technique for future Internet of Everything (IoE) in the sixth-generation (6G) mobile communications. With SWIPT, battery-less IoE devices can be powered while communicating with other devices. Line-of-sight (LOS) RF transmission and near-field… ▽ More Simultaneous wireless information and power transfer (SWIPT), which achieves both wireless energy transfer (WET) and information transfer, is an attractive technique for future Internet of Everything (IoE) in the sixth-generation (6G) mobile communications. With SWIPT, battery-less IoE devices can be powered while communicating with other devices. Line-of-sight (LOS) RF transmission and near-field inductive coupling based transmission are typical SWIPT scenarios, which are both LOS channels and without enough degree of freedom for high spectrum efficiency as well as high energy efficiency. Due to the orthogonal wavefronts, orbital angular momentum (OAM) can facilitate the SWIPT in LOS channels. In this article, we introduce the OAM-based SWIPT as well as discuss some basic advantages and challenges for it. After introducing the OAM-based SWIPT for IoE, we first propose an OAM-based SWIPT system model with the OAM-modes assisted dynamic power splitting (DPS). Then, four basic advantages regarding the OAM-based SWIPT are reviewed with some numerical analyses for further demonstrating the advantages. Next, four challenges regarding integrating OAM into SWIPT and possible solutions are discussed. OAM technology provides multiple orthogonal streams to increase both spectrum and energy efficiencies for SWIPT, thus creating many opportunities for future WET and SWIPT researches. △ Less

Submitted 8 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures

Journal ref: in IEEE Communications Magazine, vol. 60, no. 3, pp. 19-25, March 2022

arXiv:2406.04762 [pdf, other]

Holographic Intelligence Surface Assisted Integrated Sensing and Communication

Authors: Zhuoyang Liu, Yuchen Zhang, Haiyang Zhang, Feng Xu, Yonina C. Eldar

Abstract: Traditional discrete-array-based systems fail to exploit interactions between closely spaced antennas, resulting in inadequate utilization of the aperture resource. In this paper, we propose a holographic intelligence surface (HIS) assisted integrated sensing and communication (HISAC) system, wherein both the transmitter and receiver are fabricated using a continuous-aperture array. A continuous-d… ▽ More Traditional discrete-array-based systems fail to exploit interactions between closely spaced antennas, resulting in inadequate utilization of the aperture resource. In this paper, we propose a holographic intelligence surface (HIS) assisted integrated sensing and communication (HISAC) system, wherein both the transmitter and receiver are fabricated using a continuous-aperture array. A continuous-discrete transformation of the HIS pattern based on the Fourier transform is proposed, converting the continuous pattern design into a discrete beamforming design. We formulate a joint transmit-receive beamforming optimization problem for the HISAC system, aiming to balance the performance of multi-target sensing while fulfilling the performance requirement of multi-user communication. To solve the non-convex problem with coupled variables, an alternating optimization-based algorithm is proposed to optimize the HISAC transmit-receive beamforming in an alternate manner. Specifically, the transmit beamforming design is solved by decoupling into a series of feasibility-checking sub-problems while the receive beamforming is determined by the Rayleigh quotient-based method. Simulation results demonstrate the superiority of the proposed HISAC system over traditional discrete-array-based ISAC systems, achieving significantly higher sensing performance while guaranteeing predetermined communication performance. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.20064 [pdf, other]

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t… ▽ More Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for training, but problems remain as it sometimes causes over-fitting for minor classes or under-fitting for major classes. This paper presents the system developed by a multi-site team for the participation in the Odyssey 2024 Emotion Recognition Challenge Track-1. The challenge data has the aforementioned properties and therefore the presented systems aimed to tackle these issues, by introducing focal loss in optimisation when applying class weighted loss. Specifically, the focal loss is further weighted by prior-based class weights. Experimental results show that combining these two approaches brings better overall performance, by sacrificing performance on major classes. The system further employs a majority voting strategy to combine the outputs of an ensemble of 7 models. The models are trained independently, using different acoustic features and loss functions - with the aim to have different properties for different data. Hence these models show different performance preferences on major classes and minor classes. The ensemble system output obtained the best performance in the challenge, ranking top-1 among 68 submissions. It also outperformed all single models in our set. On the Odyssey 2024 Emotion Recognition Challenge Task-1 data the system obtained a Macro-F1 score of 35.69% and an accuracy of 37.32%. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18205 [pdf, other]

Joint Radar Sensing, Location, and Communication Resources Optimization in 6G Network

Authors: Haijun Zhang, Bowen Chen, Xiangnan Liu, Chao Ren

Abstract: The possibility of jointly optimizing location sensing and communication resources, facilitated by the existence of communication and sensing spectrum sharing, is what promotes the system performance to a higher level. However, the rapid mobility of user equipment (UE) can result in inaccurate location estimation, which can severely degrade system performance. Therefore, the precise UE location se… ▽ More The possibility of jointly optimizing location sensing and communication resources, facilitated by the existence of communication and sensing spectrum sharing, is what promotes the system performance to a higher level. However, the rapid mobility of user equipment (UE) can result in inaccurate location estimation, which can severely degrade system performance. Therefore, the precise UE location sensing and resource allocation issues are investigated in a spectrum sharing sixth generation network. An approach is proposed for joint subcarrier and power optimization based on UE location sensing, aiming to minimize system energy consumption. The joint allocation process is separated into two key phases of operation. In the radar location sensing phase, the multipath interference and Doppler effects are considered simultaneously, and the issues of UE's location and channel state estimation are transformed into a convex optimization problem, which is then solved through gradient descent. In the communication phase, a subcarrier allocation method based on subcarrier weights is proposed. To further minimize system energy consumption, a joint subcarrier and power allocation method is introduced, resolved via the Lagrange multiplier method for the non-convex resource allocation problem. Simulation analysis results indicate that the location sensing algorithm exhibits a prominent improvement in accuracy compared to benchmark algorithms. Simultaneously, the proposed resource allocation scheme also demonstrates a substantial enhancement in performance relative to baseline schemes. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 12 pages,9 figures and 4 charts. This paper has been accepted for publication in the IEEE Journal on Selected Areas in Communications

arXiv:2405.16664 [pdf]

Deep learning improved autofocus for motion artifact reduction and its application in quantitative susceptibility map**

Authors: Chao Li, **wei Zhang, Hang Zhang, Jiahao Li, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang

Abstract: Purpose: To develop a pipeline for motion artifact correction in mGRE and quantitative susceptibility map** (QSM). Methods: Deep learning is integrated with autofocus to improve motion artifact suppression, which is applied QSM of patients with Parkinson's disease (PD). The estimation of affine motion parameters in the autofocus method depends on signal-to-noise ratio and lacks accuracy when dat… ▽ More Purpose: To develop a pipeline for motion artifact correction in mGRE and quantitative susceptibility map** (QSM). Methods: Deep learning is integrated with autofocus to improve motion artifact suppression, which is applied QSM of patients with Parkinson's disease (PD). The estimation of affine motion parameters in the autofocus method depends on signal-to-noise ratio and lacks accuracy when data sampling occurs outside the k-space center. A deep learning strategy is employed to remove the residual motion artifacts in autofocus. Results: Results obtained in simulated brain data (n =15) with reference truth show that the proposed autofocus deep learning method significantly improves the image quality of mGRE and QSM (p = 0.001 for SSIM, p < 0.0001 for PSNR and RMSE). Results from 10 PD patients with real motion artifacts in QSM have also been corrected using the proposed method and sent to an experienced radiologist for image quality evaluation, and the average image quality score has increased (p=0.0039). Conclusions: The proposed method enables substantial suppression of motion artifacts in mGRE and QSM. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16184 [pdf, other]

Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions

Authors: Harry Zhang

Abstract: Model-based Reinforcement Learning (MBRL) has shown many desirable properties for intelligent control tasks. However, satisfying safety and stability constraints during training and rollout remains an open question. We propose a new Model-based RL framework to enable efficient policy learning with unknown dynamics based on learning model predictive control (LMPC) framework with mathematically prov… ▽ More Model-based Reinforcement Learning (MBRL) has shown many desirable properties for intelligent control tasks. However, satisfying safety and stability constraints during training and rollout remains an open question. We propose a new Model-based RL framework to enable efficient policy learning with unknown dynamics based on learning model predictive control (LMPC) framework with mathematically provable guarantees of stability. We introduce and explore a novel method for adding safety constraints for model-based RL during training and policy learning. The new stability-augmented framework consists of a neural-network-based learner that learns to construct a Lyapunov function, and a model-based RL agent to consistently complete the tasks while satisfying user-specified constraints given only sub-optimal demonstrations and sparse-cost feedback. We demonstrate the capability of the proposed framework through simulated experiments. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15153 [pdf, other]

Optimal Reference Nodes Deployment for Positioning Seafloor Anchor Nodes

Authors: Wei Huang, Pengfei Wu, Tianhe Xu, Hao Zhang, Kaitao Meng

Abstract: Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenge work. Traditional anchor node positioning typically uses cross or circular shapes, however, how to optimize the… ▽ More Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenge work. Traditional anchor node positioning typically uses cross or circular shapes, however, how to optimize the deployment of reference nodes for positioning underwater anchor nodes considering the variability of sound speed has not yet been studied. This paper focuses on the optimal reference nodes deployment strategies for time--of--arrival (TOA) localization in the three-dimensional (3D) underwater space. We adopt the criterion that minimizing the trace of the inverse Fisher information matrix (FIM) to determine optimal reference nodes deployment with Gaussian measurement noise, which is positive related to the signal propagation path. A comprehensive analysis of optimal reference-target geometries is provided in the general circumstance with no restriction on the number of reference nodes, elevation angle and reference-target range. A new semi-closed form solution is found to detemine the optimal geometries. To demonstrate the findings in this paper, we conducted both simulations and sea trials on underwater anchor node positioning. Both the simulation and experiment results are consistent with theoretical analysis. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14210 [pdf, other]

Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds

Authors: Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang

Abstract: Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect tha… ▽ More Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect that a seemingly well-trained model ends up misclassifying the input. This paper adds to the understanding of adversarial attacks by presenting Eidos, a framework providing Efficient Imperceptible aDversarial attacks on 3D pOint cloudS. Eidos supports a diverse set of imperceptibility metrics. It employs an iterative, two-step procedure to identify optimal adversarial examples, thereby enabling a runtime-imperceptibility trade-off. We provide empirical evidence relative to several popular 3D point cloud classification models and several established 3D attack methods, showing Eidos' superiority with respect to efficiency as well as imperceptibility. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2405.13661 [pdf, ps, other]

Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

Authors: Hong Zhang, Jie Lin, Shengxuan Chen

Abstract: Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and… ▽ More Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and techniques of timbre space, a powerful tool for visualizing how we perceive different timbres. The review further examines recent advancements in timbre manipulation and representation, including the increasingly utilized machine learning techniques. While the underlying neural mechanisms remain partially understood, the article discusses current neuroimaging techniques used to investigate this aspect of perception. Finally, it summarizes key takeaways, identifies promising future research directions, and emphasizes the potential applications of timbre research in music technology, assistive technologies, and our overall understanding of auditory perception. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13476 [pdf, other]

Restricting Voltage Deviation of DC Microgrids with Critical and Ordinary Nodes

Authors: Handong Bai, Peng Li, Hongwei Zhang

Abstract: Restricting bus voltage deviation is crucial for normal operation of multi-bus DC microgrids, yet it has received insufficient attention due to the conflict between two main control objectives in DC microgrids, i.e., voltage regulation and current sharing. By revealing a necessary and sufficient condition for achieving these two objectives, this paper proposes a compromised distributed control alg… ▽ More Restricting bus voltage deviation is crucial for normal operation of multi-bus DC microgrids, yet it has received insufficient attention due to the conflict between two main control objectives in DC microgrids, i.e., voltage regulation and current sharing. By revealing a necessary and sufficient condition for achieving these two objectives, this paper proposes a compromised distributed control algorithm, which regulates the voltage deviation of all buses by relaxing the accuracy of current sharing. Moreover, for a class of DC Microgrids consisting of both critical nodes and ordinary nodes, this paper proposes a distributed control algorithm that restricts the voltage deviation of critical nodes and simultaneously keeps the current sharing of ordinary nodes. This algorithm also works under plug-and-play settings. Simulations illustrate our theory. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13243 [pdf]

A Novel Approach to Evaluating Battery Charger Controller Design with Nonlinear PID Controller in an Extendable CHIL Setup

Authors: Shervin Salehi Rad, Micheal Muhlbaier, Oleg Fishman, Javad Chevinly, Elias Nadi, Hua Zhang, Fei Lu

Abstract: The design and development of power electronics converters pose a multitude of challenges. The evaluation of power electronics converters, particularly when operating at high power levels, presents a significant task, offering designers a deeper understanding of the functionality. Several methodologies have been devised to conduct hardware-in-the-loop (HIL) tests, which are classified into two mai… ▽ More The design and development of power electronics converters pose a multitude of challenges. The evaluation of power electronics converters, particularly when operating at high power levels, presents a significant task, offering designers a deeper understanding of the functionality. Several methodologies have been devised to conduct hardware-in-the-loop (HIL) tests, which are classified into two main categories: controller hardware-in-the-loop (CHIL) and power hardware-in-the-loop (PHIL) tests. This paper explores the advantages and drawbacks of these two approaches and introduces a straightforward and cost-effective CHIL method for initial proof of concept and risk-free controller training. This method is based on the interaction between MATLAB/Simulink and Python. To assess the operation of the proposed system, a modeled battery pack (96S1P) is charged using a modeled battery charger converter, employing a non-linear PID controller. In this scenario, the controller benefits from the CC-CV technique for charging the battery pack. The results are presented in the final section. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 2024 IEEE Transportation Electrification Conference & Expo (ITEC 2024) (5 pages, 11 figures)

arXiv:2405.12359 [pdf]

Design and Analysis of a Detuned Series-Series IPT System with Solenoid Coil Structure for Drone Charging Applications

Authors: Elias Nadi, Hua Zhang

Abstract: This paper proposes a new coil configuration that uses solenoid ferrites on the receiver side instead of planar ferrite coils that are employed in existing wireless charging systems.The solenoid ferrites are used in the drone legs to help mount the receiver on a moving truck while the two parts of the transmitter are placed on the truck.To validate this idea a detuned transmitter series-series (SS… ▽ More This paper proposes a new coil configuration that uses solenoid ferrites on the receiver side instead of planar ferrite coils that are employed in existing wireless charging systems.The solenoid ferrites are used in the drone legs to help mount the receiver on a moving truck while the two parts of the transmitter are placed on the truck.To validate this idea a detuned transmitter series-series (SS) compensated IPT prototype system has been investigated to charge the onboard battery of a drone.The proposed detuned design helps to limit the system output power and transmitter side current at even zero coupling condition and tolerates misalignment positions. Experimental results demonstrate that the system switching frequency is 245 kHz and the primary side resonates at 223.5 kHz, which is a detuned design The transmitting current is limited to 4.5 A at zero coupling condition with a low power loss of only 4.2 W at zero coupling condition. The target achievable power of the system is 50 W with a 10 mm air gap and horizontal and vertical misalignment in the range of [0, 50 mm].This system has been validated by experiments to charge an 11.1V battery at 48.2W with 88.2% efficiency. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: ITEC 2024 (IEEE Transportation Electrification Conference & Expo)

arXiv:2405.11131 [pdf]

Gallium Nitride (GaN) based High-Power Multilevel H-Bridge Inverter for Wireless Power Transfer of Electric Vehicles

Authors: Javad Chevinly, Shervin Salehi Rad, Elias Nadi, Bogdan Proca, John Wolgemuth, Anthony Calabro, Hua Zhang, Fei Lu

Abstract: This paper presents a design and implementation of a high-power Gallium Nitride (GaN)-based multilevel Hbridge inverter to excite wireless charging coils for the wireless power transfer of electric vehicles (EVs). Compared to the traditional conductive charging, wireless charging technology offers a safer and more convenient way to charge EVs. Due to the increasing demand of fast charging, high-po… ▽ More This paper presents a design and implementation of a high-power Gallium Nitride (GaN)-based multilevel Hbridge inverter to excite wireless charging coils for the wireless power transfer of electric vehicles (EVs). Compared to the traditional conductive charging, wireless charging technology offers a safer and more convenient way to charge EVs. Due to the increasing demand of fast charging, high-power inverters play a crucial role in exciting the wireless charging coils within a wireless power transfer system. This paper details the system specifications for the wireless charging of EVs, providing theoretical analysis and a control strategy for the modular design of a 75-kW 3-level and 4-level Hbridge inverter. The goal is to deliver a low-distortion excitation voltage to the wireless charging coils. LTspice simulation results, including output voltage, Fast Fourier Transform (FFT) analysis for both 3-level and 4-level H-bridge inverters, are presented to validate the control strategy and demonstrate the elimination of output harmonic components in the modular design. A GaNbased inverter prototype was employed to deliver a 85-kHz power to the wireless charging pads of the wireless power transfer system. Experimental results at two different voltage and power levels, 100V-215W and 150V-489W, validate the successful performance of the GaN inverter in the wireless charging system. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10723 [pdf, other]

Eddeep: Fast eddy-current distortion correction for diffusion MRI with deep learning

Authors: Antoine Legouhy, Ross Callaghan, Whitney Stee, Philippe Peigneux, Hojjat Azadbakht, Hui Zhang

Abstract: Modern diffusion MRI sequences commonly acquire a large number of volumes with diffusion sensitization gradients of differing strengths or directions. Such sequences rely on echo-planar imaging (EPI) to achieve reasonable scan duration. However, EPI is vulnerable to off-resonance effects, leading to tissue susceptibility and eddy-current induced distortions. The latter is particularly problematic… ▽ More Modern diffusion MRI sequences commonly acquire a large number of volumes with diffusion sensitization gradients of differing strengths or directions. Such sequences rely on echo-planar imaging (EPI) to achieve reasonable scan duration. However, EPI is vulnerable to off-resonance effects, leading to tissue susceptibility and eddy-current induced distortions. The latter is particularly problematic because it causes misalignment between volumes, disrupting downstream modelling and analysis. The essential correction of eddy distortions is typically done post-acquisition, with image registration. However, this is non-trivial because correspondence between volumes can be severely disrupted due to volume-specific signal attenuations induced by varying directions and strengths of the applied gradients. This challenge has been successfully addressed by the popular FSL~Eddy tool but at considerable computational cost. We propose an alternative approach, leveraging recent advances in image processing enabled by deep learning (DL). It consists of two convolutional neural networks: 1) An image translator to restore correspondence between images; 2) A registration model to align the translated images. Results demonstrate comparable distortion estimates to FSL~Eddy, while requiring only modest training sample sizes. This work, to the best of our knowledge, is the first to tackle this problem with deep learning. Together with recently developed DL-based susceptibility correction techniques, they pave the way for real-time preprocessing of diffusion MRI, facilitating its wider uptake in the clinic. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: submitted to MICCAI 2024

arXiv:2405.10691 [pdf, other]

LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

Abstract: The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets wit… ▽ More The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet, existing deep generative models are facing difficulties in missing brain image completion, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the develo** brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion,which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness with consistent performance on missing infant brain MR completion even in big gap scenarios, aiding in better delineation of early developmental trajectories. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09569 [pdf, other]

GaitMotion: A Multitask Dataset for Pathological Gait Forecasting

Authors: Wenwen Zhang, Hao Zhang, Zenan Jiang, **g Wang, Amir Servati, Peyman Servati

Abstract: Gait benchmark empowers uncounted encouraging research fields such as gait recognition, humanoid locomotion, etc. Despite the growing focus on gait analysis, the research community is hindered by the limitations of the currently available databases, which mostly consist of videos or images with limited labeling. In this paper, we introduce GaitMotion, a multitask dataset leveraging wearable sensor… ▽ More Gait benchmark empowers uncounted encouraging research fields such as gait recognition, humanoid locomotion, etc. Despite the growing focus on gait analysis, the research community is hindered by the limitations of the currently available databases, which mostly consist of videos or images with limited labeling. In this paper, we introduce GaitMotion, a multitask dataset leveraging wearable sensors to capture the patients' real-time movement with pathological gait. This dataset offers extensive ground-truth labeling for multiple tasks, including step/stride segmentation and step/stride length prediction, empowers researchers with a more holistic understanding of gait disturbances linked to neurological impairments. The wearable gait analysis suit captures the gait cycle, pattern, and parameters for both normal and pathological subjects. This data may prove beneficial for healthcare products focused on patient progress monitoring and post-disease recovery, as well as for forensics technologies aimed at person reidentification, and biomechanics research to aid in the development of humanoid robotics. Moreover, the analysis has considered the drift in data distribution across individual subjects. This drift can be attributed to each participant's unique behavioral habits or potential displacement of the sensor. Stride length variance for normal, Parkinson's, and stroke patients are compared to recognize the pathological walking pattern. As the baseline and benchmark, we provide an error of 14.1, 13.3, and 12.2 centimeters of stride length prediction for normal, Parkinson's, and Stroke gaits separately. We also analyzed the gait characteristics for normal and pathological gaits in terms of the gait cycle and gait parameters. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.08838 [pdf, other]

PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

Authors: Yang Hou, Haitao Fu, Chuankai Chen, Zida Li, Haoyu Zhang, Jianjun Zhao

Abstract: With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, as a key factor in training and validating deepfake detectors, most existing deepfake datasets primarily focus on the visual modal, an… ▽ More With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, as a key factor in training and validating deepfake detectors, most existing deepfake datasets primarily focus on the visual modal, and the few that are multimodal employ outdated techniques, and their audio content is limited to a single language, thereby failing to represent the cutting-edge advancements and globalization trends in current deepfake technologies. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies. We conduct comprehensive experiments using state-of-the-art detection methods on PolyGlotFake dataset. These experiments demonstrate the dataset's significant challenges and its practical value in advancing research into multimodal deepfake detection. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: 13 page, 4 figures

MSC Class: 68T45 ACM Class: I.4.9

arXiv:2405.07685 [pdf, other]

Comprehensive Analysis of Access Control Models in Edge Computing: Challenges, Solutions, and Future Directions

Authors: Tao Xue, Ying Zhang, Yanbin Wang, Wenbo Wang, Shuailou Li, Haibin Zhang

Abstract: Many contemporary applications, including smart homes and autonomous vehicles, rely on the Internet of Things technology. While cloud computing provides a multitude of valuable services for these applications, it generally imposes constraints on latency-sensitive applications due to the significant propagation delays. As a complementary technique to cloud computing, edge computing situates computi… ▽ More Many contemporary applications, including smart homes and autonomous vehicles, rely on the Internet of Things technology. While cloud computing provides a multitude of valuable services for these applications, it generally imposes constraints on latency-sensitive applications due to the significant propagation delays. As a complementary technique to cloud computing, edge computing situates computing resources closer to the data sources, which reduces the latency and simultaneously alleviates the bandwidth pressure for the cloud and enhances data security. While edge computing offers significant advantages, it also presents significant challenges in access control -- a critical component for safeguarding data. For instance, it is crucial to implement access control mechanisms that are both effective and efficient on resource-constrained devices, ensuring high security without compromising the inherent low latency benefits of edge computing. These challenges drive the development of innovative access control solutions tailored to meet the unique requirements of edge computing environments. We classify related references from the perspectives of multiple data lifecycles (including data collection, storage, and usage), which thoroughly investigates the access control techniques and helps readers understand them systematically. Finally, we reflect on the classification and envisage future research directions. △ Less

Submitted 22 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06747 [pdf, other]

Music Emotion Prediction Using Recurrent Neural Networks

Authors: Xinyu Chang, Xiangyu Zhang, Haoruo Zhang, Yulu Ran

Abstract: This study explores the application of recurrent neural networks to recognize emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states. We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these c… ▽ More This study explores the application of recurrent neural networks to recognize emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states. We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories. Our approach involves extracting a comprehensive set of audio features using Librosa and applying various recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks. Initial experiments are conducted using a dataset of 900 audio clips, labeled according to the emotional quadrants. We compare the performance of our neural network models against a set of baseline classifiers and analyze their effectiveness in capturing the temporal dynamics inherent in musical expression. The results indicate that simpler RNN architectures may perform comparably or even superiorly to more complex models, particularly in smaller datasets. We've also applied the following experiments on larger datasets: one is augmented based on our original dataset, and the other is from other sources. This research not only enhances our understanding of the emotional impact of music but also demonstrates the potential of neural networks in creating more personalized and emotionally resonant music recommendation and therapy systems. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: 15 pages, 13 figures

arXiv:2405.05133 [pdf, other]

Identifying every building's function in large-scale urban areas with multi-modality remote-sensing data

Authors: Zhuohong Li, Wei He, Jiepan Li, Hongyan Zhang

Abstract: Buildings, as fundamental man-made structures in urban environments, serve as crucial indicators for understanding various city function zones. Rapid urbanization has raised an urgent need for efficiently surveying building footprints and functions. In this study, we proposed a semi-supervised framework to identify every building's function in large-scale urban areas with multi-modality remote-sen… ▽ More Buildings, as fundamental man-made structures in urban environments, serve as crucial indicators for understanding various city function zones. Rapid urbanization has raised an urgent need for efficiently surveying building footprints and functions. In this study, we proposed a semi-supervised framework to identify every building's function in large-scale urban areas with multi-modality remote-sensing data. In detail, optical images, building height, and nighttime-light data are collected to describe the morphological attributes of buildings. Then, the area of interest (AOI) and building masks from the volunteered geographic information (VGI) data are collected to form sparsely labeled samples. Furthermore, the multi-modality data and weak labels are utilized to train a segmentation model with a semi-supervised strategy. Finally, results are evaluated by 20,000 validation points and statistical survey reports from the government. The evaluations reveal that the produced function maps achieve an OA of 82% and Kappa of 71% among 1,616,796 buildings in Shanghai, China. This study has the potential to support large-scale urban management and sustainable urban development. All collected data and produced maps are open access at https://github.com/LiZhuoHong/BuildingMap. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 5 pages, 7 figures, accepted by IGARSS 2024

arXiv:2405.01515 [pdf, other]

Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

Authors: Hanwen Zhang, Mingzhe Chen, Alireza Vahid, Haijian Sun

Abstract: Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor ge… ▽ More Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor generalizability and scarce training data. In this paper, we propose a fractional programming (FP) based deep unfolding (DU) approach to address resource allocation problem for a weighted sum rate optimization in RSMA. By carefully designing the penalty function, we couple the variable update with projected gradient descent algorithm (PGD). Following the structure of PGD, we embed few learnable parameters in each layer of the DU network. Through extensive simulation, we have shown that the proposed model-based neural networks has similar performance as optimal results given by traditional algorithm but with much lower computational complexity, less training data, and higher resilience to test set data and out-of-distribution (OOD) data. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: submitted to IEEE conference

arXiv:2405.01115 [pdf]

A New Self-Alignment Method without Solving Wahba Problem for SINS in Autonomous Vehicles

Authors: Hongliang Zhang, Yilan Zhou, Lei Wang, Tengchao Huang

Abstract: Initial alignment is one of the key technologies in strapdown inertial navigation system (SINS) to provide initial state information for vehicle attitude and navigation. For some situations, such as the attitude heading reference system, the position is not necessarily required or even available, then the self-alignment that does not rely on any external aid becomes very necessary. This study pres… ▽ More Initial alignment is one of the key technologies in strapdown inertial navigation system (SINS) to provide initial state information for vehicle attitude and navigation. For some situations, such as the attitude heading reference system, the position is not necessarily required or even available, then the self-alignment that does not rely on any external aid becomes very necessary. This study presents a new self-alignment method under swaying conditions, which can determine the latitude and attitude simultaneously by utilizing all observation vectors without solving the Wahba problem, and it is different from the existing methods. By constructing the dyadic tensor of each observation and reference vector itself, all equations related to observation and reference vectors are accumulated into one equation, where the latitude variable is extracted and solved according to the same eigenvalues of similar matrices on both sides of the equation, meanwhile the attitude is obtained by eigenvalue decomposition. Simulation and experiment tests verify the effectiveness of the proposed methods, and the alignment result is better than TRIAD in convergence speed and stability and comparable with OBA method in alignment accuracy with or without latitude. It is useful for guiding the design of initial alignment in autonomous vehicle applications. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.16905 [pdf, other]

Samsung Research China-Bei**g at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations

Authors: Shen Zhang, Haojie Zhang, **g Zhang, Xudong Zhang, Yimeng Zhuang, **ting Wu

Abstract: In human-computer interaction, it is crucial for agents to respond to human by understanding their emotions. Unraveling the causes of emotions is more challenging. A new task named Multimodal Emotion-Cause Pair Extraction in Conversations is responsible for recognizing emotion and identifying causal expressions. In this study, we propose a multi-stage framework to generate emotion and extract the… ▽ More In human-computer interaction, it is crucial for agents to respond to human by understanding their emotions. Unraveling the causes of emotions is more challenging. A new task named Multimodal Emotion-Cause Pair Extraction in Conversations is responsible for recognizing emotion and identifying causal expressions. In this study, we propose a multi-stage framework to generate emotion and extract the emotion causal pairs given the target emotion. In the first stage, Llama-2-based InstructERC is utilized to extract the emotion category of each utterance in a conversation. After emotion recognition, a two-stream attention model is employed to extract the emotion causal pairs given the target emotion for subtask 2 while MuTEC is employed to extract causal span for subtask 1. Our approach achieved first place for both of the two subtasks in the competition. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16522 [pdf, other]

A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography

Authors: Bo Peng, Xiaofeng Li, Xinyu Li, Zhenghan Wang, Hui Deng, Xiaoxian Luo, Lixue Yin, Hongmei Zhang

Abstract: Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classif… ▽ More Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classifying 2D echocardiography data into five distinct echocardiographic views: apical 4-chamber, parasternal long axis of left ventricle, parasternal short axis at levels of the mitral valve, papillary muscle, and apex. It then extracts features of each view separately and combines five features for disease classification. A total of 212 patients diagnosed with HCM, and 30 patients diagnosed with CA, along with 200 individuals with normal cardiac function(Normal), were enrolled in this study from 2018 to 2022. This approach achieved a precision, recall of 0.905, and micro-F1 score of 0.904, demonstrating its effectiveness in accurately identifying HCM and CA using a multi-view analysis. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.15830 [pdf, other]

SNR Maximization and Localization for UAV-IRS-Assisted Near-Field Systems

Authors: Hanfu Zhang, Yidan Mei, Erwu Liu, Rui Wang

Abstract: This letter introduces a novel unmanned aerial vehicle (UAV)-intelligent reflecting surface (IRS) structure into near-field localization systems to enhance the design flexibility of IRS, thereby obtaining additional performance gains. Specifically, a UAV-IRS is utilized to improve the harsh wireless environment and provide localization possibilities. To improve the localization accuracy, a joint o… ▽ More This letter introduces a novel unmanned aerial vehicle (UAV)-intelligent reflecting surface (IRS) structure into near-field localization systems to enhance the design flexibility of IRS, thereby obtaining additional performance gains. Specifically, a UAV-IRS is utilized to improve the harsh wireless environment and provide localization possibilities. To improve the localization accuracy, a joint optimization problem considering UAV position and UAV-IRS passive beamforming is formulated to maximize the receiving signal-to-noise ratio (SNR). An alternative optimization algorithm is proposed to solve the complex non-convex problem leveraging the projected gradient ascent (PGA) algorithm and the principle of minimizing the phase difference of the receiving signals. Closed-form expressions for UAV-IRS phase shift are derived to reduce the algorithm complexity. In the simulations, the proposed algorithm is compared with three different schemes and outperforms the others in both receiving SNR and localization accuracy. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures

arXiv:2404.12613 [pdf, other]

A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models

Authors: Xinyu Liu, Hai Zhang

Abstract: The purpose of this paper is twofold. First, we propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models (GMMs). The algorithm takes advantage of the Hankel structure inherent in the Fourier data obtained from independent and identically distributed (i.i.d) samples of the mixture. For GMMs with a unified variance, a singular value ratio functional using the Fo… ▽ More The purpose of this paper is twofold. First, we propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models (GMMs). The algorithm takes advantage of the Hankel structure inherent in the Fourier data obtained from independent and identically distributed (i.i.d) samples of the mixture. For GMMs with a unified variance, a singular value ratio functional using the Fourier data is introduced and used to resolve the variance and component number simultaneously. The consistency of the estimator is derived. Compared to classic algorithms such as the method of moments and the maximum likelihood method, the proposed algorithm does not require prior knowledge of the number of Gaussian components or good initial guesses. Numerical experiments demonstrate its superior performance in estimation accuracy and computational cost. Second, we reveal that there exists a fundamental limit to the problem of estimating the number of Gaussian components or model order in the mixture model if the number of i.i.d samples is finite. For the case of a single variance, we show that the model order can be successfully estimated only if the minimum separation distance between the component means exceeds a certain threshold value and can fail if below. We derive a lower bound for this threshold value, referred to as the computational resolution limit, in terms of the number of i.i.d samples, the variance, and the number of Gaussian components. Numerical experiments confirm this phase transition phenomenon in estimating the model order. Moreover, we demonstrate that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/. △ Less

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.07956 [pdf, other]

Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

Authors: Lujie Yang, Hongkai Dai, Zhouxing Shi, Cho-Jui Hsieh, Russ Tedrake, Huan Zhang

Abstract: Learning-based neural network (NN) control policies have shown impressive empirical performance in a wide range of tasks in robotics and control. However, formal (Lyapunov) stability guarantees over the region-of-attraction (ROA) for NN controllers with nonlinear dynamical systems are challenging to obtain, and most existing approaches rely on expensive solvers such as sums-of-squares (SOS), mixed… ▽ More Learning-based neural network (NN) control policies have shown impressive empirical performance in a wide range of tasks in robotics and control. However, formal (Lyapunov) stability guarantees over the region-of-attraction (ROA) for NN controllers with nonlinear dynamical systems are challenging to obtain, and most existing approaches rely on expensive solvers such as sums-of-squares (SOS), mixed-integer programming (MIP), or satisfiability modulo theories (SMT). In this paper, we demonstrate a new framework for learning NN controllers together with Lyapunov certificates using fast empirical falsification and strategic regularizations. We propose a novel formulation that defines a larger verifiable region-of-attraction (ROA) than shown in the literature, and refines the conventional restrictive constraints on Lyapunov derivatives to focus only on certifiable ROAs. The Lyapunov condition is rigorously verified post-hoc using branch-and-bound with scalable linear bound propagation-based NN verification techniques. The approach is efficient and flexible, and the full training and verification procedure is accelerated on GPUs without relying on expensive solvers for SOS, MIP, nor SMT. The flexibility and efficiency of our framework allow us to demonstrate Lyapunov-stable output feedback control with synthesized NN-based controllers and NN-based observers with formal stability guarantees, for the first time in literature. Source code at https://github.com/Verified-Intelligence/Lyapunov_Stable_NN_Controllers △ Less

Submitted 4 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: Paper accepted by ICML 2024

arXiv:2404.07477 [pdf, ps, other]

Integrated Sensing and Communication Under DISCO Physical-Layer Jamming Attacks

Authors: Huan Huang, Hongliang Zhang, Weidong Mei, Jun Li, Yi Cai, A. Lee Swindlehurst, Zhu Han

Abstract: Integrated sensing and communication (ISAC) systems traditionally presuppose that sensing and communication (S&C) channels remain approximately constant during their coherence time. However, a "DISCO" reconfigurable intelligent surface (DRIS), i.e., an illegitimate RIS with random, time-varying reflection properties that acts like a "disco ball," introduces a paradigm shift that enables active cha… ▽ More Integrated sensing and communication (ISAC) systems traditionally presuppose that sensing and communication (S&C) channels remain approximately constant during their coherence time. However, a "DISCO" reconfigurable intelligent surface (DRIS), i.e., an illegitimate RIS with random, time-varying reflection properties that acts like a "disco ball," introduces a paradigm shift that enables active channel aging more rapidly during the channel coherence time. In this letter, we investigate the impact of DISCO jamming attacks launched by a DRISbased fully-passive jammer (FPJ) on an ISAC system. Specifically, an ISAC problem formulation and a corresponding waveform optimization are presented in which the ISAC waveform design considers the trade-off between the S&C performance and is formulated as a Pareto optimization problem. Moreover, a theoretical analysis is conducted to quantify the impact of DISCO jamming attacks. Numerical results are presented to evaluate the S&C performance under DISCO jamming attacks and to validate the derived theoretical analysis. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: This paper has been submitted for possible publication. For the code of the DISCO RIS is available on Github (https://github.com/huanhuan1799/Disco-Intelligent-Reflecting-Surfaces-Active-Channel-Aging-for-Fully-Passive-Jamming-Attacks)

arXiv:2404.06079 [pdf, other]

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

Authors: Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

Abstract: Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challen… ▽ More Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. Notably, we achieved 1st rank on the leaderboard in the TTS track both with the whole training set and only 1h training data, with the highest UTMOS score and lowest bitrate among all submissions. △ Less

Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures. Report of a challenge

arXiv:2404.02467 [pdf, ps, other]

SSwsrNet: A Semi-Supervised Few-Shot Learning Framework for Wireless Signal Recognition

Authors: Hao Zhang, Fuhui Zhou, Qihui Wu, Naofal Al-Dhahir

Abstract: Wireless signal recognition (WSR) is crucial in modern and future wireless communication networks since it aims to identify properties of the received signal. Although many deep learning-based WSR models have been developed, they still rely on a large amount of labeled training data. Thus, they cannot tackle the few-sample problem in the practically and dynamically changing wireless communication… ▽ More Wireless signal recognition (WSR) is crucial in modern and future wireless communication networks since it aims to identify properties of the received signal. Although many deep learning-based WSR models have been developed, they still rely on a large amount of labeled training data. Thus, they cannot tackle the few-sample problem in the practically and dynamically changing wireless communication environment. To overcome this challenge, a novel SSwsrNet framework is proposed by using the deep residual shrinkage network (DRSN) and semi-supervised learning. The DRSN can learn discriminative features from noisy signals. Moreover, a modular semi-supervised learning method that combines labeled and unlabeled data using MixMatch is exploited to further improve the classification performance under few-sample conditions. Extensive simulation results on automatic modulation classification (AMC) and wireless technology classification (WTC) demonstrate that our proposed WSR scheme can achieve better performance than the benchmark schemes in terms of classification accuracy. This novel method enables more robust and adaptive signal recognition for next-generation wireless networks. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: accpeted by IEEE Transactions on Communications

Journal ref: IEEE Transactions on Communications,2024

arXiv:2404.01164 [pdf, ps, other]

Unified Predefined-time Stability Conditions of Nonlinear Systems with Lyapunov Analysis

Authors: Bing Xiao, Haichao Zhang, Shijie Zhao, Lu Cao

Abstract: This brief gives a set of unified Lyapunov stability conditions to guarantee the predefined-time/finite-time stability of a dynamical systems. The derived Lyapunov theorem for autonomous systems establishes equivalence with existing theorems on predefined-time/finite-time stability. The findings proposed herein develop a nonsingular sliding mode control framework for an Euler-Lagrange system to an… ▽ More This brief gives a set of unified Lyapunov stability conditions to guarantee the predefined-time/finite-time stability of a dynamical systems. The derived Lyapunov theorem for autonomous systems establishes equivalence with existing theorems on predefined-time/finite-time stability. The findings proposed herein develop a nonsingular sliding mode control framework for an Euler-Lagrange system to analyze its stability, and its upper bound for the settling time can be arbitrarily determined a priori through predefined time constant. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00631 [pdf, other]

doi 10.1016/j.phycom.2024.102350

Network-Assisted Full-Duplex Cell-Free mmWave Networks: Hybrid MIMO Processing and Multi-Agent DRL-Based Power Allocation

Authors: Qingrui Fan, Yu Zhang, Jiamin Li, Dongming Wang, Hongbiao Zhang, Xiaohu You

Abstract: This paper investigates the network-assisted full-duplex (NAFD) cell-free millimeter-wave (mmWave) networks, where the distribution of the transmitting access points (T-APs) and receiving access points (R-APs) across distinct geographical locations mitigates cross-link interference, facilitating the attainment of a truly flexible duplex mode. To curtail deployment expenses and power consumption fo… ▽ More This paper investigates the network-assisted full-duplex (NAFD) cell-free millimeter-wave (mmWave) networks, where the distribution of the transmitting access points (T-APs) and receiving access points (R-APs) across distinct geographical locations mitigates cross-link interference, facilitating the attainment of a truly flexible duplex mode. To curtail deployment expenses and power consumption for mmWave band operations, each AP incorporates a hybrid digital-analog structure encompassing precoder/combiner functions. However, this incorporation introduces processing intricacies within channel estimation and precoding/combining design. In this paper, we first present a hybrid multiple-input multiple-output (MIMO) processing framework and derive explicit expressions for both uplink and downlink achievable rates. Then we formulate a power allocation problem to maximize the weighted bidirectional sum rates. To tackle this non-convex problem, we develop a collaborative multi-agent deep reinforcement learning (MADRL) algorithm called multi-agent twin delayed deep deterministic policy gradient (MATD3) for NAFD cell-free mmWave networks. Specifically, given the tightly coupled nature of both uplink and downlink power coefficients in NAFD cell-free mmWave networks, the MATD3 algorithm resolves such coupled conflicts through an interactive learning process between agents and the environment. Finally, the simulation results validate the effectiveness of the proposed channel estimation methods within our hybrid MIMO processing paradigm, and demonstrate that our MATD3 algorithm outperforms both multi-agent deep deterministic policy gradient (MADDPG) and conventional power allocation strategies. △ Less

Submitted 31 March, 2024; originally announced April 2024.

Comments: 14 pages, 9 figures, published on Physical Communication

Journal ref: Physical Communication, volume 64, pages 102350, 2024

arXiv:2404.00327 [pdf, other]

YNetr: Dual-Encoder architecture on Plain Scan Liver Tumors (PSLT)

Authors: Wen Sheng, Zhong Zheng, Jiajun Liu, Han Lu, Hanyuan Zhang, Zhengyong Jiang, Zhihong Zhang, Dao** Zhu

Abstract: Background: Liver tumors are abnormal growths in the liver that can be either benign or malignant, with liver cancer being a significant health concern worldwide. However, there is no dataset for plain scan segmentation of liver tumors, nor any related algorithms. To fill this gap, we propose Plain Scan Liver Tumors(PSLT) and YNetr. Methods: A collection of 40 liver tumor plain scan segmentation d… ▽ More Background: Liver tumors are abnormal growths in the liver that can be either benign or malignant, with liver cancer being a significant health concern worldwide. However, there is no dataset for plain scan segmentation of liver tumors, nor any related algorithms. To fill this gap, we propose Plain Scan Liver Tumors(PSLT) and YNetr. Methods: A collection of 40 liver tumor plain scan segmentation datasets was assembled and annotated. Concurrently, we utilized Dice coefficient as the metric for assessing the segmentation outcomes produced by YNetr, having advantage of capturing different frequency information. Results: The YNetr model achieved a Dice coefficient of 62.63% on the PSLT dataset, surpassing the other publicly available model by an accuracy margin of 1.22%. Comparative evaluations were conducted against a range of models including UNet 3+, XNet, UNetr, Swin UNetr, Trans-BTS, COTr, nnUNetv2 (2D), nnUNetv2 (3D fullres), MedNext (2D) and MedNext(3D fullres). Conclusions: We not only proposed a dataset named PSLT(Plain Scan Liver Tumors), but also explored a structure called YNetr that utilizes wavelet transform to extract different frequency information, which having the SOTA in PSLT by experiments. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: 15 pages

Showing 1–50 of 709 results for author: Zhang, H