
LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement

Authors: Haodong Yang, Jisheng Xu, Zhiliang Lin, Jian** He

Abstract: Computer vision techniques have empowered underwater robots to effectively undertake a multitude of tasks, including object tracking and path planning. However, underwater optical factors like light refraction and absorption present challenges to underwater vision, which cause degradation of underwater images. A variety of underwater image enhancement methods have been proposed to improve the effectiveness of underwater vision perception. Nevertheless, for real-time vision tasks on underwater robots, it is necessary to overcome the challenges associated with algorithmic efficiency and real-time capabilities. In this paper, we introduce Lightweight Underwater Unet (LU2Net), a novel U-shape network designed specifically for real-time enhancement of underwater images. The proposed model incorporates axial depthwise convolution and the channel attention module, enabling it to significantly reduce computational demands and model parameters, thereby improving processing speed. The extensive experiments conducted on the dataset and real-world underwater robots demonstrate the exceptional performance and speed of the proposed model. It is capable of providing well-enhanced underwater images at a speed 8 times faster than the current state-of-the-art underwater image enhancement method. Moreover, LU2Net is able to handle real-time underwater video enhancement.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.12783 [pdf, ps, other]

Zeroing neural dynamics solving time-variant complex conjugate matrix equation

Authors: Jiakuang He, Dongqing Wu

Abstract: Complex conjugate matrix equations (CCME) have aroused the interest of many researchers because of computations and antilinear systems. Existing research is dominated by its time-invariant solving methods, but lacks proposed theories for solving its time-variant version. Moreover, artificial neural networks are rarely studied for solving CCME. In this paper, starting with the earliest CCME, zeroing neural dynamics (ZND) is applied to solve its time-variant version. Firstly, the vectorization and Kronecker product in the complex field are defined uniformly. Secondly, the Con-CZND1 and Con-CZND2 models are proposed, and their convergence and effectiveness are proved theoretically. Thirdly, three numerical experiments are designed to illustrate the effectiveness of the two models, compare their differences, highlight the significance of neural dynamics in the complex field, and refine the theory related to ZND.

Submitted 18 June, 2024; originally announced June 2024.
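
The abstract above mentions vectorization and the Kronecker product but does not state the paper's own complex-field definitions. As a hedged illustration only, the sketch below checks the standard identity vec(AXB) = (B^T ⊗ A) vec(X) on complex matrices and uses it to solve the ordinary (non-conjugate) equation AX + XB = C. The function names `vec`, `unvec`, and `solve_sylvester_kron` are hypothetical, and the conjugate term and the ZND models of the paper are not reproduced here.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order="F")

def unvec(v, shape):
    """Inverse of vec: rebuild a matrix of the given shape, column by column."""
    return v.reshape(shape, order="F")

def solve_sylvester_kron(A, B, C):
    """Solve A X + X B = C for X by vectorizing both sides.

    vec(A X) = (I_m kron A) vec(X) and vec(X B) = (B^T kron I_n) vec(X),
    so the matrix equation becomes one linear system in vec(X).
    """
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    return unvec(np.linalg.solve(K, vec(C)), (n, m))

# Illustrative data (assumed, not from the paper): random complex matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
X_true = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

# The identity uses the plain transpose B.T, not the conjugate transpose.
identity_holds = np.allclose(np.kron(B.T, A) @ vec(X_true), vec(A @ X_true @ B))

C = A @ X_true + X_true @ B
X_est = solve_sylvester_kron(A, B, C)
```

An equation that also involves the conjugate of X is not linear over the complex numbers, so this direct vectorization does not apply to it unchanged; that is presumably why the abstract highlights a uniform complex-field definition.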

arXiv:2406.12186 [pdf, ps, other]

Unlocking the Potential of Early Epochs: Uncertainty-aware CT Metal Artifact Reduction

Authors: Xinquan Yang, Guanqun Zhou, Wei Sun, Youjian Zhang, Zhongya Wang, Jiahui He, Zhicheng Zhang

Abstract: In computed tomography (CT), the presence of metallic implants in patients often leads to disruptive artifacts in the reconstructed images, hindering accurate diagnosis. Recently, a large number of supervised deep learning-based approaches have been proposed for metal artifact reduction (MAR). However, these methods neglect the influence of initial training weights. In this paper, we have discovered that the uncertainty image computed from the restoration result of initial training weights can effectively highlight high-frequency regions, including metal artifacts. This observation can be leveraged to assist the MAR network in removing metal artifacts. Therefore, we propose an uncertainty constraint (UC) loss that utilizes the uncertainty image as an adaptive weight to guide the MAR network to focus on the metal artifact region, leading to improved restoration. The proposed UC loss is designed to be a plug-and-play method, compatible with any MAR framework, and easily adoptable. To validate the effectiveness of the UC loss, we conduct extensive experiments on the publicly available Deeplesion and CLINIC-metal datasets. Experimental results demonstrate that the UC loss further optimizes the network training process and significantly improves the removal of metal artifacts.

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.15830 [pdf, other]

Diff-DTI: Fast Diffusion Tensor Imaging Using A Feature-Enhanced Joint Diffusion Model

Authors: Lang Zhang, **ling He, Dong Liang, Hairong Zheng, Yanjie Zhu

Abstract: Magnetic resonance diffusion tensor imaging (DTI) is a critical tool for neural disease diagnosis. However, long scan time greatly hinders the widespread clinical use of DTI. To accelerate image acquisition, a feature-enhanced joint diffusion model (Diff-DTI) is proposed to obtain accurate DTI parameter maps from a limited number of diffusion-weighted images (DWIs). Diff-DTI introduces a joint diffusion model that directly learns the joint probability distribution of DWIs with DTI parametric maps for conditional generation. Additionally, a feature enhancement fusion mechanism (FEFM) is designed and incorporated into the generative process of Diff-DTI to preserve fine structures in the generated DTI maps. A comprehensive evaluation of the performance of Diff-DTI was conducted on the Human Connectome Project dataset. The results demonstrate that Diff-DTI outperforms existing state-of-the-art fast DTI imaging methods in terms of visual quality and quantitative metrics. Furthermore, Diff-DTI has shown the ability to produce high-fidelity DTI maps with only three DWIs, thus overcoming the requirement of a minimum of six DWIs for DTI.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures

arXiv:2405.15241 [pdf, other]

Blaze3DM: Marry Triplane Representation with Diffusion for 3D Medical Inverse Problem Solving

Authors: Jia He, Bonan Li, Ge Yang, Ziwen Liu

Abstract: Solving 3D medical inverse problems such as image restoration and reconstruction is crucial in the modern medical field. However, the curse of dimensionality in 3D medical data leads mainstream volume-wise methods to suffer from high resource consumption and challenges models to successfully capture the natural distribution, resulting in inevitable volume inconsistency and artifacts. Some recent works attempt to simplify generation in the latent space but lack the capability to efficiently model intricate image details. To address these limitations, we present Blaze3DM, a novel approach that enables fast and high-fidelity generation by integrating a compact triplane neural field and a powerful diffusion model. In technique, Blaze3DM begins by optimizing data-dependent triplane embeddings and a shared decoder simultaneously, reconstructing each triplane back to the corresponding 3D volume. To further enhance 3D consistency, we introduce a lightweight 3D-aware module to model the correlation of three vertical planes. Then, a diffusion model is trained on latent triplane embeddings and achieves both unconditional and conditional triplane generation, which is finally decoded to arbitrary size volume. Extensive experiments on zero-shot 3D medical inverse problem solving, including sparse-view CT, limited-angle CT, compressed-sensing MRI, and MRI isotropic super-resolution, demonstrate that Blaze3DM not only achieves state-of-the-art performance but also markedly improves computational efficiency over existing methods (22~40x faster than previous work).

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.04258 [pdf, other]

A Weighted Least-Squares Method for Non-Asymptotic Identification of Markov Parameters from Multiple Trajectories

Authors: Jiabao He, Cristian R. Rojas, Håkan Hjalmarsson

Abstract: Markov parameters play a key role in system identification. There exist many algorithms where these parameters are estimated using least-squares in a first, pre-processing, step, including subspace identification and multi-step least-squares algorithms, such as Weighted Null-Space Fitting. Recently, there has been an increasing interest in non-asymptotic analysis of estimation algorithms. In this contribution we identify the Markov parameters using weighted least-squares and present non-asymptotic analysis for such an estimator. To cover both stable and unstable systems, multiple trajectories are collected. We show that with the optimal weighting matrix, weighted least-squares gives a tighter error bound than ordinary least-squares for the case of non-uniformly distributed measurement errors. Moreover, as the optimal weighting matrix depends on the system's true parameters, we introduce two methods to consistently estimate the optimal weighting matrix, where the convergence rate of these estimates is also provided. Numerical experiments demonstrate improvements of weighted least-squares over ordinary least-squares in finite sample settings.

Submitted 7 May, 2024; originally announced May 2024.
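
The abstract above states that weighted least-squares improves on ordinary least-squares when measurement errors are not uniformly distributed. The sketch below is a minimal, generic illustration of that effect on a plain linear regression with known per-sample noise variances. It is not the paper's Markov-parameter estimator or its multi-trajectory setup, and the names `ols`, `wls`, `mean_sq_error`, the parameter vector, and the noise profile are all assumptions made for the example.

```python
import numpy as np

def ols(X, y, noise_var=None):
    """Ordinary least-squares estimate; the noise variances are ignored."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def wls(X, y, noise_var):
    """Weighted least-squares with weights 1 / noise_var.

    Scaling each row by 1 / sqrt(noise_var) turns the weighted problem
    into an ordinary one with unit-variance noise.
    """
    w = 1.0 / np.sqrt(noise_var)
    return np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

def mean_sq_error(estimator, trials=200, n=200, seed=0):
    """Average squared parameter error over repeated simulated datasets."""
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, -0.5, 0.25])      # assumed true parameters
    noise_var = np.linspace(0.01, 4.0, n)    # assumed non-uniform noise
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, theta.size))
        y = X @ theta + rng.standard_normal(n) * np.sqrt(noise_var)
        total += np.sum((estimator(X, y, noise_var) - theta) ** 2)
    return total / trials

if __name__ == "__main__":
    # Same seed for both estimators, so they see identical datasets.
    print("OLS mean squared error:", mean_sq_error(ols))
    print("WLS mean squared error:", mean_sq_error(wls))
```

Here the weights use the true noise variances, which is the idealized case; the abstract notes that in practice the optimal weighting depends on unknown system parameters and must itself be estimated.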

arXiv:2405.04250 [pdf, other]

Weighted Least-Squares PARSIM

Authors: Jiabao He, Cristian R. Rojas, Håkan Hjalmarsson

Abstract: Subspace identification methods (SIMs) have proven very powerful for estimating linear state-space models. To overcome the deficiencies of classical SIMs, a significant number of algorithms has appeared over the last two decades, where most of them involve a common intermediate step, that is to estimate the range space of the extended observability matrix. In this contribution, an optimized version of the parallel and parsimonious SIM (PARSIM), PARSIM\textsubscript{opt}, is proposed by using weighted least-squares. It not only inherits all the benefits of PARSIM but also attains the best linear unbiased estimator for the above intermediate step. Furthermore, inspired by SIMs based on the predictor form, consistent estimates of the optimal weighting matrix for weighted least-squares are derived. Essential similarities, differences and simulated comparisons of some key SIMs related to our method are also presented.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.19500 [pdf, other]

Towards Real-world Video Face Restoration: A New Benchmark

Authors: Ziyan Chen, **gwen He, Xinqi Lin, Yu Qiao, Chao Dong

Abstract: Blind face restoration (BFR) on images has significantly progressed over the last several years, while real-world video face restoration (VFR), which is more challenging because it involves more complex face motions such as moving gaze directions and facial orientations, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which are limited in their coverage of real-world video frames. In this work, we introduced new real-world datasets named FOS with a taxonomy of "Full, Occluded, and Side" faces from mainly video frames to study the applicability of current methods on videos. Compared with existing test datasets, FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmarked both the state-of-the-art BFR methods and the video super resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we studied the effectiveness of the commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics by leveraging a subjective user study. With extensive experimental results and detailed analysis provided, we gained insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope will stimulate future advances in VFR research.

Submitted 4 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: Project page: https://ziyannchen.github.io/projects/VFRxBenchmark/

arXiv:2404.17411 [pdf, ps, other]

Low-Complexity Near-Field Channel Estimation for Hybrid RIS Assisted Systems

Authors: Rafaela Schroeder, Jiguang He, Hamza Djelouat, Markku Juntti

Abstract: We investigate the channel estimation (CE) problem for hybrid RIS assisted systems and focus on the near-field (NF) regime. Different from their far-field counterparts, NF channels possess a block-sparsity property, which is leveraged in the two developed CE algorithms: (i) boundary estimation and sub-vector recovery (BESVR) and (ii) linear total variation regularization (TVR). In addition, we adopt the alternating direction method of multipliers to reduce their computational complexity. Numerical results show that the linear TVR algorithm outperforms the chosen baseline schemes in terms of normalized mean square error in the high signal-to-noise ratio regime while the BESVR algorithm achieves comparable performance to the baseline schemes but with the added advantage of minimal CPU time.

Submitted 30 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: 5 pages, 5 figures

arXiv:2404.17331 [pdf, ps, other]

Finite Sample Analysis for a Class of Subspace Identification Methods

Authors: Jiabao He, Ingvar Ziemann, Cristian R. Rojas, Håkan Hjalmarsson

Abstract: While subspace identification methods (SIMs) are appealing due to their simple parameterization for MIMO systems and robust numerical realizations, a comprehensive statistical analysis of SIMs remains an open problem, especially in the non-asymptotic regime. In this work, we provide a finite sample analysis for a class of SIMs, which reveals that the convergence rates for estimating Markov parameters and system matrices are $\mathcal{O}(1/\sqrt{N})$, in line with classical asymptotic results. Based on the observation that the model format in classical SIMs becomes non-causal because of a projection step, we choose a parsimonious SIM that bypasses the projection step and strictly enforces a causal model to facilitate the analysis, where a bank of ARX models are estimated in parallel. Leveraging recent results from finite sample analysis of an individual ARX model, we obtain an overall error bound of an array of ARX models and proceed to derive error bounds for system matrices via robustness results for the singular value decomposition.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.14879 [pdf, other]

Device-Free 3D Drone Localization in RIS-Assisted mmWave MIMO Networks

Authors: Jiguang He, Charles Vanwynsberghe, Hui Chen, Chongwen Huang, Aymen Fakhreddine

Abstract: In this paper, we investigate the potential of reconfigurable intelligent surfaces (RISs) in facilitating passive/device-free three-dimensional (3D) drone localization within existing cellular infrastructure operating at millimeter-wave (mmWave) frequencies and employing multiple antennas at the transceivers. The developed localization system operates in the bi-static mode without requiring direct communication between the drone and the base station. We analyze the theoretical performance limits via Fisher information analysis and Cramér Rao lower bounds (CRLBs). Furthermore, we develop a low-complexity yet effective drone localization algorithm based on coordinate gradient descent and examine the impact of factors such as radar cross section (RCS) of the drone and training overhead on system performance. It is demonstrated that integrating RIS yields significant benefits over its RIS-free counterpart, as evidenced by both theoretical analyses and numerical simulations.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures, submitted to IEEE GLOBECOM 2024

arXiv:2404.12257 [pdf, other]

Food Portion Estimation via 3D Object Scaling

Authors: Gautham Vinod, Jiangpeng He, Zeman Shao, Fengqing Zhu

Abstract: Image-based methods to analyze food images have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge due to the loss of 3D information in the 2D representation of foods captured by smartphone cameras or wearable devices. In this paper, we propose a new framework to estimate both food volume and energy from 2D images by leveraging the power of 3D food models and physical reference in the eating scene. Our method estimates the pose of the camera and the food object in the input image and recreates the eating occasion by rendering an image of a 3D model of the food with the estimated poses. We also introduce a new dataset, SimpleFood45, which contains 2D images of 45 food items and associated annotations including food volume, weight, and energy. Our method achieves an average error of 31.10 kCal (17.67%) on this dataset, outperforming existing portion estimation methods.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.07507 [pdf, other]

Learning to Classify New Foods Incrementally Via Compressed Exemplars

Authors: Justin Yang, Zhihao Duan, Jiangpeng He, Fengqing Zhu

Abstract: Food image classification systems play a crucial role in health monitoring and diet tracking through image-based dietary assessment techniques. However, existing food recognition systems rely on static datasets characterized by a pre-defined fixed number of food classes. This contrasts drastically with the reality of food consumption, which features constantly changing data. Therefore, food image classification systems should adapt to and manage data that continuously evolves. This is where continual learning plays an important role. A challenge in continual learning is catastrophic forgetting, where ML models tend to discard old knowledge upon learning new information. While memory-replay algorithms have shown promise in mitigating this problem by storing old data as exemplars, they are hampered by the limited capacity of memory buffers, leading to an imbalance between new and previously learned data. To address this, our work explores the use of neural image compression to extend buffer size and enhance data diversity. We introduced the concept of continuously learning a neural compression model to adaptively improve the quality of compressed data and optimize the bitrates per pixel (bpp) to store more exemplars. Our extensive experiments, including evaluations on food-specific datasets including Food-101 and VFN-74, as well as the general dataset ImageNet-100, demonstrate improvements in classification accuracy. This progress is pivotal in advancing more realistic food recognition systems that are capable of adapting to continually evolving data. Moreover, the principles and methodologies we've developed hold promise for broader applications, extending their benefits to other domains of continual machine learning systems.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.08343 [pdf, ps, other]

Coverage and Rate Analysis for Integrated Sensing and Communication Networks

Authors: Xu Gan, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Jiguang He, Zhaoyang Zhang, Chau Yuen, Yong Liang Guan, Mérouane Debbah

Abstract: Integrated sensing and communication (ISAC) is increasingly recognized as a pivotal technology for next-generation cellular networks, offering mutual benefits in both sensing and communication capabilities. This advancement necessitates a re-examination of the fundamental limits within networks where these two functions coexist via shared spectrum and infrastructures. However, traditional stochastic geometry-based performance analyses are confined to either communication or sensing networks separately. This paper bridges this gap by introducing a generalized stochastic geometry framework in ISAC networks. Based on this framework, we define and calculate the coverage and ergodic rate of sensing and communication performance under resource constraints. Then, we shed light on the fundamental limits of ISAC networks by presenting theoretical results for the coverage rate of the unified performance, taking into account the coupling effects of dual functions in coexistence networks. Further, we obtain the analytical formulations for evaluating the ergodic sensing rate constrained by the maximum communication rate, and the ergodic communication rate constrained by the maximum sensing rate. Extensive numerical results validate the accuracy of all theoretical derivations, and also indicate that denser networks significantly enhance ISAC coverage. Specifically, increasing the base station density from $1$ $\text{km}^{-2}$ to $10$ $\text{km}^{-2}$ can boost the ISAC coverage rate from $1.4\%$ to $39.8\%$. Further, results also reveal that with the increase of the constrained sensing rate, the ergodic communication rate improves significantly, but the reverse is not obvious.

Submitted 22 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.06074 [pdf, other]

Hashing Beam Training for Near-Field Communications

Authors: Yuan Xu, Li Wei, Chongwen Huang, Chen Zhu, Zhaohui Yang, Jun Yang, Jiguang He, Zhaoyang Zhang, Mérouane Debbah

Abstract: In this paper, we investigate the millimeter-wave (mmWave) near-field beam training problem to find the correct beam direction. In order to address the high complexity and low identification accuracy of existing beam training techniques, we propose an efficient hashing multi-arm beam (HMB) training scheme for the near-field scenario. Specifically, we first design a set of sparse bases based on the polar domain sparsity of the near-field channel. Then, the random hash functions are chosen to construct the near-field multi-arm beam training codebook. Each multi-arm beam codeword is scanned in a time slot until all the predefined codewords are traversed. Finally, the soft decision and voting methods are applied to distinguish the signal from different base stations and obtain correctly aligned beams. Simulation results show that our proposed near-field HMB training method can reduce the beam training overhead to the logarithmic level, and achieve 96.4% identification accuracy of exhaustive beam training. Moreover, we also verify applicability under the far-field scenario.

Submitted 9 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.04913

arXiv:2403.06073 [pdf, other]

Stochastic Geometry Analysis for Distributed RISs-Assisted mmWave Communications

Authors: Yuan Xu, Li Wei, Chongwen Huang, Yongxu Zhu, Zhaohui Yang, Jun Yang, Jiguang He, Zhaoyang Zhang, Mérouane Debbah

Abstract: Millimeter wave (mmWave) has attracted considerable attention due to its wide bandwidth and high frequency. However, it is highly susceptible to blockages, resulting in significant degradation of the coverage and the sum rate. A promising approach is deploying distributed reconfigurable intelligent surfaces (RISs), which can establish extra communication links. In this paper, we investigate the impact of distributed RISs on the coverage probability and the sum rate in mmWave wireless communication systems. Specifically, we first introduce the system model, which includes the blockage, the RIS and the user distribution models, leveraging the Poisson point process. Then, we define the association criterion and derive the conditional coverage probabilities for the two cases of direct association and reflective association through RISs. Finally, we combine the two cases using Campbell's theorem and the total probability theorem to obtain the closed-form expressions for the ergodic coverage probability and the sum rate. Simulation results validate the effectiveness of the proposed analytical approach, demonstrating that the deployment of distributed RISs significantly improves the ergodic coverage probability by 45.4% and the sum rate by over 1.5 times.

Submitted 9 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2402.06154

arXiv:2403.05970 [pdf, other]

Electromagnetic Hybrid Beamforming for Holographic Communications

Authors: Ran Ji, Chongwen Huang, Xiaoming Chen, Wei E. I. Sha, Linglong Dai, Jiguang He, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

Abstract: It is well known that there is inherent radiation pattern distortion for the commercial base station antenna array, which usually needs three antenna sectors to cover the whole space. To eliminate pattern distortion and further enhance beamforming performance, we propose an electromagnetic hybrid beamforming (EHB) scheme based on a three-dimensional (3D) superdirective holographic antenna array. Specifically, EHB consists of antenna excitation current vectors (analog beamforming) and digital precoding matrices, where the implementation of analog beamforming involves the real-time adjustment of the radiation pattern to adapt it to the dynamic wireless environment. Meanwhile, the digital beamforming is optimized based on the channel characteristics of analog beamforming to further improve the achievable rate of communication systems. An electromagnetic channel model incorporating array radiation patterns and the mutual coupling effect is also developed to evaluate the benefits of our proposed scheme. Simulation results demonstrate that our proposed EHB scheme with a 3D holographic array achieves a relatively flat superdirective beamforming gain and allows for programmable focusing directions throughout the entire spatial domain. Furthermore, they also verify that the proposed scheme achieves a sum rate gain of over 150% compared to traditional beamforming algorithms.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 13 pages

arXiv:2402.18862 [pdf, other]

Towards Backward-Compatible Continual Learning of Image Compression

Authors: Zhihao Duan, Ming Lu, Justin Yang, Jiangpeng He, Zhan Ma, Fengqing Zhu

Abstract: This paper explores the possibility of extending the capability of pre-trained neural image compressors (e.g., adapting to new data or target bitrates) without breaking backward compatibility, the ability to decode bitstreams encoded by the original model. We refer to this problem as continual learning of image compression. Our initial findings show that baseline solutions, such as end-to-end fine-tuning, do not preserve the desired backward compatibility. To tackle this, we propose a knowledge replay training strategy that effectively addresses this issue. We also design a new model architecture that enables more effective continual learning than existing baselines. Experiments are conducted for two scenarios: data-incremental learning and rate-incremental learning. The main conclusion of this paper is that neural image compressors can be fine-tuned to achieve better performance (compared to their pre-trained version) on new data and rates without compromising backward compatibility. Our code is available at https://gitlab.com/viper-purdue/continual-compression

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted to CVPR 2024

arXiv:2402.16619 [pdf]

Magnetic resonance delta radiomics to track radiation response in lung tumors receiving stereotactic MRI-guided radiotherapy

Authors: Yining Zha, Benjamin H. Kann, Zezhong Ye, Anna Zapaishchykova, John He, Shu-Hui Hsu, Jonathan E. Leeman, Kelly J. Fitzgerald, David E. Kozono, Raymond H. Mak, Hugo J. W. L. Aerts

Abstract: Introduction: Lung cancer is a leading cause of cancer-related mortality, and stereotactic body radiotherapy (SBRT) has become a standard treatment for early-stage lung cancer. However, the heterogeneous response to radiation at the tumor level poses challenges. Currently, standardized dosage regimens lack adaptation based on individual patient or tumor characteristics. Thus, we explore the potential of delta radiomics from on-treatment magnetic resonance (MR) imaging to track radiation dose response, inform personalized radiotherapy dosing, and predict outcomes. Methods: A retrospective study of 47 MR-guided lung SBRT treatments for 39 patients was conducted. Radiomic features were extracted using Pyradiomics, and stability was evaluated temporally and spatially. Delta radiomics were correlated with radiation dose delivery and assessed for associations with tumor control and survival with Cox regressions. Results: Among 107 features, 49 demonstrated temporal stability, and 57 showed spatial stability. Fifteen stable and non-collinear features were analyzed. Median Skewness and surface to volume ratio decreased with radiation dose fraction delivery, while coarseness and 90th percentile values increased. Skewness had the largest relative median absolute changes (22%-45%) per fraction from baseline and was associated with locoregional failure (p=0.012) by analysis of covariance. Skewness, Elongation, and Flatness were significantly associated with local recurrence-free survival, while tumor diameter and volume were not.
Conclusions: Our study establishes the feasibility and stability of delta radiomics analysis for MR-guided lung SBRT. Findings suggest that MR delta radiomics can capture short-term radiographic manifestations of intra-tumoral radiation effect.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.16129 [pdf, other]

Localization in Reconfigurable Intelligent Surface Aided mmWave Systems: A Multiple Measurement Vector Based Channel Estimation Method

Authors: Kunlun Li, Jiguang He, Mohammed El-Hajjar, Lie-Liang Yang

Abstract: The sparsity of millimeter wave (mmWave) channels in the angular and temporal domains is beneficial to channel estimation, while the associated channel parameters can be utilized for localization. However, line-of-sight (LoS) blockage poses a significant challenge on the localization in mmWave systems, potentially leading to substantial positioning errors. A promising solution is to employ reconfigurable intelligent surface (RIS) to generate the virtual line-of-sight (VLoS) paths to aid localization. Consequently, wireless localization in the RIS-assisted mmWave systems has become the essential research issue. In this paper, a multiple measurement vector (MMV) model is constructed and a two-stage channel estimation based localization scheme is proposed. During the first stage, by exploiting the beamspace sparsity and employing a random RIS phase shift matrix, the channel parameters are estimated, based on which the precoder at base station and combiner at user equipment (UE) are designed. Then, in the second stage, based on the designed precoding and combining matrices, the optimal phase shift matrix for RIS is designed using the proposed modified temporally correlated multiple sparse Bayesian learning (TMSBL) algorithm. Afterwards, the channel parameters, such as angle of reflection, time-of-arrival, etc., embedding location information are estimated for finally deriving the location of UE. We demonstrate the achievable performance of the proposed algorithm and compare it with the state-of-the-art algorithms.
Our studies show that the proposed localization scheme is capable of achieving centimeter level localization accuracy, when LoS path is blocked. Furthermore, the proposed algorithm has a low computational complexity and outperforms the legacy algorithms in different perspectives.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15857 [pdf, other]

ELAA Near-Field Localization and Sensing with Partial Blockage Detection

Authors: Hui Chen, Pinjun Zheng, Yu Ge, Ahmed Elzanaty, Jiguang He, Tareq Y. Al-Naffouri, Henk Wymeersch

Abstract: High-frequency communication systems bring extremely large aperture arrays (ELAA) and large bandwidths, integrating localization and (bi-static) sensing functions without extra infrastructure. Such systems are likely to operate in the near-field (NF), where the performance of localization and sensing is degraded if a simplified far-field channel model is considered. However, when taking advantage of the additional geometry information in the NF, e.g., the encapsulated information in the wavefront, localization and sensing performance can be improved. In this work, we formulate a joint synchronization, localization, and sensing problem in the NF. Considering the array size could be much larger than an obstacle, the effect of partial blockage (i.e., a portion of antennas are blocked) is investigated, and a blockage detection algorithm is proposed. The simulation results show that blockage greatly impacts performance for certain positions, and the proposed blockage detection algorithm can mitigate this impact by identifying the blocked antennas.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.09752 [pdf]

Vector spectrometer with Hertz-level resolution and super-recognition capability

Authors: Ting Qing, Shupeng Li, Huashan Yang, Lihan Wang, Yijie Fang, Xiaohu Tang, Meihui Cao, Jianming Lu, Jijun He, Junqiu Liu, Yueguang Lyu, Shilong Pan

Abstract: High-resolution optical spectrometers are crucial in revealing intricate characteristics of signals, determining laser frequencies, measuring physical constants, identifying substances, and advancing biosensing applications. Conventional spectrometers, however, often grapple with inherent trade-offs among spectral resolution, wavelength range, and accuracy. Furthermore, even at high resolution, resolving overlapping spectral lines during spectroscopic analyses remains a huge challenge. Here, we propose a vector spectrometer with ultrahigh resolution, combining broadband optical frequency hopping, ultrafine microwave-photonic scanning, and vector detection. A programmable frequency-hopping laser was developed, facilitating a sub-Hz linewidth and Hz-level frequency stability, an improvement of four and six orders of magnitude, respectively, compared to those of state-of-the-art tunable lasers. We also designed an asymmetric optical transmitter and receiver to eliminate measurement errors arising from modulation nonlinearity and multi-channel crosstalk. The resultant vector spectrometer exhibits an unprecedented frequency resolution of 2 Hz, surpassing the state-of-the-art by four orders of magnitude, over a 33-nm range. Through high-resolution vector analysis, we observed that group delay information enhances the separation capability of overlapping spectral lines by over 47%, significantly streamlining the real-time identification of diverse substances.
Our technique fills the gap in optical spectrometers with resolutions below 10 kHz and enables vector measurement to embrace revolution in functionality.

Submitted 6 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 21 pages, 6 figures

arXiv:2402.09372 [pdf, other]

Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable or even better than human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which can be an interesting area in the future. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques.
The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

arXiv:2402.09181 [pdf, other]

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

Authors: Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo

Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which is essential in real-world medical applications. To solve this problem, in this paper, we introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark. This benchmark is collected from 73 different medical datasets, including 12 different modalities and covering more than 20 distinct anatomical regions. Importantly, all images in this benchmark are sourced from authentic medical scenarios, ensuring alignment with the requirements of the medical field and suitability for evaluating LVLMs. Through our extensive experiments, we have found that existing LVLMs struggle to address these medical VQA problems effectively. Moreover, what surprises us is that medical-specialized LVLMs even exhibit inferior performance to those general-domain models, calling for a more versatile and robust LVLM in the biomedical field. The evaluation results not only reveal the current limitations of LVLM in understanding real medical images but also highlight our dataset's significance. Our code with dataset are available at https://github.com/OpenGVLab/Multi-Modality-Arena.

Submitted 21 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.07259 [pdf, ps, other]

RIS-Augmented Millimeter-Wave MIMO Systems for Passive Drone Detection

Authors: Jiguang He, Aymen Fakhreddine, George C. Alexandropoulos

Abstract: In the past decade, the number of amateur drones is increasing, and this trend is expected to continue in the future. The security issues brought by abuse and misconduct of drones become more and more severe and may incur a negative impact to the society. In this paper, we leverage existing cellular multiple-input multiple-output (MIMO) base station (BS) infrastructure, operating at millimeter wave (mmWave) frequency bands, for drone detection in a device-free manner with the aid of one reconfigurable intelligent surface (RIS), deployed in the proximity of the BS. We theoretically examine the feasibility of drone detection with the aid of the generalized likelihood ratio test (GLRT) and validate via simulations that, the optimized deployment of an RIS can bring added benefits compared to RIS-free systems. In addition, the effect of RIS training beams, training overhead, and radar cross section, is investigated in order to offer theoretical design guidance for the proposed cellular RIS-based passive drone detection system.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 6 pages, 6 figures, submitted to IEEE PIMRC 2024

arXiv:2402.06154 [pdf, other]

Coverage and Rate Analysis for Distributed RISs-Assisted mmWave Communications

Authors: Yuan Xu, Chongwen Huang, Wei Li, Yongxu Zhu, Zhaohui Yang, Jiguang He, Jun Yang, Zhaoyang Zhang, Chau Yuen, Merouane Debbah

Abstract: The millimeter wave (mmWave) has received considerable interest due to its expansive bandwidth and high frequency. However, a noteworthy challenge arises from its vulnerability to blockages, leading to reduced coverage and achievable rates. To address these limitations, a potential solution is to deploy distributed reconfigurable intelligent surfaces (RISs), which comprise many low-cost and passively reflected elements, and can facilitate the establishment of extra communication links. In this paper, we leverage stochastic geometry to investigate the ergodic coverage probability and the achievable rate in both distributed RISs-assisted single-cell and multi-cell mmWave wireless communication systems. Specifically, we first establish the system model considering the stochastically distributed blockages, RISs and users by the Poisson point process. Then we give the association criterion and derive the association probabilities, the distance distributions, and the conditional coverage probabilities for two cases of associations between base stations and users without or with RISs. Finally, we use Campbell's theorem and the total probability theorem to obtain the closed-form expressions of the ergodic coverage probability and the achievable rate. Simulation results verify the effectiveness of our analysis method, and demonstrate that by deploying distributed RISs, the ergodic coverage probability is significantly improved by approximately 50%, and the achievable rate is increased by more than 1.5 times.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.15619 [pdf, ps, other]

A semidefinite programming approach for robust elliptic localization

Authors: Wenxin Xiong, Jiajun He, Zhang-Lei Shi, Keyuan Hu, Hing Cheung So, Chi-Sing Leung

Abstract: This short communication addresses the problem of elliptic localization with outlier measurements, whose occurrences are prevalent in various location-enabled applications and can significantly compromise the positioning performance if not adequately handled. In contrast to the reliance on $M$-estimation adopted in the majority of existing solutions, we take a different path, specifically exploring the worst-case robust approximation criterion, to bolster resistance of the elliptic location estimator against outliers. From a geometric standpoint, our method boils down to pinpointing the Chebyshev center of the feasible set determined by the available bistatic ranges with bounded measurement errors. For a practical approach to the associated min-max problem, we convert it into the well-established convex optimization framework of semidefinite programming (SDP). Numerical simulations confirm that our SDP-based technique can outperform a number of existing elliptic localization schemes in terms of positioning accuracy in Gaussian mixture noise, a common type of impulsive interference in the context of range-based localization.

Submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.13260 [pdf, other]

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction

Authors: Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda

Abstract: The prevalent approach in speech emotion recognition (SER) involves integrating both audio and textual information to comprehensively identify the speaker's emotion, with the text generally obtained through automatic speech recognition (ASR). An essential issue of this approach is that ASR errors from the text modality can worsen the performance of SER. Previous studies have proposed using an auxiliary ASR error detection task to adaptively assign weights of each word in ASR hypotheses. However, this approach has limited improvement potential because it does not address the coherence of semantic information in the text. Additionally, the inherent heterogeneity of different modalities leads to distribution gaps between their representations, making their fusion challenging. Therefore, in this paper, we incorporate two auxiliary tasks, ASR error detection (AED) and ASR error correction (AEC), to enhance the semantic coherence of ASR text, and further introduce a novel multi-modal fusion (MF) method to learn shared representations across modalities. We refer to our method as MF-AED-AEC. Experimental results indicate that MF-AED-AEC significantly outperforms the baseline model by a margin of 4.1%.

Submitted 28 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2312.16946 [pdf, other]

LEO Satellite and RIS: Two Keys to Seamless Indoor and Outdoor Localization

Authors: Pinjun Zheng, Xing Liu, Jiguang He, Gonzalo Seco-Granados, Tareq Y. Al-Naffouri

Abstract: The contemporary landscape of wireless technology underscores the critical role of precise localization services. Traditional global navigation satellite systems (GNSS)-based solutions, however, fall short when it comes to indoor environments, and existing indoor localization techniques such as electromagnetic fingerprinting methods face challenges of high implementation costs and limited coverage. This article explores an innovative solution that seamlessly blends low Earth orbit (LEO) satellites with reconfigurable intelligent surfaces (RISs), unlocking its potential for realizing uninterrupted indoor and outdoor localization with global coverage. By leveraging the strong signal reception of the LEO satellite signals and capitalizing on the radio environment-reshaping capability of RISs, the integration of these two technologies presents a vision of a future where localization services transcend existing constraints. After a comprehensive review of the distinctive attributes of LEO satellites and RISs, we evaluate the localization error bounds for the proposed collaborative system, showcasing their promising performance on simultaneous indoor and outdoor localization. To conclude, we engage in a discussion on open problems and future research directions for LEO satellite and RIS-enabled localization.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16572 [pdf, other]

Observation-based Optimal Control Law Learning with LQR Reconstruction

Authors: Chendi Qu, Jianping He, Xiaoming Duan

Abstract: Designing controllers to generate various trajectories has been studied for years, while recently, recovering an optimal controller from trajectories receives increasing attention. In this paper, we reveal that the inherent linear quadratic regulator (LQR) problem of a moving agent can be reconstructed based on its trajectory observations only, which enables one to learn the optimal control law of the agent autonomously. Specifically, the reconstruction of the optimization problem requires estimation of three unknown parameters including the target state, weighting matrices in the objective function and the control horizon. Our algorithm considers two types of objective function settings and identifies the weighting matrices with proposed novel inverse optimal control methods, providing the well-posedness and identifiability proof. We obtain the optimal estimate of the control horizon using binary search and finally reconstruct the LQR problem with above estimates. The strength of learning control law with optimization problem recovery lies in less computation consumption and strong generalization ability. We apply our algorithm to the future control input prediction and the discrepancy loss is further derived. Numerical simulations and hardware experiments on a self-designed robot platform illustrate the effectiveness of our work.

Submitted 27 December, 2023; originally announced December 2023.
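
Illustrative note (not from the paper): the abstract above concerns recovering a finite-horizon LQR problem from observed trajectories. As background only, the forward problem, computing the optimal feedback law once the weights and horizon are known, can be sketched with the standard backward Riccati recursion. The scalar dynamics and the names `lqr_gains`, `a`, `b`, `q`, `r`, `q_f` are assumptions made for this sketch; it is not the authors' reconstruction algorithm.

```python
def lqr_gains(a, b, q, r, q_f, horizon):
    """Feedback gains K_0..K_{N-1} for the scalar system x_{k+1} = a*x_k + b*u_k.

    Minimizes sum_k (q*x_k^2 + r*u_k^2) + q_f*x_N^2 with u_k = -K_k * x_k.
    """
    p = q_f  # cost-to-go weight at the final step
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)  # optimal gain for this step
        p = q + a * a * p - a * b * p * k  # Riccati update of the cost-to-go
        gains.append(k)
    gains.reverse()  # computed backward in time, returned in forward order
    return gains


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    print(lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_f=1.0, horizon=5))
```

In this sketch the horizon and the weights are the inputs; in the inverse setting the abstract describes, those are the quantities to be estimated from data.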

arXiv:2312.15481 [pdf, other]

A Novel Field-Free SOT Magnetic Tunnel Junction With Local VCMA-Induced Switching

Authors: Rui Zhou, Haiyang Zhang, Hao Wang, Jin He, Qijun Huang, Sheng Chang

Abstract: By integrating the local voltage-controlled magnetic anisotropy (VCMA) effect, Dzyaloshinskii-Moriya interaction (DMI) effect, and spin-orbit torque (SOT) effect, we propose a novel device structure for field-free magnetic tunnel junction (MTJ). Micromagnetic simulation shows that the device utilizes the chiral symmetry breaking caused by the DMI effect to induce a non-collinear spin texture under the influence of SOT current. This, combined with the perpendicular magnetic anisotropy (PMA) gradient generated by the local VCMA effect, enables deterministic switching of the MTJ state without an external field. The impact of variations in DMI strength and PMA gradient on the magnetization dynamics is analyzed.

Submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.10741 [pdf, other]

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

Authors: Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

Abstract: Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two critical approaches for enhanced effectiveness: 1) the Residual Style Adaptor (RSA) which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) the Uncertainty Modeling Layer Normalization (UMLN) to perturb the style attributes within the content representation during the training phase and thus improve the model generalization. Our extensive evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Access to singing voice samples can be found at https://stylesinger.github.io/.

Submitted 2 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.09576 [pdf, other]

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OARs and GTVs segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68% to 86.70%, and 70.42% to 73.44% for OARs and GTVs, respectively. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org

Submitted 15 December, 2023; originally announced December 2023.

Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)
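
The Dice similarity coefficient reported in the abstract above measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). The snippet below is a minimal sketch of that formula only, not the challenge's evaluation code; the function name `dice_coefficient`, the flat 0/1 list representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks.

    `pred` and `target` are equal-length flat sequences of 0/1 values
    (hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(f"Dice = {dice_coefficient(predicted, reference):.4f}")
```

A score such as the 86.70% quoted above corresponds to a return value of 0.867 averaged over structures and cases; how the challenge averages and handles empty structures is not stated in the abstract.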

arXiv:2312.09420 [pdf, other]

Fairness-Driven Optimization of RIS-Augmented 5G Networks for Seamless 3D UAV Connectivity Using DRL Algorithms

Authors: Yu Tian, Ahmed Alhammadi, Jiguang He, Aymen Fakhreddine, Faouzi Bader

Abstract: In this paper, we study the problem of joint active and passive beamforming for reconfigurable intelligent surface (RIS)-assisted massive multiple-input multiple-output systems towards the extension of the wireless cellular coverage in 3D, where multiple RISs, each equipped with an array of passive elements, are deployed to assist a base station (BS) to simultaneously serve multiple unmanned aerial vehicles (UAVs) in the same time-frequency resource of 5G wireless communications. With a focus on ensuring fairness among UAVs, our objective is to maximize the minimum signal-to-interference-plus-noise ratio (SINR) at UAVs by jointly optimizing the transmit beamforming parameters at the BS and phase shift parameters at RISs. We propose two novel algorithms to address this problem. The first algorithm aims to mitigate interference by calculating the BS beamforming matrix through matrix inverse operations once the phase shift parameters are determined. The second one is based on the principle that one RIS element only serves one UAV and the phase shift parameter of this RIS element is optimally designed to compensate for the phase offset caused by the propagation and fading. To obtain the optimal parameters, we utilize one state-of-the-art reinforcement learning algorithm, deep deterministic policy gradient, to solve these two optimization problems. Simulation results are provided to illustrate the effectiveness of our proposed solution and some insightful remarks are observed.

Submitted 14 November, 2023; originally announced December 2023.

arXiv:2312.07846 [pdf, other]

Prompted Contextual Transformer for Incomplete-View CT Reconstruction

Authors: Chenglong Ma, Zilong Li, Junjun He, Jun** Zhang, Yi Zhang, Hongming Shan

Abstract: Incomplete-view computed tomography (CT) can shorten the data acquisition time and allow scanning of large objects, including sparse-view and limited-angle scenarios, each with various settings, such as different view numbers or angular ranges. However, the reconstructed images present severe, varying artifacts due to different missing projection data patterns. Existing methods tackle these scenarios/settings separately and individually, which is cumbersome and lacks the flexibility to adapt to new settings. To enjoy the multi-setting synergy in a single model, we propose a novel Prompted Contextual Transformer (ProCT) for incomplete-view CT reconstruction. The novelties of ProCT are twofold. First, we devise a projection view-aware prompting to provide setting-discriminative information, enabling a single ProCT to handle diverse incomplete-view CT settings. Second, we propose artifact-aware contextual learning to sense artifact pattern knowledge from in-context image pairs, making ProCT capable of accurately removing the complex, unseen artifacts. Extensive experimental results on two publicly available clinical CT datasets demonstrate the superior performance of ProCT over state-of-the-art methods -- including single-setting models -- on a wide range of incomplete-view CT settings, strong transferability to unseen datasets and scenarios, and improved performance when sinogram data is available. The code is available at: https://github.com/Masaaki-75/proct

Submitted 11 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.13622 [pdf, other]

TDiffDe: A Truncated Diffusion Model for Remote Sensing Hyperspectral Image Denoising

Authors: Jiang He, Yajie Li, Jie L, Qiangqiang Yuan

Abstract: Hyperspectral images play a crucial role in precision agriculture, environmental monitoring or ecological analysis. However, due to sensor equipment and the imaging environment, the observed hyperspectral images are often inevitably corrupted by various noise. In this study, we propose a truncated diffusion model, called TDiffDe, to recover the useful information in hyperspectral images gradually. Rather than starting from pure noise, the input data contains image information in hyperspectral image denoising. Thus, we cut the trained diffusion model from small steps to avoid destroying valid information.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.11969 [pdf, other]

SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

Authors: ** Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao

Abstract: Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowledge into SAM, we introduce SA-Med2D-20M, a large-scale segmentation dataset of 2D medical images built upon numerous public and private datasets. It consists of 4.6 million 2D medical images and 19.7 million corresponding masks, covering almost the whole body and showing significant diversity. This paper describes all the datasets collected in SA-Med2D-20M and details how to process these datasets. Furthermore, comprehensive statistics of SA-Med2D-20M are presented to facilitate the better use of our dataset, which can help the researchers build medical vision foundation models or apply their models to downstream medical applications. We hope that the large scale and diversity of SA-Med2D-20M can be leveraged to develop medical artificial intelligence for enhancing diagnosis, medical image analysis, knowledge sharing, and education. The data with the redistribution license is publicly available at https://github.com/OpenGVLab/SAM-Med2D.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.07093 [pdf, other]

On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition

Authors: Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda

Abstract: This paper proposes an efficient approach to noisy speech emotion recognition (NSER). Conventional NSER approaches have proven effective in mitigating the impact of artificial noise sources, such as white Gaussian noise, but are limited in handling non-stationary noises in real-world environments due to their complexity and uncertainty. To overcome this limitation, we introduce a new method for NSER by adopting the automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech. We first obtain intermediate layer information from the ASR model as a feature representation for emotional speech and then apply this representation for the downstream NSER task. Our experimental results show that the proposed method 1) achieves better NSER performance compared with the conventional noise reduction method, 2) outperforms self-supervised learning approaches, and 3) even outperforms text-based approaches using ASR transcription or the ground truth transcription of noisy speech.

Submitted 14 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Submitted to ICASSP 2024

arXiv:2311.03653 [pdf, ps, other]

On the Performance of LoRa Empowered Communication for Wireless Body Area Networks

Authors: Minling Zhang, Guofa Cai, Zhi** Xu, Jiguang He, Markku Juntti

Abstract: To remotely monitor the physiological status of the human body, long range (LoRa) communication has been considered as an eminently suitable candidate for wireless body area networks (WBANs). Typically, a Rayleigh-lognormal fading channel is encountered by the LoRa links of the WBAN. In this context, we characterize the performance of the LoRa system in WBAN scenarios with an emphasis on the physical (PHY) layer and medium access control (MAC) layer in the face of Rayleigh-lognormal fading channels and same-spreading-factor (SF) interference. Specifically, closed-form approximate bit error probability (BEP) expressions are derived for the LoRa system. The results show that increasing the SF and reducing the interference efficiently mitigate the shadowing effects. Moreover, in the quest for the most suitable MAC protocol for LoRa based WBANs, three MAC protocols are critically appraised, namely the pure ALOHA, slotted ALOHA, and carrier-sense multiple access. The coverage probability, energy efficiency, throughput, and system delay of the three MAC protocols are analyzed in Rayleigh-lognormal fading channels. Furthermore, the performance of the equal-interval-based and equal-area-based schemes is analyzed to guide the choice of the SF. Our simulation results confirm the accuracy of the mathematical analysis and provide some useful insights for the future design of LoRa based WBANs.

Submitted 6 November, 2023; originally announced November 2023.
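
As background for the MAC comparison in the abstract above, the textbook throughput expressions for pure and slotted ALOHA on an idealized collision channel are S = G·e^(-2G) and S = G·e^(-G), where G is the offered load in frames per frame time. The sketch below evaluates only these classical formulas; it does not reproduce the paper's Rayleigh-lognormal fading analysis, and the function names are hypothetical.

```python
import math


def pure_aloha_throughput(offered_load):
    """Classical pure ALOHA throughput S = G * exp(-2G) (idealized collision channel)."""
    return offered_load * math.exp(-2.0 * offered_load)


def slotted_aloha_throughput(offered_load):
    """Classical slotted ALOHA throughput S = G * exp(-G) (idealized collision channel)."""
    return offered_load * math.exp(-offered_load)


if __name__ == "__main__":
    # Pure ALOHA peaks at G = 0.5, slotted ALOHA at G = 1.0.
    for g in (0.25, 0.5, 1.0, 2.0):
        print(
            f"G={g:4.2f}  pure={pure_aloha_throughput(g):.4f}  "
            f"slotted={slotted_aloha_throughput(g):.4f}"
        )
```

In this idealized model slotted ALOHA doubles the peak throughput of pure ALOHA (1/e versus 1/(2e)); the fading and interference effects studied in the paper would change these figures.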

arXiv:2310.19288 [pdf, other]

EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Jiang He, Xianyu **, Liangpei Zhang

Abstract: Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing the regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smooth issues. Generative adversarial networks have the potential to infer intricate details, but they are prone to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper, we first introduce Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and maintains the merits of DPM in generating perceptually pleasant images. Specifically, different from previous works using heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) to achieve favorable noise prediction performance by simplified channel attention and simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into the proposed EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to help extract an enriched condition. Unlike most DPM-based SR models that directly generate conditions by amplifying LR images, the proposed CPEM helps to retain more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate that EDiffSR can restore visually pleasant images on simulated and real-world remote sensing images, both quantitatively and qualitatively. The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR

Submitted 30 October, 2023; originally announced October 2023.

Comments: Submitted to IEEE TGRS

arXiv:2310.14217 [pdf, ps, other]

On the Sum Secrecy Rate of Multi-User Holographic MIMO Networks

Authors: Arthur S. de Sena, Jiguang He, Ahmed Al Hammadi, Chongwen Huang, Faouzi Bader, Merouane Debbah, Mathias Fink

Abstract: The emerging concept of extremely-large holographic multiple-input multiple-output (HMIMO), benefiting from compactly and densely packed cost-efficient radiating meta-atoms, has been demonstrated for enhanced degrees of freedom even in pure line-of-sight conditions, enabling tremendous multiplexing gain for the next-generation communication systems. Most of the reported works focus on energy and spectrum efficiency, path loss analyses, and channel modeling. The extension to secure communications remains unexplored. In this paper, we theoretically characterize the secrecy capacity of the HMIMO network with multiple legitimate users and one eavesdropper while taking into consideration artificial noise and max-min fairness. We formulate the power allocation (PA) problem and address it by following successive convex approximation and Taylor expansion. We further study the effect of fixed PA coefficients, imperfect channel state information, inter-element spacing, and the number of Eve's antennas on the sum secrecy rate. Simulation results show that significant performance gain with more than 100% increment in the high signal-to-noise ratio (SNR) regime for the two-user case is obtained by exploiting adaptive/flexible PA compared to the case with fixed PA coefficients.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures, submitted to IEEE ICC 2024

arXiv:2310.10300 [pdf, other]

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval

Authors: Kaixing Yang, Xukun Zhou, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan

Abstract: Dance and music are closely related forms of expression, with mutual retrieval between dance videos and music being a fundamental task in various fields like education, art, and sports. However, existing methods often suffer from unnatural generation effects or fail to fully explore the correlation between music and dance. To overcome these challenges, we propose BeatDance, a novel beat-based model-agnostic contrastive learning framework. BeatDance incorporates a Beat-Aware Music-Dance InfoExtractor, a Trans-Temporal Beat Blender, and a Beat-Enhanced Hubness Reducer to improve dance-music retrieval performance by utilizing the alignment between music beats and dance movements. We also introduce the Music-Dance (MD) dataset, a large-scale collection of over 10,000 music-dance video pairs for training and testing. Experimental results on the MD dataset demonstrate the superiority of our method over existing baselines, achieving state-of-the-art performance. The code and dataset will be made publicly available upon acceptance.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.08021 [pdf, other]

Channel-robust Automatic Modulation Classification Using Spectral Quotient Cumulants

Authors: Sai Huang, Yuting Chen, Jiashuo He, Shuo Chang, Zhiyong Feng

Abstract: Automatic modulation classification (AMC) aims to identify the modulation format of the received signal corrupted by the channel effects and noise. Most existing works focus on the impact of noise while relatively little attention has been paid to the impact of channel effects. However, the instability posed by multipath fading channels leads to significant performance degradation. To mitigate the adverse effects of the multipath channel, we propose a channel-robust modulation classification framework named spectral quotient cumulant classification (SQCC) for orthogonal frequency division multiplexing (OFDM) systems. Specifically, we first transform the received signal to the spectral quotient (SQ) sequence by spectral circular shift division operations. Secondly, an outlier detector is proposed to filter the outliers in the SQ sequence. Finally, we extract spectral quotient cumulants (SQCs) from the filtered SQ sequence as the inputs to train the artificial neural network (ANN) classifier and use the trained ANN to make the final decisions. Simulation results show that our proposed SQCC method exhibits classification robustness and superiority under various unknown Rician multipath fading channels compared with other existing methods. Specifically, the SQCC method achieves nearly 90% classification accuracy at a signal-to-noise ratio (SNR) of 4 dB when testing under multiple channels but training under the AWGN channel.

Submitted 11 October, 2023; originally announced October 2023.

Comments: THIS WORK HAS BEEN SUBMITTED TO THE IEEE FOR POSSIBLE PUBLICATION. COPYRIGHT MAY BE TRANSFERRED WITHOUT NOTICE, AFTER WHICH THIS VERSION MAY NO LONGER BE ACCESSIBLE. 5 pages

arXiv:2309.16389 [pdf, other]

A Universal Framework for Holographic MIMO Sensing

Authors: Charles Vanwynsberghe, Jiguang He, Mérouane Debbah

Abstract: This paper addresses the sensing space identification of arbitrarily shaped continuous antennas. In the context of holographic multiple-input multiple-output (MIMO), a.k.a. large intelligent surfaces, these antennas offer benefits such as super-directivity and near-field operability. The sensing space reveals two key aspects: (a) its dimension specifies the maximally achievable spatial degrees of freedom (DoFs), and (b) the finite basis spanning this space accurately describes the sampled field. Earlier studies focus on specific geometries, bringing forth the need for extendable analysis to real-world conformal antennas. Thus, we introduce a universal framework to determine the antenna sensing space, regardless of its shape. The findings underscore both spatial and spectral concentration of sampled fields to define a generic eigenvalue problem of Slepian concentration. Results show that this approach precisely estimates the DoFs of well-known geometries, and verify its flexible extension to conformal antennas.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.15462 [pdf, other]

doi 10.1126/scirobotics.adh5401

DTC: Deep Tracking Control

Authors: Fabian Jenelten, Junzhe He, Farbod Farshidian, Marco Hutter

Abstract: Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing due to intuitive cost function tuning, accurate planning, generalization, and most importantly, the insightful understanding gained from more than one decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach utilizes a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluate the accuracy of our locomotion pipeline on sparse terrains, where pure data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts. Finally, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.

Submitted 22 January, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.11992 [pdf, other]

UAV Swarm Deployment and Trajectory for 3D Area Coverage via Reinforcement Learning

Authors: Jia He, Ziye Jia, Chao Dong, Junyu Liu, Qihui Wu, **gxian Liu

Abstract: Unmanned aerial vehicles (UAVs) are recognized as promising technologies for area coverage due to their flexibility and adaptability. However, the ability of a single UAV is limited, and in large-scale three-dimensional (3D) scenarios, UAV swarms can establish seamless wireless communication services. Hence, in this work, we consider a scenario of UAV swarm deployment and trajectory to satisfy 3D coverage considering the effects of obstacles. In detail, we propose a hierarchical swarm framework to efficiently serve the large-area users. Then, the problem is formulated to minimize the total trajectory loss of the UAV swarm. However, the problem is intractable due to its non-convex property, and we decompose it into smaller issues of user clustering, UAV swarm hovering point selection, and swarm trajectory determination. Moreover, we design a Q-learning based algorithm to improve the solution efficiency. Finally, we conduct extensive simulations to verify the proposed mechanisms, and the designed algorithm outperforms other referred methods.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.11759 [pdf, other]

doi 10.1109/LSP.2024.3356915

Symbol Detection for Coarsely Quantized OTFS

Authors: Junwei He, Haochuan Zhang, Chao Dong, Huimin Zhu

Abstract: This paper explicitly models a coarse and noisy quantization in a communication system empowered by orthogonal time frequency space (OTFS) for cost and power efficiency. We first point out that, with coarse quantization, the effective channel is imbalanced and thus no longer able to circularly shift the transmitted symbols along the delay-Doppler domain. Meanwhile, the effective channel is non-isotropic, which imposes a significant loss to symbol detection algorithms like the original approximate message passing (AMP). Although the algorithm of generalized expectation consistent for signal recovery (GEC-SR) can mitigate this loss, the complexity in computation is prohibitively high, mainly due to a dramatic increase in the matrix size of OTFS. In this context, we propose a low-complexity algorithm that incorporates into the GEC-SR a quick inversion of quasi-banded matrices, reducing the complexity from a cubic order to a linear order while keeping the performance at the same level.

Submitted 20 January, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.09565 [pdf, other]

A Covariance Adaptive Student's t Based Kalman Filter

Authors: Benyang Gong, Jiacheng He, Gang Wang, Bei Peng

Abstract: In the classical Kalman filter (KF), the estimated state is a linear combination of the one-step predicted state and the measurement state; their confidence levels change when the prediction mean square error matrix and the covariance matrix of the measurement noise vary. The existing Student's t based Kalman filter (TKF) works similarly to the KF. Both work well with impulse noise, but when it comes to Gaussian noise, the TKF encounters an adjustment limit of the confidence level, which can lead to inaccuracies in such situations. This brief optimizes the TKF by using the Gaussian mixture model (GMM), which generates a reasonable covariance matrix from the measurement noise to replace the one used in the existing algorithm and breaks the adjustment limit of the confidence level. At the end of the brief, the performance of the covariance adaptive Student's t based Kalman filter (TGKF) is verified.

Submitted 18 September, 2023; originally announced September 2023.
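
The first sentence of the abstract above can be made concrete with the scalar Kalman measurement update, where the gain K = P / (P + R) sets how much weight the measurement receives relative to the prediction. This is a minimal sketch of the classical KF update under a scalar state and a direct (identity) measurement model; it is not the TKF or TGKF from the paper, and the function and variable names are hypothetical.

```python
def kalman_update(x_pred, p_pred, z, r):
    """Scalar Kalman measurement update with an identity measurement model.

    x_pred: one-step predicted state
    p_pred: prediction error variance
    z:      measurement
    r:      measurement noise variance
    Returns (x_est, p_est, gain).
    """
    # The gain acts as the confidence weight placed on the measurement.
    gain = p_pred / (p_pred + r)
    # The estimate is a linear combination of prediction and measurement.
    x_est = (1.0 - gain) * x_pred + gain * z
    # Posterior variance shrinks in proportion to the gain.
    p_est = (1.0 - gain) * p_pred
    return x_est, p_est, gain


if __name__ == "__main__":
    # Equal prediction and measurement uncertainty: the estimate is the midpoint.
    print(kalman_update(x_pred=0.0, p_pred=1.0, z=2.0, r=1.0))
    # Very noisy measurement: the estimate stays close to the prediction.
    print(kalman_update(x_pred=0.0, p_pred=1.0, z=2.0, r=1e6))
```

Raising `r` pushes the gain toward 0 and the estimate toward the prediction, while raising `p_pred` pushes the gain toward 1; this is the shift in confidence level that the abstract describes.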

arXiv:2309.08088 [pdf, ps, other]

Interactive Model Fusion-Based GM-PHD Filter

Authors: Jiacheng He, Shan Zhong, Bei Peng, Gang Wang, Qizhen Wang

Abstract: In multi-target tracking (MTT), non-Gaussian measurement noise from sensors can diminish the performance of the Gaussian-assumed Gaussian mixture probability hypothesis density (GM-PHD) filter. In this paper, an approach that transforms the MTT problem under non-Gaussian conditions into an MTT problem under Gaussian conditions is developed. Specifically, measurement noise with a non-Gaussian distribution is modeled as a weighted sum of different Gaussian distributions. Subsequently, the GM-PHD filter is applied to compute the multi-target states under these distinct Gaussian distributions. Finally, an interactive multi-model framework is employed to fuse the diverse multi-target state information into a unified synthesis. The effectiveness of the proposed approach is validated through the simulation results.

Submitted 14 September, 2023; originally announced September 2023.

Comments: conference

arXiv:2309.04084 [pdf, other]

Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation

Authors: Xiangyu Chen, Zheyuan Li, Zhengwen Zhang, Jimmy S. Ren, Yihao Liu, Jingwen He, Yu Qiao, Jiantao Zhou, Chao Dong

Abstract: Modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG). However, the majority of available resources are still in standard dynamic range (SDR). As a result, there is significant value in transforming existing SDR content into the HDRTV standard. In this paper, we define and analyze the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content. Our analysis and observations indicate that a naive end-to-end supervised training pipeline suffers from severe gamut transition errors. To address this issue, we propose a novel three-step solution pipeline called HDRTVNet++, which includes adaptive global color mapping, local enhancement, and highlight refinement. The adaptive global color mapping step uses global statistics as guidance to perform image-adaptive color mapping. A local enhancement network is then deployed to enhance local details. Finally, we combine the two sub-networks above as a generator and achieve highlight consistency through GAN-based joint training. Our method is primarily designed for ultra-high-definition TV content and is therefore effective and lightweight for processing 4K resolution images. We also construct a dataset using HDR videos in the HDR10 standard, named HDRTV1K, that contains 1235 training images and 117 testing images, all in 4K resolution. In addition, we select five metrics to evaluate the results of SDRTV-to-HDRTV algorithms. Our final results demonstrate state-of-the-art performance both quantitatively and visually.
The code, model and dataset are available at https://github.com/xiaom233/HDRTVNet-plus.

Submitted 7 September, 2023; originally announced September 2023.

Comments: Extended version of HDRTVNet

Showing 1–50 of 195 results for author: He, J