Learned Image Compression for HE-stained Histopathological Images via Stain Deconvolution

Authors: Maximilian Fischer, Peter Neher, Tassilo Wald, Silvia Dias Almeida, Shuhan Xiao, Peter Schüffler, Rickmer Braren, Michael Götz, Alexander Muckenhuber, Jens Kleesiek, Marco Nolden, Klaus Maier-Hein

Abstract: Processing histopathological Whole Slide Images (WSI) leads to massive storage requirements for clinics worldwide. Even after lossy image compression during image acquisition, additional lossy compression is frequently possible without substantially affecting the performance of deep learning-based (DL) downstream tasks. In this paper, we show that the commonly used JPEG algorithm is not best suited for further compression and we propose Stain Quantized Latent Compression (SQLC), a novel DL-based histopathology data compression approach. SQLC compresses staining and RGB channels before passing them through a compression autoencoder (CAE) in order to obtain quantized latent representations for maximizing the compression. We show that our approach yields superior performance in a classification downstream task, compared to traditional approaches like JPEG, while image quality metrics like the Multi-Scale Structural Similarity Index (MS-SSIM) are largely preserved. Our method is available online.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.02534 [pdf, other]

Enhancing predictive imaging biomarker discovery through treatment effect analysis

Authors: Shuhan Xiao, Lukas Klein, Jens Petersen, Philipp Vollmuth, Paul F. Jaeger, Klaus H. Maier-Hein

Abstract: Identifying predictive biomarkers, which forecast individual treatment effectiveness, is crucial for personalized medicine and informs decision-making across diverse disciplines. These biomarkers are extracted from pre-treatment data, often within randomized controlled trials, and have to be distinguished from prognostic biomarkers, which are independent of treatment assignment. Our study focuses on the discovery of predictive imaging biomarkers, aiming to leverage pre-treatment images to unveil new causal relationships. Previous approaches relied on labor-intensive handcrafted or manually derived features, which may introduce biases. In response, we present a new task of discovering predictive imaging biomarkers directly from the pre-treatment images to learn relevant image features. We propose an evaluation protocol for this task to assess a model's ability to identify predictive imaging biomarkers and differentiate them from prognostic ones. It employs statistical testing and a comprehensive analysis of image feature attribution. We explore the suitability of deep learning models originally designed for estimating the conditional average treatment effect (CATE) for this task, which previously have been primarily assessed for the precision of CATE estimation, overlooking the evaluation of imaging biomarker discovery. Our proof-of-concept analysis demonstrates promising results in discovering and validating predictive imaging biomarkers from synthetic outcomes and real-world image datasets.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 19 pages, 12 figures

arXiv:2405.15413 [pdf, other]

MambaVC: Learned Visual Compression with Selective State Spaces

Authors: Shiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shutao Xia, Yaowei Wang

Abstract: Learned visual compression is an important and active task in multimedia. Existing approaches have explored various CNN- and Transformer-based designs to model content distribution and eliminate redundancy, where balancing efficacy (i.e., rate-distortion trade-off) and efficiency remains a challenge. Recently, state-space models (SSMs) have shown promise due to their long-range modeling capacity and efficiency. Inspired by this, we take the first step to explore SSMs for visual compression. We introduce MambaVC, a simple, strong and efficient compression network based on SSM. MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling, which helps to capture informative global contexts and enhances compression. On compression benchmark datasets, MambaVC achieves superior rate-distortion performance with lower computational and memory overheads. Specifically, it outperforms CNN and Transformer variants by 9.3% and 15.6% on Kodak, respectively, while reducing computation by 42% and 24%, and saving 12% and 71% of memory. MambaVC shows even greater improvements with high-resolution images, highlighting its potential and scalability in real-world applications. We also provide a comprehensive comparison of different network designs, underscoring MambaVC's advantages. Code is available at https://github.com/QinSY123/2024-MambaVC.

Submitted 28 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 17 pages, 15 figures

arXiv:2405.01242 [pdf, other]

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.

Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.00953 [pdf, ps, other]

Movable Antenna-Aided Hybrid Beamforming for Multi-User Communications

Authors: Yichi Zhang, Yuchen Zhang, Lipeng Zhu, Sa Xiao, Wanbin Tang, Yonina C. Eldar, Rui Zhang

Abstract: In this correspondence, we propose a movable antenna (MA)-aided multi-user hybrid beamforming scheme with a sub-connected structure, where multiple movable sub-arrays can independently change their positions within different local regions. To maximize the system sum rate, we jointly optimize the digital beamformer, analog beamformer, and positions of subarrays, under the constraints of unit modulus, finite movable regions, and power budget. Due to the non-concave/non-convex objective function/constraints, as well as the highly coupled variables, the formulated problem is challenging to solve. By employing fractional programming, we develop an alternating optimization framework to solve the problem via a combination of Lagrange multipliers, penalty method, and gradient descent. Numerical results reveal that the proposed MA-aided hybrid beamforming scheme significantly improves the sum rate compared to its fixed-position antenna (FPA) counterpart. Moreover, with sufficiently large movable regions, the proposed scheme with sub-connected MA arrays even outperforms the fully-connected FPA array.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.14250 [pdf, other]

Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations

Authors: Xun Lin, Yi Yu, Song Xia, Jue Jiang, Haoran Wang, Zitong Yu, Yizhong Liu, Ying Fu, Shuai Wang, Wenzhong Tang, Alex Kot

Abstract: The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segmentation (MIS) datasets, where the processes of collection and fine-grained annotation are time-intensive and laborious. Recently, Unlearnable Examples (UEs) methods have shown the potential to protect images by adding invisible shortcuts. These shortcuts can prevent unauthorized deep neural networks from generalizing. However, existing UEs are designed for natural image classification and fail to protect MIS datasets imperceptibly as their protective perturbations are less learnable than important prior knowledge in MIS, e.g., contour and texture features. To this end, we propose an Unlearnable Medical image generation method, termed UMed. UMed integrates the prior knowledge of MIS by injecting contour- and texture-aware perturbations to protect images. Given that our target is to only poison features critical to MIS, UMed requires only minimal perturbations within the ROI and its contour to achieve greater imperceptibility (average PSNR is 50.03) and protective performance (clean average DSC degrades from 82.18% to 6.80%).

Submitted 21 March, 2024; originally announced March 2024.
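
The two figures quoted at the end of the UMed abstract above are standard metrics: PSNR measures how close a perturbed image stays to the original, and the Dice similarity coefficient (DSC) measures segmentation overlap. The following is a minimal sketch of their textbook definitions, assuming 8-bit pixel values and binary masks flattened into Python lists. The function names `psnr` and `dice` are illustrative and are not taken from the paper's code.

```python
import math


def psnr(reference, perturbed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(perturbed):
        raise ValueError("inputs must have the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, perturbed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def dice(pred, target):
    """Dice similarity coefficient for two binary masks given as 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * intersection / total


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    protected = [11, 19, 30, 41]  # hypothetical small perturbation
    print("PSNR (dB):", psnr(original, protected))
    print("DSC:", dice([1, 1, 0, 0], [1, 0, 0, 0]))
```

A higher PSNR means a less visible perturbation, while a lower DSC on clean test data means a model trained on the protected images segments poorly, which is the protective effect the abstract reports.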

arXiv:2402.15047 [pdf]

Networked Collaborative Sensing using Multi-domain Measurements: Architectures, Performance Limits and Algorithms

Authors: Yihua Ma, Shuqiang Xia, Chen Bai, Yuxin Wang, Zhongbin Wang, Songqian Li

Abstract: As a promising 6G technology, integrated sensing and communication (ISAC) is attracting growing interest. ISAC provides integration gain via sharing spectrum, hardware, and software. However, concerns exist regarding its sensing performance when compared to dedicated radar systems. To address this issue, the advantages of widely deployed networks should be utilized, and this paper proposes networked collaborative sensing (NCS) using multi-domain measurements (MM), including range, Doppler, and two-dimensional angle of arrival. In the NCS-MM architecture, this paper proposes a novel multi-domain decoupling model and a novel guard band-based protocol. The proposed model simplifies multi-domain derivations and algorithm designs, and the proposed protocol conserves resources and mitigates NCS interference. To determine the performance limits, this paper derives the Cramér-Rao lower bound (CRLB) of three-dimensional position and velocity in NCS-MM. An accumulated single-dimension channel model is used to obtain the CRLB of MM, which is proven to be equivalent to that of the multi-dimension model. Algorithms for both MM estimation and fusion are proposed. An arbitrary-dimension Newtonized orthogonal matching pursuit (AD-NOMP) is proposed to accurately estimate grid-less MM. The degree-of-freedom (DoF) of MM is analyzed, and a novel DoF-based two-stage weighted least squares (TSWLS) is proposed to reduce equations without DoF loss. The numerical results show that the performance of the proposed algorithms is close to the performance limits.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.08952 [pdf, other]

A two-stage solution to quantum process tomography: error analysis and optimal design

Authors: Shuixin Xiao, Yuanlong Wang, Jun Zhang, Daoyi Dong, Gary J. Mooney, Ian R. Petersen, Hidehiro Yonezawa

Abstract: Quantum process tomography is a critical task for characterizing the dynamics of quantum systems and achieving precise quantum control. In this paper, we propose a two-stage solution for both trace-preserving and non-trace-preserving quantum process tomography. Utilizing a tensor structure, our algorithm exhibits a computational complexity of $O(MLd^2)$, where $d$ is the dimension of the quantum system and $M$, $L$ represent the numbers of different input states and measurement operators, respectively. We establish an analytical error upper bound and then design the optimal input states and the optimal measurement operators, which are both based on minimizing the error upper bound and maximizing the robustness characterized by the condition number. Numerical examples and testing on IBM quantum devices are presented to demonstrate the performance and efficiency of our algorithm.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 41 pages, 7 figures

arXiv:2401.12587 [pdf, other]

An Efficient Implicit Neural Representation Image Codec Based on Mixed Autoregressive Model for Low-Complexity Decoding

Authors: Xiang Liu, Jiahong Chen, Bin Chen, Zimo Liu, Baoyi An, Shu-Tao Xia, Zhi Wang

Abstract: Displaying high-quality images on edge devices, such as augmented reality devices, is essential for enhancing the user experience. However, these devices often face power consumption and computing resource limitations, making it challenging to apply many deep learning-based image compression algorithms in this field. Implicit Neural Representation (INR) for image compression is an emerging technology that offers two key benefits compared to cutting-edge autoencoder models: low computational complexity and parameter-free decoding. It also outperforms many traditional and early neural compression methods in terms of quality. In this study, we introduce a new Mixed AutoRegressive Model (MARM) to significantly reduce the decoding time for the current INR codec, along with a new synthesis network to enhance reconstruction quality. MARM includes our proposed AutoRegressive Upsampler (ARU) blocks, which are highly computationally efficient, and ARM from previous work to balance decoding time and reconstruction quality. We also propose enhancing ARU's performance using a checkerboard two-stage decoding strategy. Moreover, the ratio of different modules can be adjusted to maintain a balance between quality and speed. Comprehensive experiments demonstrate that our method significantly improves computational efficiency while preserving image quality. With different parameter settings, our method can achieve over an order of magnitude acceleration in decoding time without industrial-level optimization, or achieve state-of-the-art reconstruction quality compared with other INR codecs. To the best of our knowledge, our method is the first INR-based codec comparable with Hyperprior in both decoding speed and quality while maintaining low complexity.

Submitted 7 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2311.13847 [pdf, other]

Perceptual Image Compression with Cooperative Cross-Modal Side Information

Authors: Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia

Abstract: The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance perceptual compression of images, even though the benefits of multimodal synergy have been widely demonstrated in research. This raises the following question: How can we effectively transfer text-level semantic dependencies, which are only available to the decoder, to help image compression? In this work, we propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features. This is done by predicting a semantic mask to guide the learned text-adaptive affine transformation at the pixel level. Furthermore, we design a text-conditional generative adversarial network to improve the perceptual quality of reconstructed images. Extensive experiments involving four datasets and ten image quality assessment metrics demonstrate that the proposed approach achieves superior results in terms of rate-perception trade-off and semantic distortion.

Submitted 28 November, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.08738 [pdf, other]

doi 10.1109/TSP.2024.3390177

Near-Field Wideband Secure Communications: An Analog Beamfocusing Approach

Authors: Yuchen Zhang, Haiyang Zhang, Sa Xiao, Wanbin Tang, Yonina C. Eldar

Abstract: In the rapidly advancing landscape of 6G, characterized by ultra-high-speed wideband transmission in millimeter-wave and terahertz bands, our paper addresses the pivotal task of enhancing physical layer security (PLS) within near-field wideband communications. We introduce true-time delayer (TTD)-incorporated analog beamfocusing techniques designed to address the interplay between near-field propagation and wideband beamsplit, an uncharted domain in existing literature. Our approach to maximizing secrecy rates involves formulating an optimization problem for joint power allocation and analog beamformer design, employing a two-stage process encompassing a semi-digital solution and analog approximation. This problem is efficiently solved through a combination of alternating optimization, fractional programming, and block successive upper-bound minimization techniques. Additionally, we present a low-complexity beamsplit-aware beamfocusing strategy, capitalizing on geometric insights from near-field wideband propagation, which can also serve as a robust initial value for the optimization-based approach. Numerical results substantiate the efficacy of the proposed methods, clearly demonstrating their superiority over TTD-free approaches in fortifying wideband PLS, as well as the advantageous secrecy energy efficiency achieved by leveraging low-cost analog devices.

Submitted 28 November, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: This work has been submitted to IEEE journal for publication

Journal ref: IEEE Transactions on Signal Processing, 2024

arXiv:2310.20421 [pdf, other]

Two-stage solution for ancilla-assisted quantum process tomography: error analysis and optimal design

Authors: Shuixin Xiao, Yuanlong Wang, Daoyi Dong, Jun Zhang

Abstract: Quantum process tomography (QPT) is a fundamental task to characterize the dynamics of quantum systems. In contrast to standard QPT, the ancilla-assisted process tomography (AAPT) framework introduces an extra ancilla system so that only a single input state is needed. In this paper, we extend the two-stage solution, a method originally designed for standard QPT, to perform AAPT. Our algorithm has $O(Md_A^2d_B^2)$ computational complexity, where $M$ is the number of types of measurement operators, $d_A$ is the dimension of the quantum system of interest, and $d_B$ is the dimension of the ancilla system. Then we establish an error upper bound and further discuss the optimal design of the input state in AAPT. A numerical example on a phase damping process demonstrates the effectiveness of the optimal design and illustrates the theoretical error analysis.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 6 pages, 3 figures

arXiv:2310.10997 [pdf]

Cooperative Dispatch of Microgrids Community Using Risk-Sensitive Reinforcement Learning with Monotonously Improved Performance

Authors: Ziqing Zhu, Xiang Gao, Siqi Bu, Ka Wing Chan, Bin Zhou, Shiwei Xia

Abstract: The integration of individual microgrids (MGs) into Microgrid Clusters (MGCs) significantly improves the reliability and flexibility of energy supply, through resource sharing and ensuring backup during outages. The dispatch of MGCs is the key challenge to be tackled to ensure their secure and economic operation. Currently, no optimization method achieves a trade-off among the top-priority requirements of MGC dispatch, including fast computation speed, optimality, multiple objectives, and risk mitigation against uncertainty. In this paper, a novel Multi-Objective, Risk-Sensitive, and Online Trust Region Policy Optimization (RS-TRPO) Algorithm is proposed to tackle this problem. First, a dispatch paradigm for autonomous MGs in the MGC is proposed, enabling them to sequentially implement their self-dispatch to mitigate potential conflicts. This dispatch paradigm is then formulated as a Markov Game model, which is finally solved by the RS-TRPO algorithm. This online algorithm enables MGs to spontaneously search for the Pareto Frontier considering multiple objectives and risk mitigation. The outstanding computational performance of this algorithm is demonstrated in comparison with mathematical programming methods and heuristic algorithms in a modified IEEE 30-Bus Test System integrated with four autonomous MGs.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2309.09415 [pdf]

Energy-efficient Integrated Sensing and Communication System and DNLFM Waveform

Authors: Yihua Ma, Zhifeng Yuan, Shuqiang Xia, Chen Bai, Zhongbin Wang, Yuxin Wang

Abstract: Integrated sensing and communication (ISAC) is a key enabler of 6G. Unlike communication radio links, the sensing signal has to make round trips via many scatterers. Therefore, sensing is more power-sensitive and faces more severe multi-target interference. In this paper, the ISAC system employs dedicated sensing signals, which can be reused as the communication reference signal. This paper proposes to add time-frequency matched windows at both the transmitting and receiving sides, which avoids mismatch loss and increases energy efficiency. Discrete non-linear frequency modulation (DNLFM) is further proposed to achieve both time-domain constant modulus and frequency-domain arbitrary windowing weights. DNLFM is generated using very few Newton iterations and a simple geometrically equivalent method, which greatly reduces the complex numerical integration required by the conventional method. Moreover, the spatial-domain matched window is proposed to achieve low sidelobes. The simulation results show that the proposed methods achieve a higher energy efficiency than conventional methods.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2307.07936 [pdf, other]

doi 10.1109/TCOMM.2023.3294954

Joint Beam Management and SLAM for mmWave Communication Systems

Authors: Hang Que, Jie Yang, Chao-Kai Wen, Shuqiang Xia, Xiao Li, Shi Jin

Abstract: The millimeter-wave (mmWave) communication technology, which employs large-scale antenna arrays, enables inherent sensing capabilities. Simultaneous localization and mapping (SLAM) can utilize channel multipath angle estimates to realize integrated sensing and communication design in 6G communication systems. However, existing works have ignored the significant overhead required by the mmWave beam management when implementing SLAM with angle estimates. This study proposes a joint beam management and SLAM design that utilizes the strong coupling between the radio map and channel multipath for simultaneous beam management, localization, and mapping. In this approach, we first propose a hierarchical sweeping and sensing service design. The path angles are estimated in the hierarchical sweeping, enabling angle-based SLAM with the aid of an inertial measurement unit (IMU) to realize sensing service. Then, feature-aided tracking is proposed that utilizes prior angle information generated from the radio map and IMU. Finally, a switching module is introduced to enable flexible switching between hierarchical sweeping and feature-aided tracking. Simulations show that the proposed joint design can achieve sub-meter level localization and mapping accuracy (with an error < 0.5 m). Moreover, the beam management overhead can be reduced by approximately 40% in different wireless environments.

Submitted 15 July, 2023; originally announced July 2023.

Journal ref: IEEE Transactions on Communications, early access, July 2023

arXiv:2306.11977 [pdf]

Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Authors: Zimeng Li, Sa Xiao, Cheng Wang, Haidong Li, Xiuchao Zhao, Caohui Duan, Qian Zhou, Qiuchen Rao, Yuan Fang, Junshuai Xie, Lei Shi, Fumin Guo, Chaohui Ye, Xin Zhou

Abstract: Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of the human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep convolutional neural networks (CNNs) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost the reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized 129Xe and 1H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully sampled images.
These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and clinical lung disease patient care.
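
The idea of convolving along a single encoding direction with complex-valued weights can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' EN2 network: the function name `conv1d_rows`, the toy k-space values, and the two-tap kernel are all hypothetical, and the operation is the correlation-style "convolution" used in CNN layers, applied independently to each row.

```python
def conv1d_rows(kspace, kernel):
    """Valid-mode complex 1D filtering of each k-space row (one encoding direction).

    kspace: list of rows, each a list of complex samples (hypothetical toy data).
    kernel: list of complex taps (hypothetical; a real network would learn these).
    """
    k = len(kernel)
    out = []
    for row in kspace:
        # Slide the kernel along the row only; rows never mix with each other.
        out.append([
            sum(row[i + j] * kernel[j] for j in range(k))
            for i in range(len(row) - k + 1)
        ])
    return out


# Toy 2x4 complex k-space and a 2-tap complex kernel (illustrative values only).
kspace = [
    [1 + 1j, 2 + 0j, 0 + 1j, 1 - 1j],
    [0 + 0j, 1 + 2j, 2 - 1j, 1 + 0j],
]
kernel = [1 + 0j, 0 + 1j]
result = conv1d_rows(kspace, kernel)
```

Filtering along columns instead would amount to transposing the array before and after the call; a square kernel, by contrast, mixes samples from both directions at once.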

Submitted 13 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2305.12332 [pdf, other]

doi 10.1109/TWC.2023.3266343

Joint Localization and Environment Sensing by Harnessing NLOS Components in RIS-aided mmWave Communication Systems

Authors: Yixuan Huang, Jie Yang, Wankai Tang, Chao-Kai Wen, Shuqiang Xia, Shi Jin

Abstract: This study explores the use of non-line-of-sight (NLOS) components in millimeter-wave (mmWave) communication systems for joint localization and environment sensing. The radar cross section (RCS) of a reconfigurable intelligent surface (RIS) is calculated to develop a general path gain model for RISs and traditional scatterers. The results show that RISs have a greater potential to assist in localization due to their ability to maintain high RCSs and create strong NLOS links. A one-stage linear weighted least squares estimator is proposed to simultaneously determine user equipment (UE) locations, velocities, and scatterer (or RIS) locations using line-of-sight (LOS) and NLOS paths. The estimator supports environment sensing and UE localization even using only NLOS paths. A second-stage estimator is also introduced to improve environment sensing accuracy by considering the nonlinear relationship between UE and scatterer locations. Simulation results demonstrate the effectiveness of the proposed estimators in rich scattering environments and the benefits of using NLOS paths for improving UE location accuracy and assisting in environment sensing. The effects of RIS number, size, and deployment on localization performance are also analyzed.

Submitted 20 May, 2023; originally announced May 2023.

Comments: 32 pages, 12 figures, accepted by IEEE Transactions on Wireless Communications

Journal ref: IEEE Transactions on Wireless Communications, early access, April 2023

arXiv:2305.06279 [pdf, other]

Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization

Authors: Yuanming Shi, Shuhao Xia, Yong Zhou, Yijie Mao, Chunxiao Jiang, Meixia Tao

Abstract: Vertical federated learning (FL) is a collaborative machine learning framework that enables devices to learn a global model from feature-partitioned datasets without sharing local raw data. However, as the number of local intermediate outputs is proportional to the number of training samples, it is critical to develop communication-efficient techniques for wireless vertical FL to support high-dimensional model aggregation with full device participation. In this paper, we propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation by leveraging over-the-air computation (AirComp) and alleviating the communication straggler issue with cooperative model aggregation among geographically distributed edge servers. However, the model aggregation error caused by AirComp and the quantization errors caused by the limited fronthaul capacity degrade the learning performance of vertical FL. To address these issues, we characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions. To improve the learning performance, we establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed. We conduct extensive simulations to demonstrate the effectiveness of the proposed system architecture and optimization framework for vertical FL.

Submitted 4 May, 2023; originally announced May 2023.

Comments: 32 pages, 7 figures

arXiv:2305.05356 [pdf, other]

Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching

Authors: Shuting Xia, Tingyu Fan, Yiling Xu, Jenq-Neng Hwang, Zhu Li

Abstract: 3D dynamic point cloud (DPC) compression relies on mining its temporal context, which faces significant challenges due to DPC's sparsity and non-uniform structure. Existing methods are limited in capturing sufficient temporal dependencies. Therefore, this paper proposes a learning-based DPC compression framework with a hierarchical block-matching-based inter-prediction module to compensate and compress the DPC geometry in latent space. Specifically, we propose a hierarchical motion estimation and motion compensation (Hie-ME/MC) framework for flexible inter-prediction, which dynamically selects the granularity of optical flow to encapsulate the motion information accurately. To improve the motion estimation efficiency of the proposed inter-prediction module, we further design a KNN-attention block matching (KABM) network that determines the impact of potential corresponding points based on the geometry and feature correlation. Finally, we compress the residual and the multi-scale optical flow with a fully-factorized deep entropy model. The experimental results on the MPEG-specified Owlii Dynamic Human Dynamic Point Cloud (Owlii) dataset show that our framework outperforms the previous state-of-the-art methods and the MPEG standard V-PCC v18 in inter-frame low-delay mode.

Submitted 16 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 9 pages for the main body, 3 pages for the supplemental after References

arXiv:2305.02485

How to Use Reinforcement Learning to Facilitate Future Electricity Market Design? Part 1: A Paradigmatic Theory

Authors: Ziqing Zhu, Siqi Bu, Ka Wing Chan, Bin Zhou, Shiwei Xia

Abstract: In the face of the pressing need for decarbonization in the power sector, the re-design of the electricity market is necessary as a macro-level approach to accommodate the high penetration of renewable generation, and to achieve power system operation security, economic efficiency, and environmental friendliness. However, existing market design methodologies suffer from the lack of coordination among the energy spot market (ESM), ancillary service market (ASM) and financial market (FM), i.e., the "joint market", and the lack of reliable simulation-based verification. To tackle these deficiencies, this two-part paper develops a paradigmatic theory and detailed methods of joint market design using reinforcement-learning (RL)-based simulation. In Part 1, the theory and framework of this novel market design philosophy are proposed. First, the controversial market design options in designing the joint market are summarized as the targeted research questions. Second, the Markov game model is developed to describe the bidding game in the joint market, incorporating the market design options to be determined. Third, a framework of deploying multiple types of RL algorithms to simulate the market model is developed. Finally, several market operation performance indicators are proposed to validate the market design based on the simulation results.

Submitted 11 May, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: It is old version with mistakes

arXiv:2305.00561 [pdf, other]

Model-free Motion Planning of Autonomous Agents for Complex Tasks in Partially Observable Environments

Authors: Junchao Li, Mingyu Cai, Zhen Kan

Abstract: Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). The problem is redefined as finding an optimal policy on the product of PL-POMDP with LDGBA based on model-checking techniques to satisfy the complex task. We implement deep Q learning with long short-term memory (LSTM) to process the observation history and task recognition. Our contributions include the proposed method, the utilization of LTL and LDGBA, and the LSTM-enhanced deep Q learning. We demonstrate the applicability of the proposed method by conducting simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The simulation results demonstrate that our proposed method effectively addresses environment, action, and observation uncertainties. This indicates its potential for real-world applications, including the control of unmanned aerial vehicles (UAVs).

Submitted 30 April, 2023; originally announced May 2023.

Comments: 32 pages, 22 figures, submitted to Autonomous Agents and Multi-Agent Systems

arXiv:2304.10780 [pdf, other]

Omni-Line-of-Sight Imaging for Holistic Shape Reconstruction

Authors: Binbin Huang, Xingyue Peng, Siyuan Shen, Suan Xia, Ruiqian Li, Yanhua Yu, Yuehan Wang, Shenghua Gao, Wenzheng Chen, Shiying Li, Jingyi Yu

Abstract: We introduce Omni-LOS, a neural computational imaging method for conducting holistic shape reconstruction (HSR) of complex objects utilizing a Single-Photon Avalanche Diode (SPAD)-based time-of-flight sensor. As illustrated in Fig. 1, our method enables new capabilities to reconstruct near-$360^\circ$ surrounding geometry of an object from a single scan spot. In such a scenario, traditional line-of-sight (LOS) imaging methods only see the front part of the object and typically fail to recover the occluded back regions. Inspired by recent advances in non-line-of-sight (NLOS) imaging techniques, which have demonstrated great power to reconstruct occluded objects, Omni-LOS marries LOS and NLOS together, leveraging their complementary advantages to jointly recover the holistic shape of the object from a single scan position. The core of our method is to place the object near diffuse walls and augment the LOS scan in the front view with the NLOS scans from the surrounding walls, which serve as virtual "mirrors" to trap light toward the object. Instead of separately recovering the LOS and NLOS signals, we adopt an implicit neural network to represent the object, analogous to NeRF and NeTF. While transients are measured along straight rays in LOS but over spherical wavefronts in NLOS, we derive differentiable ray propagation models to simultaneously model both types of transient measurements so that the NLOS reconstruction also takes into account the direct LOS measurements and vice versa.
We further develop a proof-of-concept Omni-LOS hardware prototype for real-world validation. Comprehensive experiments on various wall settings demonstrate that Omni-LOS successfully resolves shape ambiguities caused by occlusions, achieves high-fidelity 3D scan quality, and manages to recover objects of various scales and complexity.

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2302.09256 [pdf, other]

Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection

Authors: Shengchang Xiao, Xueshuai Zhang, Pengyuan Zhang

Abstract: Recently, convolutional neural networks (CNNs) have been widely used in sound event detection (SED). However, traditional convolution is deficient in learning time-frequency domain representations of different sound events. To address this issue, we propose multi-dimensional frequency dynamic convolution (MFDConv), a new design that endows convolutional kernels with frequency-adaptive dynamic properties along multiple dimensions. MFDConv utilizes a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary frequency-adaptive attentions, which substantially strengthen the feature extraction ability of convolutional kernels. Moreover, in order to improve the performance of the mean teacher, we propose the confident mean teacher to increase the accuracy of pseudo-labels from the teacher and train the student with high-confidence labels. Experimental results show that the proposed methods achieve 0.470 and 0.692 of PSDS1 and PSDS2 on the DESED real validation dataset.
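
The general mean-teacher pattern with confidence filtering can be sketched as follows. This is a minimal sketch under assumptions, not the paper's confident mean teacher: the abstract does not state the confidence criterion, so the `min_conf` threshold, the 0.5 decision boundary, and both function names are hypothetical. The exponential-moving-average teacher update is the standard mean-teacher mechanism.

```python
def ema_update(teacher, student, decay=0.999):
    """Standard mean-teacher step: teacher weights track an EMA of student weights."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]


def confident_pseudo_labels(probs, min_conf=0.8):
    """Turn teacher probabilities into hard pseudo-labels plus a keep-mask.

    A prediction is kept only if the teacher is at least `min_conf` sure of
    either class (hypothetical criterion chosen for illustration).
    """
    labels = [1 if p >= 0.5 else 0 for p in probs]
    keep = [max(p, 1 - p) >= min_conf for p in probs]
    return labels, keep


# Teacher frame-level probabilities for one event class (illustrative values).
teacher_probs = [0.95, 0.55, 0.10, 0.40]
labels, keep = confident_pseudo_labels(teacher_probs)
# Only the frames flagged in `keep` would contribute to the student's loss.
new_teacher = ema_update([1.0, 2.0], [0.0, 0.0], decay=0.5)
```

Masking out low-confidence frames trades fewer training targets for cleaner ones, which matches the abstract's stated goal of training the student on high-confidence labels.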

Submitted 21 February, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: accepted to ICASSP 2023

arXiv:2212.07651 [pdf, other]

Two-stage Contextual Transformer-based Convolutional Neural Network for Airway Extraction from CT Images

Authors: Yanan Wu, Shuiqing Zhao, Shouliang Qi, Jie Feng, Haowen Pang, Runsheng Chang, Long Bai, Mengqi Li, Shuyue Xia, Wei Qian, Hongliang Ren

Abstract: Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). Existing methods struggle to segment the airway sufficiently, especially the high-generation airway, under the constraint of limited labels, and cannot meet the needs of clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation using CT images. The method consists of two stages, performing initial and refined airway segmentation. The two-stage model shares the same subnetwork with different airway masks as input. A contextual transformer block is applied in both the encoder and decoder paths of the subnetwork to achieve high-quality airway segmentation effectively. In the first stage, the total airway mask and CT images are provided to the subnetwork; in the second stage, the intrapulmonary airway mask and corresponding CT scans are provided. Then the predictions of the two stages are merged as the final prediction. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analyses demonstrate that our proposed method extracts many more branches and a greater length of the airway tree while achieving state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2210.16197 [pdf]

Dimensionality Reduced Antenna Array for Beamforming/steering

Authors: Shiyi Xia, Mingyang Zhao, Qian Ma, Xunnan Zhang, Ling Yang, Yazhi Pi, Hyunchul Chung, Ad Reniers, A. M. J. Koonen, Zizheng Cao

Abstract: Beamforming makes possible a focused communication method. It is extensively employed in many disciplines involving electromagnetic waves, including arrayed ultrasonic, optical, and high-speed wireless communication. Conventional beam steering often requires the addition of separate active amplitude and phase control units after each radiating element. The high power consumption and complexity of large-scale phased arrays can be overcome by reducing the number of active controllers, pushing beamforming into satellite communications and deep space exploration. Here, we propose a new design for a phased array antenna with a dimension reduced cascaded angle offset (DRCAO-PAA). Furthermore, the proposed DRCAO-PAA was compressed by using the concept of singular value decomposition. To pave the way for practical application, the particle swarm optimization algorithm and the deep neural network Transformer were adopted. Based on this theoretical framework, an experimental board was built to verify the theory. Finally, 16/8/4-element array beam steering was demonstrated by using 4/3/2 active controllers, respectively.
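
For context, the conventional baseline that the abstract contrasts against, one phase control per radiating element, can be sketched with the textbook array factor of a uniform linear array. This is not the DRCAO-PAA design; the function name `array_factor`, the half-wavelength spacing, and the 20-degree steering angle are assumptions chosen for illustration.

```python
import cmath
import math


def array_factor(n_elements, steer_deg, look_deg, spacing_wavelengths=0.5):
    """Magnitude of the array factor of a uniform linear array.

    Each element n applies its own phase shift -n*k*d*sin(steer), i.e. one
    independent controller per element (the conventional approach).
    """
    kd = 2 * math.pi * spacing_wavelengths  # k*d with d given in wavelengths
    psi = kd * (math.sin(math.radians(look_deg)) - math.sin(math.radians(steer_deg)))
    return abs(sum(cmath.exp(1j * n * psi) for n in range(n_elements)))


# A 16-element array steered to 20 degrees (illustrative numbers).
peak = array_factor(16, steer_deg=20, look_deg=20)     # all elements add in phase
off_axis = array_factor(16, steer_deg=20, look_deg=0)  # partial cancellation
```

In this baseline the number of independent phase settings equals the number of elements, which is the controller count the abstract reports reducing (for example, 4 controllers for a 16-element array).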

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2209.12002 [pdf, other]

doi 10.21437/Interspeech.2022-11412

Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting

Authors: Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li, Shipeng Xia, Jiayang Zhang, Feng Tong, Lin Li, Qingyang Hong

Abstract: This paper describes a spatial-aware speaker diarization system for multi-channel multi-party meetings. The diarization system obtains the direction information of speakers using a microphone array. The speaker spatial embedding is generated from the x-vector and the s-vector derived from superdirective beamforming (SDB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named discriminative multi-stream neural network (DMSNet), which consists of an attention superdirective beamforming (ASDB) block and a Conformer encoder. The proposed ASDB is a self-adapted channel-wise block that extracts the latent spatial features of array audio by modeling interdependencies between channels. We explore DMSNet to address the overlapped speech problem on multi-channel audio and achieve 93.53% accuracy on the evaluation set. With the DMSNet-based overlapped speech detection (OSD) module, the diarization error rate (DER) of the cluster-based diarization system decreases significantly from 13.45% to 7.64%.

Submitted 24 September, 2022; originally announced September 2022.

Comments: Accepted by Interspeech 2022. arXiv admin note: text overlap with arXiv:2202.05744

arXiv:2209.11953 [pdf]

TD-BPQBC: A 1.8μW 5.5mm³ ADC-less Neural Implant SoC utilizing 13.2pJ/Sample Time-domain Bi-phasic Quasi-static Brain Communication

Authors: Baibhab Chatterjee, K Gaurav Kumar, Shulan Xiao, Gourab Barik, Krishna Jayant, Shreyas Sen

Abstract: Untethered miniaturized wireless neural sensor nodes with data transmission and energy harvesting capabilities call for circuit and system-level innovations to enable ultra-low energy deep implants for brain-machine interfaces. Realizing that the energy and size constraints of a neural implant motivate highly asymmetric system design (a small, low-power sensor and transmitter at the implant, with a relatively higher power receiver at a body-worn hub), we present Time-Domain Bi-Phasic Quasi-static Brain Communication (TD-BPQBC), offloading the burden of analog to digital conversion (ADC) and digital signal processing (DSP) to the receiver. The input analog signal is converted to time-domain pulse-width modulated (PWM) waveforms, and transmitted using the recently developed BPQBC method for reducing communication power in implants. The overall SoC consumes only 1.8μW power while sensing and communicating at 800kSps. The transmitter energy efficiency is only 1.1pJ/b, which is >30X better than the state-of-the-art, enabling a fully-electrical, energy-harvested, and connected in-brain sensor/stimulator node.

Submitted 19 October, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

Comments: 4 pages, 6 figures, presented in ESSCIRC 2022 conference

arXiv:2208.04318 [pdf, other]

Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution

Authors: Hongwei Li, Tao Dai, Yiming Li, Xueyi Zou, Shu-Tao Xia

Abstract: Image representation is critical for many visual tasks. Instead of representing images discretely with 2D arrays of pixels, a recent study, namely local implicit image function (LIIF), denotes images as a continuous function where pixel values are predicted by using the corresponding coordinates as inputs. Due to its continuous nature, LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors. However, LIIF often suffers from structural distortions and ringing artifacts around edges, mostly because all pixels share the same model, thus ignoring the local properties of the image. In this paper, we propose a novel adaptive local implicit image function (A-LIIF) to alleviate this problem. Specifically, our A-LIIF consists of two main components: an encoder and an expansion network. The former captures cross-scale image features, while the latter models the continuous up-scaling function by a weighted combination of multiple local implicit image functions. Accordingly, our A-LIIF can reconstruct the high-frequency textures and structures more accurately. Experiments on multiple benchmark datasets verify the effectiveness of our method. Our code is available at https://github.com/LeeHW-THU/A-LIIF.

Submitted 7 August, 2022; originally announced August 2022.

Comments: This paper is accepted by ICIP 2022. 5 pages

arXiv:2207.03241 [pdf]

doi 10.1109/JIOT.2023.3274120

Highly Efficient Waveform Design and Hybrid Duplex for Joint Communication and Sensing

Authors: Yihua Ma, Zhifeng Yuan, Shuqiang Xia, Guanghui Yu, Liujun Hu

Abstract: Joint communication and sensing (JCAS) is a very promising 6G technology, which attracts more and more research attention. Compared with communication, radar has many unique features in terms of waveform design criteria, self-interference cancellation (SIC), aperture-dependent resolution, and virtual aperture. This paper proposes a novel waveform design named max-aperture radar slicing (MaRS) to gain a large time-frequency aperture, which is generated by orthogonal frequency division multiplexing (OFDM) and occupies only a tiny fraction of OFDM resources. The proposed MaRS keeps the radar advantages of constant modulus, zero auto-correlation sequence, and simple SIC. As MaRS consumes far fewer resources, conventional processing methods fail, and novel angle-Doppler map based methods are proposed to obtain the range-velocity-angle information from MaRS echoes and strong clutter. To avoid complex full-duplex communication, this paper proposes a hybrid-duplex JCAS scheme composed of half-duplex communication and full-duplex radar. The half-duplex communication antenna array is reused, and a small sensing-dedicated antenna array is added. Using these two arrays, a large space-domain sensing aperture is virtually formed to greatly improve the angle resolution. The numerical results show that the proposed MaRS and hybrid duplex can achieve a high sensing resolution with only 0.4% of OFDM resources, which reduces the overhead of conventional methods to less than one tenth.

Submitted 4 July, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: in IEEE Internet of Things Journal

arXiv:2206.12480 [pdf, other]

Attention-Guided Autoencoder for Automated Progression Prediction of Subjective Cognitive Decline with Structural MRI

Authors: Hao Guan, Ling Yue, Pew-Thian Yap, Shifu Xiao, Andrea Bozoki, Mingxia Liu

Abstract: Subjective cognitive decline (SCD) is a preclinical stage of Alzheimer's disease (AD) which occurs even before mild cognitive impairment (MCI). Progressive SCD will convert to MCI with the potential of further evolving to AD. Therefore, early identification of progressive SCD with neuroimaging techniques (e.g., structural MRI) is of great clinical value for early intervention of AD. However, existing MRI-based machine/deep learning methods usually suffer from the small-sample-size problem, which poses a great challenge to related neuroimaging analysis. The central question we aim to tackle in this paper is how to leverage related domains (e.g., AD/NC) to assist the progression prediction of SCD. Meanwhile, we are concerned about which brain areas are more closely linked to the identification of progressive SCD. To this end, we propose an attention-guided autoencoder model for efficient cross-domain adaptation which facilitates the knowledge transfer from AD to SCD. The proposed model is composed of four key components: 1) a feature encoding module for learning shared subspace representations of different domains, 2) an attention module for automatically locating discriminative brain regions of interest defined in brain atlases, 3) a decoding module for reconstructing the original input, 4) a classification module for identification of brain diseases. Through joint training of these four modules, domain invariant features can be learned. Meanwhile, the brain disease related regions can be highlighted by the attention mechanism.
Extensive experiments on the publicly available ADNI dataset and a private CLAS dataset have demonstrated the effectiveness of the proposed method. The proposed model is straightforward to train and test, taking only 5-10 seconds on CPUs, and is suitable for medical tasks with small datasets.

Submitted 16 February, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

Comments: 10 pages, 12 figures

arXiv:2206.09611 [pdf, other]

SJ-HD^2R: Selective Joint High Dynamic Range and Denoising Imaging for Dynamic Scenes

Authors: Wei Li, Shuai Xiao, Tianhong Dai, Shanxin Yuan, Tao Wang, Cheng Li, Fenglong Song

Abstract: Ghosting artifacts, motion blur, and low fidelity in highlights are the main challenges in High Dynamic Range (HDR) imaging from multiple Low Dynamic Range (LDR) images. These issues come from using the medium-exposed image as the reference frame in previous methods. To deal with them, we propose to use the under-exposed image as the reference. However, the heavy noise in dark regions of the under-exposed image becomes a new problem. Therefore, we propose a joint HDR and denoising pipeline, containing two sub-networks: (i) a pre-denoising network (PreDNNet) to adaptively denoise input LDRs by exploiting exposure priors; (ii) a pyramid cascading fusion network (PCFNet), introducing an attention mechanism and cascading structure in a multi-scale manner. To further leverage these two paradigms, we propose a selective and joint HDR and denoising (SJ-HD$^2$R) imaging framework, utilizing scenario-specific priors to conduct the path selection with an accuracy of more than 93.3%. We create the first joint HDR and denoising benchmark dataset, which contains a variety of challenging HDR and denoising scenes and supports the switching of the reference image. Extensive experimental results show that our method achieves superior performance to previous methods.

Submitted 3 November, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.05279 [pdf, other]

PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework

Authors: Ning Kang, Shanzhao Qiu, Shifeng Zhang, Zhenguo Li, Shutao Xia

Abstract: Generative model based image lossless compression algorithms have seen a great success in improving compression ratio. However, the throughput for most of them is less than 1 MB/s even with the most advanced AI accelerated chips, preventing them from most real-world applications, which often require 100 MB/s. In this paper, we propose PILC, an end-to-end image lossless compression framework that achieves 200 MB/s for both compression and decompression with a single NVIDIA Tesla V100 GPU, 10 times faster than the most efficient one before. To obtain this result, we first develop an AI codec that combines auto-regressive model and VQ-VAE which performs well in lightweight setting, then we design a low complexity entropy coder that works well with our codec. Experiments show that our framework compresses better than PNG by a margin of 30% in multiple datasets. We believe this is an important step to bring AI compression forward to commercial use.

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2205.08540 [pdf]

Bi-Phasic Quasistatic Brain Communication for Fully Untethered Connected Brain Implants

Authors: Baibhab Chatterjee, Mayukh Nath, Gaurav Kumar K, Shulan Xiao, Krishna Jayant, Shreyas Sen

Abstract: Wireless communication using electro-magnetic (EM) fields acts as the backbone for information exchange among wearable devices around the human body. However, for implanted devices, EM fields incur a high amount of absorption in the tissue, while alternative modes of transmission including ultrasound, optical and magneto-electric methods result in a large amount of transduction losses due to conversion of one form of energy to another, thereby increasing the overall end-to-end energy loss. To solve the challenge of powering and communication in a brain implant with low end-to-end channel loss, we present Bi-Phasic Quasistatic Brain Communication (BP-QBC), achieving < 60dB worst-case end-to-end channel loss at a channel length of 55mm, by avoiding the transduction losses during field-modality conversion. BP-QBC utilizes dipole coupling based signal transmission within the brain tissue using differential excitation in the transmitter and differential signal pick-up at the receiver, and offers 41X lower power w.r.t. traditional Galvanic Human Body Communication at a carrier frequency of 1MHz, by blocking any DC current paths through the brain tissue. Since the electrical signal transfer through the human tissue is electro-quasistatic up to several 10's of MHz range, BP-QBC allows a scalable (bps-10Mbps) duty-cycled uplink from the implant to an external wearable. The power consumption in the BP-QBC TX is only 0.52uW at 1Mbps (with 1% duty cycling), which is within the range of harvested body-coupled power in the downlink from an external wearable to the brain implant. Furthermore, BP-QBC eliminates the need for sub-cranial repeaters, as it utilizes quasi-static electrical signals, thereby avoiding any transduction losses. Such low end-to-end channel loss with high data rates would find applications in neuroscience, brain-machine interfaces, electroceuticals and connected healthcare.

Submitted 4 July, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: 22 pages

arXiv:2205.08369

Applications of Reinforcement Learning in Deregulated Power Market: A Comprehensive Review

Authors: Ziqing Zhu, Ze Hu, Ka Wing Chan, Siqi Bu, Bin Zhou, Shiwei Xia

Abstract: The increasing penetration of renewable generations, along with the deregulation and marketization of power industry, promotes the transformation of power market operation paradigms. The optimal bidding strategy and dispatching methodology under these new paradigms are prioritized concerns for both market participants and power system operators, with obstacles of uncertain characteristics, computational efficiency, as well as requirements of hyperopic decision-making. To tackle these problems, the Reinforcement Learning (RL), as an emerging machine learning technique with advantages compared with conventional optimization tools, is playing an increasingly significant role in both academia and industry. This paper presents a comprehensive review of RL applications in deregulated power market operation including bidding and dispatching strategy optimization, based on more than 150 carefully selected literatures. For each application, apart from a paradigmatic summary of generalized methodology, in-depth discussions of applicability and obstacles while deploying RL techniques are also provided. Finally, some RL techniques that have great potentiality to be deployed in bidding and dispatching problems are recommended and discussed.

Submitted 11 May, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

Comments: It is old version with mistakes

arXiv:2202.05744 [pdf, other]

The xmuspeech system for multi-channel multi-party meeting transcription challenge

Authors: Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li, Shipeng Xia, Jiayang Zhang, Lin Li, Qingyang Hong, Feng Tong

Abstract: This paper describes the system developed by the XMUSPEECH team for the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). For the speaker diarization task, we propose a multi-channel speaker diarization system that obtains spatial information of speaker by Difference of Arrival (DOA) technology. Speaker-spatial embedding is generated by x-vector and s-vector derived from Filter-and-Sum Beamforming (FSB) which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named Discriminative Multi-stream Neural Network (DMSNet) which consists of Attention Filter-and-Sum block (AFSB) and Conformer encoder. We explore DMSNet to address overlapped speech problem on multi-channel audio. Compared with LSTM based OSD module, we achieve a decrease of 10.1% in Detection Error Rate (DetER). By performing DMSNet based OSD module, the DER of cluster-based diarization system decreases significantly from 13.44% to 7.63%. Our best fusion system achieves 7.09% and 9.80% of the diarization error rate (DER) on evaluation set and test set.

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2201.05502 [pdf]

doi 10.1109/JLT.2022.3168698

Fast and accurate waveform modeling of long-haul multi-channel optical fiber transmission using a hybrid model-data driven scheme

Authors: Hang Yang, Zekun Niu, Haochen Zhao, Shilin Xiao, Weisheng Hu, Lilin Yi

Abstract: The modeling of optical wave propagation in optical fiber is a task of fast and accurate solving the nonlinear Schrödinger equation (NLSE), and can enable the optical system design, digital signal processing verification and fast waveform calculation. Traditional waveform modeling of full-time and full-frequency information is the split-step Fourier method (SSFM), which has long been regarded as challenging in long-haul wavelength division multiplexing (WDM) optical fiber communication systems because it is extremely time-consuming. Here we propose a linear-nonlinear feature decoupling distributed (FDD) waveform modeling scheme to model long-haul WDM fiber channel, where the channel linear effects are modelled by the NLSE-derived model-driven methods and the nonlinear effects are modelled by the data-driven deep learning methods. Meanwhile, the proposed scheme only focuses on one-span fiber distance fitting, and then recursively transmits the model to achieve the required transmission distance. The proposed modeling scheme is demonstrated to have high accuracy, high computing speeds, and robust generalization abilities for different optical launch powers, modulation formats, channel numbers and transmission distances. The total running time of FDD waveform modeling scheme for 41-channel 1040-km fiber transmission is only 3 minutes versus more than 2 hours using SSFM for each input condition, which achieves a 98% reduction in computing time. Considering the multi-round optimization by adjusting system parameters, the complexity reduction is significant. The results represent a remarkable improvement in nonlinear fiber modeling and open up novel perspectives for solution of NLSE-like partial differential equations and optical fiber physics problems.

Submitted 16 May, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: 8 pages, 5 figures, 1 table, 30 references

arXiv:2201.00941 [pdf]

Waveform Design Using Half-duplex Devices for 6G Joint Communications and Sensing

Authors: Yihua Ma, Zhifeng Yuan, Guanghui Yu, Shuqiang Xia, Liujun Hu

Abstract: Joint communications and sensing (JCS) is a promising 6G technology, and the challenge is how to integrate them efficiently. Existing frequency-division and time-division coexistence can hardly bring a gain of integration. Directly using orthogonal frequency-division multiplexing (OFDM) to sense requires complex in-band full-duplex to cancel the self-interference (SI). To solve these problems, this paper proposes novel coexistence schemes to gain super sensing range (SSR) and simple SI cancellation. SSR enables JCS to gain a sensing range of a sensing-only scheme and shares the resources with communications. Random time-division is proposed to gain a super Doppler range. Flexible sensing implanted OFDM (FSI-OFDM) is also proposed. FSI-OFDM uses random sensing occasions to gain super Doppler range, as well as utilizes the fixed tail sensing occasions to achieve super distance range. The simulation results show that the proposed schemes can gain SSR with limited resources.

Submitted 3 January, 2022; originally announced January 2022.

arXiv:2112.14792 [pdf, other]

Graph Neural Networks for Communication Networks: Context, Use Cases and Opportunities

Authors: José Suárez-Varela, Paul Almasan, Miquel Ferriol-Galmés, Krzysztof Rusek, Fabien Geyer, Xiangle Cheng, Xiang Shi, Shihan Xiao, Franco Scarselli, Albert Cabellos-Aparicio, Pere Barlet-Ros

Abstract: Graph neural networks (GNN) have shown outstanding applications in many fields where data is fundamentally represented as graphs (e.g., chemistry, biology, recommendation systems). In this vein, communication networks comprise many fundamental components that are naturally represented in a graph-structured manner (e.g., topology, configurations, traffic flows). This position article presents GNNs as a fundamental tool for modeling, control and management of communication networks. GNNs represent a new generation of data-driven models that can accurately learn and reproduce the complex behaviors behind real networks. As a result, such models can be applied to a wide variety of networking use cases, such as planning, online optimization, or troubleshooting. The main advantage of GNNs over traditional neural networks lies in their unprecedented generalization capabilities when applied to other networks and configurations unseen during training, which is a critical feature for achieving practical data-driven solutions for networking. This article comprises a brief tutorial on GNNs and their possible applications to communication networks. To showcase the potential of this technology, we present two use cases with state-of-the-art GNN models respectively applied to wired and wireless networks. Lastly, we delve into the key open challenges and opportunities yet to be explored in this novel research area.

Submitted 27 July, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

Journal ref: IEEE Network, 2022

arXiv:2111.10152 [pdf, other]

Integrated Sensing and Communications for V2I Networks: Dynamic Predictive Beamforming for Extended Vehicle Targets

Authors: Zhen Du, Fan Liu, Weijie Yuan, Christos Masouros, Zenghui Zhang, Shuqiang Xia, Giuseppe Caire

Abstract: We investigate sensing-assisted predictive beamforming schemes for vehicle-to-infrastructure (V2I) communication by exploiting the integrated sensing and communication (ISAC) functionalities at the roadside unit (RSU). The RSU deploys a massive multi-input-multi-output (mMIMO) array and operates at millimeter wave (mmWave) frequencies. The pencil-sharp mMIMO beams and fine range resolution achieved at mmWave imply that the point target assumption is impractical in such V2I networks, as the volume and shape of the vehicles become essential for beamforming. Simply pointing a beam to the vehicle may result in the communication receiver (CR) never lying in the beam, even when the vehicle's trajectory is accurately tracked. To tackle this problem, we consider the extended vehicle target with two novel beam tracking schemes. For the first scheme, the beamwidth is adjusted in real-time to cover the entire vehicle, followed by an extended Kalman filtering (EKF) algorithm to predict and track the position of CR according to the resolved high-resolution scatterers. An upgraded scheme is further proposed by splitting each transmission block into two stages. The first stage is exploited for ISAC transmission, where a wide beam is adopted for both communication and sensing. Based on the sensed results at the first stage, the second stage is dedicated to communication by adopting a pencil-sharp beam, yielding a significant improvement of the achievable rate. We further reveal the inherent tradeoff between the two stages in terms of their durations, and develop an optimal time allocation strategy that maximizes the average achievable rate. Finally, numerical results are provided to verify the superiorities of proposed schemes over the state-of-the-art methods.

Submitted 25 November, 2021; v1 submitted 19 November, 2021; originally announced November 2021.

arXiv:2111.00418 [pdf, other]

Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances

Authors: Shibo Zhang, Yaxuan Li, Shen Zhang, Farzad Shahabi, Stephen Xia, Yu Deng, Nabil Alshurafa

Abstract: Mobile and wearable devices have enabled numerous applications, including activity tracking, wellness monitoring, and human-computer interaction, that measure and improve our daily lives. Many of these applications are made possible by leveraging the rich collection of low-power sensors found in many mobile and wearable devices to perform human activity recognition (HAR). Recently, deep learning has greatly pushed the boundaries of HAR on mobile and wearable devices. This paper systematically categorizes and summarizes existing work that introduces deep learning methods for wearables-based HAR and provides a comprehensive analysis of the current advancements, developing trends, and major challenges. We also present cutting-edge frontiers and future directions for deep learning-based HAR.

Submitted 3 March, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

arXiv:2109.06715 [pdf, other]

doi 10.1109/MNET.001.2100266

IGNNITION: Bridging the Gap Between Graph Neural Networks and Networking Systems

Authors: David Pujol-Perich, José Suárez-Varela, Miquel Ferriol, Shihan Xiao, Bo Wu, Albert Cabellos-Aparicio, Pere Barlet-Ros

Abstract: Recent years have seen the vast potential of Graph Neural Networks (GNN) in many fields where data is structured as graphs (e.g., chemistry, recommender systems). In particular, GNNs are becoming increasingly popular in the field of networking, as graphs are intrinsically present at many levels (e.g., topology, routing). The main novelty of GNNs is their ability to generalize to other networks unseen during training, which is an essential feature for developing practical Machine Learning (ML) solutions for networking. However, implementing a functional GNN prototype is currently a cumbersome task that requires strong skills in neural network programming. This poses an important barrier to network engineers that often do not have the necessary ML expertise. In this article, we present IGNNITION, a novel open-source framework that enables fast prototyping of GNNs for networking systems. IGNNITION is based on an intuitive high-level abstraction that hides the complexity behind GNNs, while still offering great flexibility to build custom GNN architectures. To showcase the versatility and performance of this framework, we implement two state-of-the-art GNN models applied to different networking use cases. Our results show that the GNN models produced by IGNNITION are equivalent in terms of accuracy and performance to their native implementations in TensorFlow.

Submitted 2 February, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

Journal ref: IEEE Network, vol. 35, no. 6, pp. 171-177, 2021

arXiv:2107.11511 [pdf, other]

Primary-Auxiliary Model Scheduling Based Estimation of the Vertical Wheel Force in a Full Vehicle System

Authors: Xueke Zheng, Runze Cai, Shuixin Xiao, Yu Qiu, Jun Zhang, Mian Li

Abstract: In this work, we study estimation problems in nonlinear mechanical systems subject to non-stationary and unknown excitation, which are common and critical problems in design and health management of mechanical systems. A primary-auxiliary model scheduling procedure based on time-domain transmissibilities is proposed and performed under switching linear dynamics: In addition to constructing a primary transmissibility family from the pseudo-inputs to the output during the offline stage, an auxiliary transmissibility family is constructed by further decomposing the pseudo-input vector into two parts. The auxiliary family makes it possible to determine the unknown working condition at which the system is currently running, and then an appropriate transmissibility from the primary transmissibility family for estimating the unknown output can be selected during the online estimation stage. As a result, the proposed approach offers a generalizable and explainable solution to the signal estimation problems in nonlinear mechanical systems in the context of switching linear dynamics with unknown inputs. A real-world application to the estimation of the vertical wheel force in a full vehicle system is conducted to demonstrate the effectiveness of the proposed method. During the vehicle design phase, the vertical wheel force is the most important one among Wheel Center Loads (WCLs), and it is often measured directly with expensive, intrusive, and hard-to-install measurement devices during full vehicle testing campaigns. Meanwhile, the estimation problem of the vertical wheel force has not been solved well and is still of great interest. The experimental results show good performance of the proposed method in terms of accuracy in estimating the vertical wheel force.

Submitted 23 July, 2021; originally announced July 2021.

arXiv:2106.07596 [pdf, other]

Maximizing Revenue with Adaptive Modulation and Multiple FECs in Flexible Optical Networks

Authors: Cao Chen, Fen Zhou, Massimo Tornatore, Shilin Xiao

Abstract: Flexible optical networks (FONs) are being adopted to accommodate the increasingly heterogeneous traffic in today's Internet. However, in presence of high traffic load, not all offered traffic can be satisfied at all times. As carried traffic load brings revenues to operators, traffic blocking due to limited spectrum resource leads to revenue losses. In this study, given a set of traffic requests to be provisioned, we consider the problem of maximizing operator's revenue, subject to limited spectrum resource and physical layer impairments (PLIs), namely amplified spontaneous emission noise (ASE), self-channel interference (SCI), cross-channel interference (XCI), and node crosstalk. In FONs, adaptive modulation, multiple FEC, and the tuning of power spectrum density (PSD) can be effectively employed to mitigate the impact of PLIs. Hence, in our study, we propose a universal bandwidth-related impairment evaluation model based on channel bandwidth, which allows a performance analysis for different PSD, FEC and modulations. Leveraging this PLI model and a piecewise linear fitting function, we formulate the revenue maximization problem as a mixed integer linear program. Then, to solve the problem on larger network instances, a fast two-phase heuristic algorithm is also proposed, which is shown to be near-optimal for revenue maximization. Through simulations, we demonstrate that using adaptive modulation significantly increases revenues in the scenario of high signal-to-noise ratio (SNR), where the revenue can even be doubled for high traffic load, while using multiple FECs is more profitable for scenarios with low SNR.

Submitted 14 June, 2021; originally announced June 2021.

arXiv:2106.07536 [pdf, other]

doi 10.1109/JLT.2022.3157084

Throughput Maximization Leveraging Just-Enough SNR Margin and Channel Spacing Optimization

Authors: Cao Chen, Fen Zhou, Yuanhao Liu, Shilin Xiao

Abstract: Flexible optical network is a promising technology to accommodate high-capacity demands in next-generation networks. To ensure uninterrupted communication, existing lightpath provisioning schemes are mainly done with the assumption of worst-case resource under-provisioning and fixed channel spacing, which preserves an excessive signal-to-noise ratio (SNR) margin. However, under a resource over-provisioning scenario, the excessive SNR margin restricts the transmission bit-rate or transmission reach, leading to physical layer resource waste and stranded transmission capacity. To tackle this challenging problem, we leverage an iterative feedback tuning algorithm to provide a just-enough SNR margin, so as to maximize the network throughput. Specifically, the proposed algorithm is implemented in three steps. First, starting from the high SNR margin setup, we establish an integer linear programming model as well as a heuristic algorithm to maximize the network throughput by solving the problem of routing, modulation format, forward error correction, baud-rate selection, and spectrum assignment. Second, we optimize the channel spacing of the lightpaths obtained from the previous step, thereby increasing the available physical layer resources. Finally, we iteratively reduce the SNR margin of each lightpath until the network throughput cannot be increased. Through numerical simulations, we confirm the throughput improvement in different networks and with different baud-rates. In particular, we find that our algorithm enables over 20% relative gain when network resource is over-provisioned, compared to the traditional method preserving an excessive SNR margin.

Submitted 16 July, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: submitted to IEEE JLT, Jul. 17th, 2021. 14 pages, 8 figures

arXiv:2106.02670 [pdf, other]

Robust Resource Allocation for Multi-Antenna URLLC-OFDMA Systems in a Smart Factory

Authors: Jing Cheng, Chao Shen, Shuqiang Xia

Abstract: In this paper, we investigate the worst-case robust beamforming design and resource block (RB) assignment problem for total transmit power minimization of the central controller while guaranteeing each robot's transmission with target number of data bits and within required ultra-low latency and extremely high reliability. By using the property of the independence of each robot's beamformer design, we can obtain the equivalent power control design form of the original beamforming design. The binary RB mapping indicators are transformed into continuous ones with additional $\ell_0$-norm constraints to promote sparsity on each RB. A novel non-convex penalty (NCP) approach is applied to solve such $\ell_0$-norm constraints. Numerical results demonstrate the superiority of the NCP approach to the well-known reweighted $\ell_1$ method in terms of the optimized power consumption, convergence rate and robustness to channel realizations. Also, the impacts of latency, reliability, number of transmit antennas and channel uncertainty on the system performance are revealed.

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2105.08629 [pdf, other]

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report

Authors: Andrey Ignatov, Kim Byeoung-su, Radu Timofte, Angeline Pouget, Fenglong Song, Cheng Li, Shuai Xiao, Zhongqian Fu, Matteo Maggioni, Yibin Huang, Shen Cheng, Xin Lu, Yifeng Zhou, Liangyu Chen, Donghao Liu, Xiangyu Zhang, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Bin Huang, et al. (7 additional authors not shown)

Abstract: Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images under 40-80 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 17 May, 2021; originally announced May 2021.

Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.07809, arXiv:2105.07825

arXiv:2104.10781 [pdf, other]

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng Li, Thomas Tanay, et al. (47 additional authors not shown)

Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Corrected the MOS values in Table 2, and corrected some minor typos

arXiv:2103.05407 [pdf, ps, other]

doi 10.1109/CVPR46437.2021.00347

Efficient Multi-Stage Video Denoising with Recurrent Spatio-Temporal Fusion

Authors: Matteo Maggioni, Yibin Huang, Cheng Li, Shuai Xiao, Zhongqian Fu, Fenglong Song

Abstract: In recent years, denoising methods based on deep learning have achieved unparalleled performance at the cost of large computational complexity. In this work, we propose an Efficient Multi-stage Video Denoising algorithm, called EMVD, to drastically reduce the complexity while maintaining or even improving the performance. First, a fusion stage reduces the noise through a recursive combination of all past frames in the video. Then, a denoising stage removes the noise in the fused frame. Finally, a refinement stage restores the missing high frequency in the denoised frame. All stages operate on a transform-domain representation obtained by learnable and invertible linear operators which simultaneously increase accuracy and decrease complexity of the model. A single loss on the final output is sufficient for successful convergence, hence making EMVD easy to train. Experiments on real raw data demonstrate that EMVD outperforms the state of the art when complexity is constrained, and even remains competitive against methods whose complexities are several orders of magnitude higher. Further, the low complexity and memory requirements of EMVD enable real-time video denoising on commercial SoC in mobile devices.

Submitted 30 March, 2023; v1 submitted 9 March, 2021; originally announced March 2021.

Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 3465-3474

arXiv:2101.10322 [pdf, other]

Reconfigurable Intelligent Surface for Massive Connectivity

Authors: Shuhao Xia, Yuanming Shi, Yong Zhou, Xiaojun Yuan

Abstract: With the rapid development of Internet of Things (IoT), massive machine-type communication has become a promising application scenario, where a large number of devices transmit sporadically to a base station (BS). Reconfigurable intelligent surface (RIS) has been recently proposed as an innovative new technology to achieve energy efficiency and coverage enhancement by establishing favorable signal propagation environments, thereby improving data transmission in massive connectivity. Nevertheless, the BS needs to detect active devices and estimate channels to support data transmission in RIS-assisted massive access systems, which yields unique challenges. This paper shall consider an RIS-assisted uplink IoT network and aims to solve the RIS-related activity detection and channel estimation problem, where the BS detects the active devices and estimates the separated channels of the RIS-to-device link and the RIS-to-BS link. Due to limited scattering between the RIS and the BS, we model the RIS-to-BS channel as a sparse channel. As a result, by simultaneously exploiting both the sparsity of sporadic transmission in massive connectivity and the RIS-to-BS channels, we formulate the RIS-related activity detection and channel estimation problem as a sparse matrix factorization problem. Furthermore, we develop an approximate message passing (AMP) based algorithm to solve the problem based on Bayesian inference framework and reduce the computational complexity by approximating the algorithm with the central limit theorem and Taylor series arguments. Finally, extensive numerical experiments are conducted to verify the effectiveness and improvements of the proposed algorithm.

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2101.07502 [pdf, other]

UAV-Enabled Cooperative Jamming for Covert Communications

Authors: Hangmei Rao, Sa Xiao, Shihao Yan, Jianquan Wang, Wanbin Tang

Abstract: This work employs an unmanned aerial vehicle (UAV) as a jammer to aid a covert communication from a transmitter Alice to a receiver Bob, where the UAV transmits artificial noise (AN) with random power to deliberately create interference to a warden Willie. In the considered system, the UAV's trajectory is critical to the covert communication performance, since the AN transmitted by the UAV also generates interference to Bob. To maximize the system performance, we formulate an optimization problem to jointly design the UAV's trajectory and Alice's transmit power. The formulated optimization problem is non-convex and is normally solved by a conventional iterative (CI) method, which requires multiple approximations based on Taylor expansions and an initialization on the UAV's trajectory. In order to eliminate these requirements, this work, for the first time, develops a geometric (GM) method to solve the optimization problem. By analyzing the covertness constraint, the GM method decouples the joint optimization into optimizing the UAV's trajectory and Alice's transmit power separately. Our examination shows that the GM method can significantly outperform the CI method in terms of achieving a higher average covert rate and the complexity of the GM method is lower than that of the CI method.

Submitted 25 February, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

Showing 1–50 of 69 results for author: Xiao, S