-
Learned Image Compression for HE-stained Histopathological Images via Stain Deconvolution
Authors:
Maximilian Fischer,
Peter Neher,
Tassilo Wald,
Silvia Dias Almeida,
Shuhan Xiao,
Peter Schüffler,
Rickmer Braren,
Michael Götz,
Alexander Muckenhuber,
Jens Kleesiek,
Marco Nolden,
Klaus Maier-Hein
Abstract:
Processing histopathological Whole Slide Images (WSI) leads to massive storage requirements for clinics worldwide. Even after lossy image compression during image acquisition, additional lossy compression is frequently possible without substantially affecting the performance of deep learning-based (DL) downstream tasks. In this paper, we show that the commonly used JPEG algorithm is not best suited for further compression and we propose Stain Quantized Latent Compression (SQLC), a novel DL-based histopathology data compression approach. SQLC compresses staining and RGB channels before passing them through a compression autoencoder (CAE) in order to obtain quantized latent representations that maximize the compression. We show that our approach yields superior performance in a classification downstream task compared to traditional approaches like JPEG, while image quality metrics like the Multi-Scale Structural Similarity Index (MS-SSIM) are largely preserved. Our method is available online.
Submitted 18 June, 2024;
originally announced June 2024.
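The stain deconvolution named in the title above can be illustrated with a minimal sketch. It assumes the classic Beer-Lambert colour deconvolution and uses commonly quoted Ruifrok-Johnston stain vectors; the abstract does not give the stain vectors or the pipeline used by SQLC, so the matrix and all function names here are illustrative assumptions, not the authors' method.

```python
import math

# ASSUMPTION: commonly quoted Ruifrok-Johnston stain vectors in optical-density
# space (rows: hematoxylin, eosin, residual). Not taken from the paper.
STAIN_MATRIX = [
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
]


def rgb_to_od(rgb, background=255.0):
    """Beer-Lambert law: optical density of each channel relative to white."""
    return [-math.log10(max(c, 1.0) / background) for c in rgb]


def od_to_rgb(od, background=255.0):
    """Inverse of rgb_to_od (without the clipping at 1.0)."""
    return [background * 10 ** (-d) for d in od]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    solution = []
    for k in range(3):
        # Replace column k of a with b.
        a_k = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(a_k) / d)
    return solution


def separate_stains(rgb):
    """Per-pixel stain concentrations c such that OD_j = sum_i c_i * M[i][j]."""
    od = rgb_to_od(rgb)
    transposed = [[STAIN_MATRIX[i][j] for i in range(3)] for j in range(3)]
    return solve3(transposed, od)


def combine_stains(concentrations):
    """Rebuild an RGB pixel from stain concentrations."""
    od = [sum(concentrations[i] * STAIN_MATRIX[i][j] for i in range(3))
          for j in range(3)]
    return od_to_rgb(od)
```

A pixel built with `combine_stains([0.8, 0.3, 0.0])` is mapped back to the same concentrations by `separate_stains`, which is the round trip a stain-channel codec would rely on before any lossy step.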
-
Enhancing predictive imaging biomarker discovery through treatment effect analysis
Authors:
Shuhan Xiao,
Lukas Klein,
Jens Petersen,
Philipp Vollmuth,
Paul F. Jaeger,
Klaus H. Maier-Hein
Abstract:
Identifying predictive biomarkers, which forecast individual treatment effectiveness, is crucial for personalized medicine and informs decision-making across diverse disciplines. These biomarkers are extracted from pre-treatment data, often within randomized controlled trials, and have to be distinguished from prognostic biomarkers, which are independent of treatment assignment. Our study focuses on the discovery of predictive imaging biomarkers, aiming to leverage pre-treatment images to unveil new causal relationships. Previous approaches relied on labor-intensive handcrafted or manually derived features, which may introduce biases. In response, we present a new task of discovering predictive imaging biomarkers directly from the pre-treatment images to learn relevant image features. We propose an evaluation protocol for this task to assess a model's ability to identify predictive imaging biomarkers and differentiate them from prognostic ones. It employs statistical testing and a comprehensive analysis of image feature attribution. We explore the suitability of deep learning models originally designed for estimating the conditional average treatment effect (CATE) for this task, which previously have been primarily assessed for the precision of CATE estimation, overlooking the evaluation of imaging biomarker discovery. Our proof-of-concept analysis demonstrates promising results in discovering and validating predictive imaging biomarkers from synthetic outcomes and real-world image datasets.
Submitted 4 June, 2024;
originally announced June 2024.
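The distinction the abstract above draws between predictive and prognostic biomarkers can be made concrete with a small noise-free toy example. The outcome model, its coefficients and the function names are hypothetical and chosen only for illustration; they are not the evaluation protocol or the CATE estimators from the paper.

```python
from itertools import product


def outcome(x_prog, x_pred, t):
    """Hypothetical outcome model (an assumption for this sketch).

    x_prog shifts the outcome regardless of treatment (prognostic);
    x_pred changes how large the treatment effect is (predictive).
    """
    return 2.0 * x_prog + 1.0 * t + 3.0 * t * x_pred


# A balanced, noise-free "trial": every combination of the two binary
# features and the treatment indicator appears exactly once.
ROWS = [(xp, xd, t, outcome(xp, xd, t)) for xp, xd, t in product((0, 1), repeat=3)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def subgroup_effect(feature_index, level):
    """Treated-minus-control mean outcome within one feature subgroup."""
    treated = mean(r[3] for r in ROWS if r[feature_index] == level and r[2] == 1)
    control = mean(r[3] for r in ROWS if r[feature_index] == level and r[2] == 0)
    return treated - control


def effect_modification(feature_index):
    """How much the treatment effect differs between the feature's two levels."""
    return subgroup_effect(feature_index, 1) - subgroup_effect(feature_index, 0)
```

In this toy setup `effect_modification(1)` is nonzero because the treatment effect depends on `x_pred`, while `effect_modification(0)` is zero: `x_prog` changes the outcome but not the benefit of treatment.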
-
MambaVC: Learned Visual Compression with Selective State Spaces
Authors:
Shiyu Qin,
Jinpeng Wang,
Yimin Zhou,
Bin Chen,
Tianci Luo,
Baoyi An,
Tao Dai,
Shutao Xia,
Yaowei Wang
Abstract:
Learned visual compression is an important and active task in multimedia. Existing approaches have explored various CNN- and Transformer-based designs to model content distribution and eliminate redundancy, where balancing efficacy (i.e., rate-distortion trade-off) and efficiency remains a challenge. Recently, state-space models (SSMs) have shown promise due to their long-range modeling capacity and efficiency. Inspired by this, we take the first step to explore SSMs for visual compression. We introduce MambaVC, a simple, strong and efficient compression network based on SSM. MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling, which helps to capture informative global contexts and enhances compression. On compression benchmark datasets, MambaVC achieves superior rate-distortion performance with lower computational and memory overheads. Specifically, it outperforms CNN and Transformer variants by 9.3% and 15.6% on Kodak, respectively, while reducing computation by 42% and 24%, and saving 12% and 71% of memory. MambaVC shows even greater improvements with high-resolution images, highlighting its potential and scalability in real-world applications. We also provide a comprehensive comparison of different network designs, underscoring MambaVC's advantages. Code is available at https://github.com/QinSY123/2024-MambaVC.
Submitted 28 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms
Authors:
Yueyuan Sui,
Minghui Zhao,
Junxi Xia,
Xiaofan Jiang,
Stephen Xia
Abstract:
We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.
Submitted 29 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Movable Antenna-Aided Hybrid Beamforming for Multi-User Communications
Authors:
Yichi Zhang,
Yuchen Zhang,
Lipeng Zhu,
Sa Xiao,
Wanbin Tang,
Yonina C. Eldar,
Rui Zhang
Abstract:
In this correspondence, we propose a movable antenna (MA)-aided multi-user hybrid beamforming scheme with a sub-connected structure, where multiple movable sub-arrays can independently change their positions within different local regions. To maximize the system sum rate, we jointly optimize the digital beamformer, analog beamformer, and positions of subarrays, under the constraints of unit modulus, finite movable regions, and power budget. Due to the non-concave/non-convex objective function/constraints, as well as the highly coupled variables, the formulated problem is challenging to solve. By employing fractional programming, we develop an alternating optimization framework to solve the problem via a combination of Lagrange multipliers, penalty method, and gradient descent. Numerical results reveal that the proposed MA-aided hybrid beamforming scheme significantly improves the sum rate compared to its fixed-position antenna (FPA) counterpart. Moreover, with sufficiently large movable regions, the proposed scheme with sub-connected MA arrays even outperforms the fully-connected FPA array.
Submitted 1 April, 2024;
originally announced April 2024.
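The sum-rate objective mentioned in the abstract above is conventionally the sum of per-user Shannon rates. The sketch below assumes that standard definition and takes received signal and interference powers as given inputs; how they depend on the beamformers and antenna positions is the paper's contribution and is not modelled here. The function name and arguments are hypothetical.

```python
import math


def sum_rate(signal_powers, interference_powers, noise_power):
    """Sum over users of log2(1 + SINR_k), in bits/s/Hz.

    signal_powers[k] and interference_powers[k] are the received desired
    and interfering powers at user k (assumed inputs for this sketch).
    """
    total = 0.0
    for signal, interference in zip(signal_powers, interference_powers):
        sinr = signal / (interference + noise_power)
        total += math.log2(1.0 + sinr)
    return total
```

For two users with SINRs of 1 and 3 the sum rate is log2(2) + log2(4) = 3 bits/s/Hz, which is the quantity an optimizer over beamformers and sub-array positions would try to increase.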
-
Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations
Authors:
Xun Lin,
Yi Yu,
Song Xia,
Jue Jiang,
Haoran Wang,
Zitong Yu,
Yizhong Liu,
Ying Fu,
Shuai Wang,
Wenzhong Tang,
Alex Kot
Abstract:
The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segmentation (MIS) datasets, where the processes of collection and fine-grained annotation are time-intensive and laborious. Recently, Unlearnable Examples (UEs) methods have shown the potential to protect images by adding invisible shortcuts. These shortcuts can prevent unauthorized deep neural networks from generalizing. However, existing UEs are designed for natural image classification and fail to protect MIS datasets imperceptibly as their protective perturbations are less learnable than important prior knowledge in MIS, e.g., contour and texture features. To this end, we propose an Unlearnable Medical image generation method, termed UMed. UMed integrates the prior knowledge of MIS by injecting contour- and texture-aware perturbations to protect images. Given that our target is to only poison features critical to MIS, UMed requires only minimal perturbations within the ROI and its contour to achieve greater imperceptibility (average PSNR is 50.03) and protective performance (clean average DSC degrades from 82.18% to 6.80%).
Submitted 21 March, 2024;
originally announced March 2024.
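The two figures of merit quoted in the abstract above, PSNR for imperceptibility and the Dice similarity coefficient (DSC) for segmentation quality, have standard definitions. The sketch below implements them for flat lists of pixel values and binary masks; the function names and the 8-bit peak value are assumptions, and this is not the evaluation code of the paper.

```python
import math


def psnr(reference, perturbed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [(a - b) ** 2 for a, b in zip(reference, perturbed)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for binary masks."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if size_sum == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / size_sum
```

A perturbation of one grey level on every pixel of an 8-bit image gives a PSNR of about 48.13 dB, which puts the reported average of 50.03 in context: the protective perturbation is, on average, smaller than that.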
-
Networked Collaborative Sensing using Multi-domain Measurements: Architectures, Performance Limits and Algorithms
Authors:
Yihua Ma,
Shuqiang Xia,
Chen Bai,
Yuxin Wang,
Zhongbin Wang,
Songqian Li
Abstract:
As a promising 6G technology, integrated sensing and communication (ISAC) is gaining growing interest. ISAC provides integration gain via sharing spectrum, hardware, and software. However, concerns exist regarding its sensing performance when compared to dedicated radar systems. To address this issue, the advantages of widely deployed networks should be utilized, and this paper proposes networked collaborative sensing (NCS) using multi-domain measurements (MM), including range, Doppler, and two-dimensional angle of arrival. In the NCS-MM architecture, this paper proposes a novel multi-domain decoupling model and a novel guard band-based protocol. The proposed model simplifies multi-domain derivations and algorithm designs, and the proposed protocol conserves resources and mitigates NCS interference. To determine the performance limits, this paper derives the Cramér-Rao lower bound (CRLB) of three-dimensional position and velocity in NCS-MM. An accumulated single-dimensional channel model is used to obtain the CRLB of MM, which is proven to be equivalent to that of the multi-dimensional model. Algorithms for both MM estimation and fusion are proposed. An arbitrary-dimension Newtonized orthogonal matched pursuit (AD-NOMP) is proposed to accurately estimate grid-less MM. The degrees of freedom (DoF) of MM are analyzed, and a novel DoF-based two-stage weighted least squares (TSWLS) is proposed to reduce equations without DoF loss. The numerical results show that the performance of the proposed algorithms is close to their performance limits.
Submitted 22 February, 2024;
originally announced February 2024.
-
A two-stage solution to quantum process tomography: error analysis and optimal design
Authors:
Shuixin Xiao,
Yuanlong Wang,
Jun Zhang,
Daoyi Dong,
Gary J. Mooney,
Ian R. Petersen,
Hidehiro Yonezawa
Abstract:
Quantum process tomography is a critical task for characterizing the dynamics of quantum systems and achieving precise quantum control. In this paper, we propose a two-stage solution for both trace-preserving and non-trace-preserving quantum process tomography. Utilizing a tensor structure, our algorithm exhibits a computational complexity of $O(MLd^2)$ where $d$ is the dimension of the quantum system and $ M $, $ L $ represent the numbers of different input states and measurement operators, respectively. We establish an analytical error upper bound and then design the optimal input states and the optimal measurement operators, which are both based on minimizing the error upper bound and maximizing the robustness characterized by the condition number. Numerical examples and testing on IBM quantum devices are presented to demonstrate the performance and efficiency of our algorithm.
Submitted 14 February, 2024;
originally announced February 2024.
-
An Efficient Implicit Neural Representation Image Codec Based on Mixed Autoregressive Model for Low-Complexity Decoding
Authors:
Xiang Liu,
Jiahong Chen,
Bin Chen,
Zimo Liu,
Baoyi An,
Shu-Tao Xia,
Zhi Wang
Abstract:
Displaying high-quality images on edge devices, such as augmented reality devices, is essential for enhancing the user experience. However, these devices often face power consumption and computing resource limitations, making it challenging to apply many deep learning-based image compression algorithms in this field. Implicit Neural Representation (INR) for image compression is an emerging technology that offers two key benefits compared to cutting-edge autoencoder models: low computational complexity and parameter-free decoding. It also outperforms many traditional and early neural compression methods in terms of quality. In this study, we introduce a new Mixed AutoRegressive Model (MARM) to significantly reduce the decoding time for the current INR codec, along with a new synthesis network to enhance reconstruction quality. MARM includes our proposed AutoRegressive Upsampler (ARU) blocks, which are highly computationally efficient, and ARM from previous work to balance decoding time and reconstruction quality. We also propose enhancing ARU's performance using a checkerboard two-stage decoding strategy. Moreover, the ratio of different modules can be adjusted to maintain a balance between quality and speed. Comprehensive experiments demonstrate that our method significantly improves computational efficiency while preserving image quality. With different parameter settings, our method can achieve over an order of magnitude acceleration in decoding time without industrial-level optimization, or achieve state-of-the-art reconstruction quality compared with other INR codecs. To the best of our knowledge, our method is the first INR-based codec comparable with Hyperprior in both decoding speed and quality while maintaining low complexity.
Submitted 7 June, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Perceptual Image Compression with Cooperative Cross-Modal Side Information
Authors:
Shiyu Qin,
Bin Chen,
Yujun Huang,
Baoyi An,
Tao Dai,
Shu-Tao Xia
Abstract:
The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance perceptual compression of images, even though the benefits of multimodal synergy have been widely demonstrated in research. This raises the following question: How can we effectively transfer text-level semantic dependencies, which are only available to the decoder, to help image compression? In this work, we propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features. This is done by predicting a semantic mask to guide the learned text-adaptive affine transformation at the pixel level. Furthermore, we design a text-conditional generative adversarial network to improve the perceptual quality of reconstructed images. Extensive experiments involving four datasets and ten image quality assessment metrics demonstrate that the proposed approach achieves superior results in terms of rate-perception trade-off and semantic distortion.
Submitted 28 November, 2023; v1 submitted 23 November, 2023;
originally announced November 2023.
-
Near-Field Wideband Secure Communications: An Analog Beamfocusing Approach
Authors:
Yuchen Zhang,
Haiyang Zhang,
Sa Xiao,
Wanbin Tang,
Yonina C. Eldar
Abstract:
In the rapidly advancing landscape of 6G, characterized by ultra-high-speed wideband transmission in millimeter-wave and terahertz bands, our paper addresses the pivotal task of enhancing physical layer security (PLS) within near-field wideband communications. We introduce true-time delayer (TTD)-incorporated analog beamfocusing techniques designed to address the interplay between near-field propagation and wideband beamsplit, an uncharted domain in existing literature. Our approach to maximizing secrecy rates involves formulating an optimization problem for joint power allocation and analog beamformer design, employing a two-stage process encompassing a semi-digital solution and analog approximation. This problem is efficiently solved through a combination of alternating optimization, fractional programming, and block successive upper-bound minimization techniques. Additionally, we present a low-complexity beamsplit-aware beamfocusing strategy, capitalizing on geometric insights from near-field wideband propagation, which can also serve as a robust initial value for the optimization-based approach. Numerical results substantiate the efficacy of the proposed methods, clearly demonstrating their superiority over TTD-free approaches in fortifying wideband PLS, as well as the advantageous secrecy energy efficiency achieved by leveraging low-cost analog devices.
Submitted 28 November, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Two-stage solution for ancilla-assisted quantum process tomography: error analysis and optimal design
Authors:
Shuixin Xiao,
Yuanlong Wang,
Daoyi Dong,
Jun Zhang
Abstract:
Quantum process tomography (QPT) is a fundamental task to characterize the dynamics of quantum systems. In contrast to standard QPT, the ancilla-assisted process tomography (AAPT) framework introduces an extra ancilla system so that only a single input state is needed. In this paper, we extend the two-stage solution, a method originally designed for standard QPT, to perform AAPT. Our algorithm has $O(Md_A^2d_B^2)$ computational complexity, where $M$ is the number of types of measurement operators, $d_A$ is the dimension of the quantum system of interest, and $d_B$ is the dimension of the ancilla system. Then we establish an error upper bound and further discuss the optimal design of the input state in AAPT. A numerical example on a phase damping process demonstrates the effectiveness of the optimal design and illustrates the theoretical error analysis.
Submitted 31 October, 2023;
originally announced October 2023.
-
Cooperative Dispatch of Microgrids Community Using Risk-Sensitive Reinforcement Learning with Monotonously Improved Performance
Authors:
Ziqing Zhu,
Xiang Gao,
Siqi Bu,
Ka Wing Chan,
Bin Zhou,
Shiwei Xia
Abstract:
The integration of individual microgrids (MGs) into Microgrid Clusters (MGCs) significantly improves the reliability and flexibility of energy supply, through resource sharing and ensuring backup during outages. The dispatch of MGCs is the key challenge to be tackled to ensure their secure and economic operation. Currently, there is a lack of optimization methods that can achieve a trade-off among the top-priority requirements of MGC dispatch, including fast computation speed, optimality, multiple objectives, and risk mitigation against uncertainty. In this paper, a novel Multi-Objective, Risk-Sensitive, and Online Trust Region Policy Optimization (RS-TRPO) Algorithm is proposed to tackle this problem. First, a dispatch paradigm for autonomous MGs in the MGC is proposed, enabling them to sequentially implement their self-dispatch to mitigate potential conflicts. This dispatch paradigm is then formulated as a Markov Game model, which is finally solved by the RS-TRPO algorithm. This online algorithm enables MGs to spontaneously search for the Pareto Frontier considering multiple objectives and risk mitigation. The outstanding computational performance of this algorithm is demonstrated in comparison with mathematical programming methods and heuristic algorithms in a modified IEEE 30-Bus Test System integrated with four autonomous MGs.
Submitted 17 October, 2023;
originally announced October 2023.
-
Energy-efficient Integrated Sensing and Communication System and DNLFM Waveform
Authors:
Yihua Ma,
Zhifeng Yuan,
Shuqiang Xia,
Chen Bai,
Zhongbin Wang,
Yuxin Wang
Abstract:
Integrated sensing and communication (ISAC) is a key enabler of 6G. Unlike communication radio links, the sensing signal must undergo round trips from many scatterers. Therefore, sensing is more power-sensitive and faces more severe multi-target interference. In this paper, the ISAC system employs dedicated sensing signals, which can be reused as the communication reference signal. This paper proposes to add time-frequency matched windows at both the transmitting and receiving sides, which avoids mismatch loss and increases energy efficiency. Discrete non-linear frequency modulation (DNLFM) is further proposed to achieve both time-domain constant modulus and frequency-domain arbitrary windowing weights. DNLFM is generated using very few Newton iterations and a simple geometrically equivalent method, which greatly reduces the complex numerical integration required by the conventional method. Moreover, the spatial-domain matched window is proposed to achieve low sidelobes. The simulation results show that the proposed methods achieve higher energy efficiency than conventional methods.
Submitted 17 September, 2023;
originally announced September 2023.
-
Joint Beam Management and SLAM for mmWave Communication Systems
Authors:
Hang Que,
Jie Yang,
Chao-Kai Wen,
Shuqiang Xia,
Xiao Li,
Shi Jin
Abstract:
The millimeter-wave (mmWave) communication technology, which employs large-scale antenna arrays, enables inherent sensing capabilities. Simultaneous localization and mapping (SLAM) can utilize channel multipath angle estimates to realize integrated sensing and communication design in 6G communication systems. However, existing works have ignored the significant overhead required by the mmWave beam management when implementing SLAM with angle estimates. This study proposes a joint beam management and SLAM design that utilizes the strong coupling between the radio map and channel multipath for simultaneous beam management, localization, and mapping. In this approach, we first propose a hierarchical sweeping and sensing service design. The path angles are estimated in the hierarchical sweeping, enabling angle-based SLAM with the aid of an inertial measurement unit (IMU) to realize sensing service. Then, feature-aided tracking is proposed that utilizes prior angle information generated from the radio map and IMU. Finally, a switching module is introduced to enable flexible switching between hierarchical sweeping and feature-aided tracking. Simulations show that the proposed joint design can achieve sub-meter level localization and mapping accuracy (with an error < 0.5 m). Moreover, the beam management overhead can be reduced by approximately 40% in different wireless environments.
Submitted 15 July, 2023;
originally announced July 2023.
-
Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI
Authors:
Zimeng Li,
Sa Xiao,
Cheng Wang,
Haidong Li,
Xiuchao Zhao,
Caohui Duan,
Qian Zhou,
Qiuchen Rao,
Yuan Fang,
Junshuai Xie,
Lei Shi,
Fumin Guo,
Chaohui Ye,
Xin Zhou
Abstract:
Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of the human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep convolutional neural networks (CNN) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost the reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized 129Xe and 1H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully-sampled images. These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and clinical lung disease patient care.
Submitted 13 November, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Joint Localization and Environment Sensing by Harnessing NLOS Components in RIS-aided mmWave Communication Systems
Authors:
Yixuan Huang,
Jie Yang,
Wankai Tang,
Chao-Kai Wen,
Shuqiang Xia,
Shi Jin
Abstract:
This study explores the use of non-line-of-sight (NLOS) components in millimeter-wave (mmWave) communication systems for joint localization and environment sensing. The radar cross section (RCS) of a reconfigurable intelligent surface (RIS) is calculated to develop a general path gain model for RISs and traditional scatterers. The results show that RISs have a greater potential to assist in localization due to their ability to maintain high RCSs and create strong NLOS links. A one-stage linear weighted least squares estimator is proposed to simultaneously determine user equipment (UE) locations, velocities, and scatterer (or RIS) locations using line-of-sight (LOS) and NLOS paths. The estimator supports environment sensing and UE localization even using only NLOS paths. A second-stage estimator is also introduced to improve environment sensing accuracy by considering the nonlinear relationship between UE and scatterer locations. Simulation results demonstrate the effectiveness of the proposed estimators in rich scattering environments and the benefits of using NLOS paths for improving UE location accuracy and assisting in environment sensing. The effects of RIS number, size, and deployment on localization performance are also analyzed.
Submitted 20 May, 2023;
originally announced May 2023.
-
Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization
Authors:
Yuanming Shi,
Shuhao Xia,
Yong Zhou,
Yijie Mao,
Chunxiao Jiang,
Meixia Tao
Abstract:
Vertical federated learning (FL) is a collaborative machine learning framework that enables devices to learn a global model from the feature-partition datasets without sharing local raw data. However, as the number of the local intermediate outputs is proportional to the training samples, it is critical to develop communication-efficient techniques for wireless vertical FL to support high-dimensional model aggregation with full device participation. In this paper, we propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation by leveraging over-the-air computation (AirComp) and alleviating communication straggler issue with cooperative model aggregation among geographically distributed edge servers. However, the model aggregation error caused by AirComp and quantization errors caused by the limited fronthaul capacity degrade the learning performance for vertical FL. To address these issues, we characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions. To improve the learning performance, we establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed. We conduct extensive simulations to demonstrate the effectiveness of the proposed system architecture and optimization framework for vertical FL.
Submitted 4 May, 2023;
originally announced May 2023.
-
Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching
Authors:
Shuting Xia,
Tingyu Fan,
Yiling Xu,
Jenq-Neng Hwang,
Zhu Li
Abstract:
3D dynamic point cloud (DPC) compression relies on mining its temporal context, which faces significant challenges due to DPC's sparsity and non-uniform structure. Existing methods are limited in capturing sufficient temporal dependencies. Therefore, this paper proposes a learning-based DPC compression framework via hierarchical block-matching-based inter-prediction module to compensate and compress the DPC geometry in latent space. Specifically, we propose a hierarchical motion estimation and motion compensation (Hie-ME/MC) framework for flexible inter-prediction, which dynamically selects the granularity of optical flow to encapsulate the motion information accurately. To improve the motion estimation efficiency of the proposed inter-prediction module, we further design a KNN-attention block matching (KABM) network that determines the impact of potential corresponding points based on the geometry and feature correlation. Finally, we compress the residual and the multi-scale optical flow with a fully-factorized deep entropy model. The experiment result on the MPEG-specified Owlii Dynamic Human Dynamic Point Cloud (Owlii) dataset shows that our framework outperforms the previous state-of-the-art methods and the MPEG standard V-PCC v18 in inter-frame low-delay mode.
Submitted 16 May, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
How to Use Reinforcement Learning to Facilitate Future Electricity Market Design? Part 1: A Paradigmatic Theory
Authors:
Ziqing Zhu,
Siqi Bu,
Ka Wing Chan,
Bin Zhou,
Shiwei Xia
Abstract:
In the face of the pressing need for decarbonization in the power sector, the redesign of the electricity market is necessary as a macro-level approach to accommodate the high penetration of renewable generation, and to achieve power system operation security, economic efficiency, and environmental friendliness. However, existing market design methodologies suffer from a lack of coordination among the energy spot market (ESM), ancillary service market (ASM) and financial market (FM), i.e., the "joint market", and from a lack of reliable simulation-based verification. To tackle these deficiencies, this two-part paper develops a paradigmatic theory and detailed methods of joint market design using reinforcement-learning (RL)-based simulation. In Part 1, the theory and framework of this novel market design philosophy are proposed. First, the controversial market design options that arise when designing the joint market are summarized as the targeted research questions. Second, a Markov game model is developed to describe the bidding game in the joint market, incorporating the market design options to be determined. Third, a framework for deploying multiple types of RL algorithms to simulate the market model is developed. Finally, several market operation performance indicators are proposed to validate the market design based on the simulation results.
Submitted 11 May, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Model-free Motion Planning of Autonomous Agents for Complex Tasks in Partially Observable Environments
Authors:
Junchao Li,
Mingyu Cai,
Zhen Kan,
Abstract:
Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). The problem is redefined as finding an optimal policy on the product of PL-POMDP with LDGBA based on model-checking techniques to satisfy the complex task. We implement deep Q learning with long short-term memory (LSTM) to process the observation history and task recognition. Our contributions include the proposed method, the utilization of LTL and LDGBA, and the LSTM-enhanced deep Q learning. We demonstrate the applicability of the proposed method by conducting simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The simulation results demonstrate that our proposed method effectively addresses environment, action, and observation uncertainties. This indicates its potential for real-world applications, including the control of unmanned aerial vehicles (UAVs).
Submitted 30 April, 2023;
originally announced May 2023.
-
Omni-Line-of-Sight Imaging for Holistic Shape Reconstruction
Authors:
Binbin Huang,
Xingyue Peng,
Siyuan Shen,
Suan Xia,
Ruiqian Li,
Yanhua Yu,
Yuehan Wang,
Shenghua Gao,
Wenzheng Chen,
Shiying Li,
Jingyi Yu
Abstract:
We introduce Omni-LOS, a neural computational imaging method for conducting holistic shape reconstruction (HSR) of complex objects utilizing a Single-Photon Avalanche Diode (SPAD)-based time-of-flight sensor. As illustrated in Fig. 1, our method enables new capabilities to reconstruct near-$360^\circ$ surrounding geometry of an object from a single scan spot. In such a scenario, traditional line-of-sight (LOS) imaging methods only see the front part of the object and typically fail to recover the occluded back regions. Inspired by recent advances in non-line-of-sight (NLOS) imaging techniques, which have demonstrated great power to reconstruct occluded objects, Omni-LOS marries LOS and NLOS together, leveraging their complementary advantages to jointly recover the holistic shape of the object from a single scan position. The core of our method is to place the object near diffuse walls and augment the LOS scan in the front view with the NLOS scans from the surrounding walls, which serve as virtual "mirrors" to trap light toward the object. Instead of separately recovering the LOS and NLOS signals, we adopt an implicit neural network to represent the object, analogous to NeRF and NeTF. While transients are measured along straight rays in LOS but over spherical wavefronts in NLOS, we derive differentiable ray propagation models to simultaneously model both types of transient measurements so that the NLOS reconstruction also takes into account the direct LOS measurements and vice versa. We further develop a proof-of-concept Omni-LOS hardware prototype for real-world validation. Comprehensive experiments on various wall settings demonstrate that Omni-LOS successfully resolves shape ambiguities caused by occlusions, achieves high-fidelity 3D scan quality, and manages to recover objects of various scales and complexity.
Submitted 21 April, 2023;
originally announced April 2023.
-
Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection
Authors:
Shengchang Xiao,
Xueshuai Zhang,
Pengyuan Zhang
Abstract:
Recently, convolutional neural networks (CNNs) have been widely used in sound event detection (SED). However, traditional convolution is deficient in learning time-frequency domain representation of different sound events. To address this issue, we propose multi-dimensional frequency dynamic convolution (MFDConv), a new design that endows convolutional kernels with frequency-adaptive dynamic properties along multiple dimensions. MFDConv utilizes a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary frequency-adaptive attentions, which substantially strengthen the feature extraction ability of convolutional kernels. Moreover, in order to promote the performance of mean teacher, we propose the confident mean teacher to increase the accuracy of pseudo-labels from the teacher and train the student with high confidence labels. Experimental results show that the proposed methods achieve 0.470 and 0.692 of PSDS1 and PSDS2 on the DESED real validation dataset.
Submitted 21 February, 2023; v1 submitted 18 February, 2023;
originally announced February 2023.
-
Two-stage Contextual Transformer-based Convolutional Neural Network for Airway Extraction from CT Images
Authors:
Yanan Wu,
Shuiqing Zhao,
Shouliang Qi,
Jie Feng,
Haowen Pang,
Runsheng Chang,
Long Bai,
Mengqi Li,
Shuyue Xia,
Wei Qian,
Hongliang Ren
Abstract:
Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). Existing methods struggle to segment the airway sufficiently, especially the high-generation airway, given limited labels, and cannot meet the needs of clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation using CT images. The method consists of two stages, performing initial and refined airway segmentation. The two stages share the same subnetwork but take different airway masks as input. A contextual transformer block is applied in both the encoder and decoder paths of the subnetwork to achieve high-quality airway segmentation effectively. In the first stage, the total airway mask and CT images are provided to the subnetwork; in the second stage, the intrapulmonary airway mask and corresponding CT scans are provided. The predictions of the two stages are then merged as the final prediction. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analyses demonstrate that our proposed method extracts many more branches and greater tree length while achieving state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.
Submitted 15 December, 2022;
originally announced December 2022.
-
Dimensionality Reduced Antenna Array for Beamforming/steering
Authors:
Shiyi Xia,
Mingyang Zhao,
Qian Ma,
Xunnan Zhang,
Ling Yang,
Yazhi Pi,
Hyunchul Chung,
Ad Reniers,
A. M. J. Koonen,
Zizheng Cao
Abstract:
Beamforming enables focused communication. It is extensively employed in many disciplines involving electromagnetic waves, including arrayed ultrasonic, optical, and high-speed wireless communication. Conventional beam steering often requires the addition of separate active amplitude and phase control units after each radiating element. The high power consumption and complexity of large-scale phased arrays can be overcome by reducing the number of active controllers, pushing beamforming into satellite communications and deep space exploration. Here, we propose a new design for a phased array antenna with a dimension reduced cascaded angle offset (DRCAO-PAA). Furthermore, the proposed DRCAO-PAA was compressed by using the concept of singular value decomposition. To pave the way for practical application, the particle swarm optimization algorithm and the deep neural network Transformer were adopted. Based on this theoretical framework, an experimental board was built to verify the theory. Finally, 16/8/4-array beam steering was demonstrated by using 4/3/2 active controllers, respectively.
△ Less
Submitted 28 October, 2022;
originally announced October 2022.
-
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting
Authors:
Jie Wang,
Yuji Liu,
Binling Wang,
Yiming Zhi,
Song Li,
Shipeng Xia,
Jiayang Zhang,
Feng Tong,
Lin Li,
Qingyang Hong
Abstract:
This paper describes a spatial-aware speaker diarization system for multi-channel multi-party meetings. The diarization system obtains speaker direction information from a microphone array. The speaker spatial embedding is generated from the x-vector and the s-vector derived from superdirective beamforming (SDB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named discriminative multi-stream neural network (DMSNet), which consists of an attention superdirective beamforming (ASDB) block and a Conformer encoder. The proposed ASDB is a self-adapted channel-wise block that extracts the latent spatial features of array audio by modeling interdependencies between channels. We explore DMSNet to address the overlapped speech problem on multi-channel audio and achieve 93.53% accuracy on the evaluation set. With a DMSNet-based overlapped speech detection (OSD) module, the diarization error rate (DER) of the cluster-based diarization system decreases significantly from 13.45% to 7.64%.
Submitted 24 September, 2022;
originally announced September 2022.
-
TD-BPQBC: A 1.8μW 5.5mm3 ADC-less Neural Implant SoC utilizing 13.2pJ/Sample Time-domain Bi-phasic Quasi-static Brain Communication
Authors:
Baibhab Chatterjee,
K Gaurav Kumar,
Shulan Xiao,
Gourab Barik,
Krishna Jayant,
Shreyas Sen
Abstract:
Untethered miniaturized wireless neural sensor nodes with data transmission and energy harvesting capabilities call for circuit and system-level innovations to enable ultra-low energy deep implants for brain-machine interfaces. Realizing that the energy and size constraints of a neural implant motivate highly asymmetric system design (a small, low-power sensor and transmitter at the implant, with a relatively higher power receiver at a body-worn hub), we present Time-Domain Bi-Phasic Quasi-static Brain Communication (TD-BPQBC), offloading the burden of analog to digital conversion (ADC) and digital signal processing (DSP) to the receiver. The input analog signal is converted to time-domain pulse-width modulated (PWM) waveforms, and transmitted using the recently developed BPQBC method for reducing communication power in implants. The overall SoC consumes only 1.8μW power while sensing and communicating at 800kSps. The transmitter energy efficiency is only 1.1pJ/b, which is >30X better than the state-of-the-art, enabling a fully-electrical, energy-harvested, and connected in-brain sensor/stimulator node.
Submitted 19 October, 2022; v1 submitted 24 September, 2022;
originally announced September 2022.
-
Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution
Authors:
Hongwei Li,
Tao Dai,
Yiming Li,
Xueyi Zou,
Shu-Tao Xia
Abstract:
Image representation is critical for many visual tasks. Instead of representing images discretely with 2D arrays of pixels, a recent study, namely local implicit image function (LIIF), denotes images as a continuous function where pixel values are obtained by using the corresponding coordinates as inputs. Due to its continuous nature, LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors. However, LIIF often suffers from structural distortions and ringing artifacts around edges, mostly because all pixels share the same model, thus ignoring the local properties of the image. In this paper, we propose a novel adaptive local image function (A-LIIF) to alleviate this problem. Specifically, our A-LIIF consists of two main components: an encoder and an expansion network. The former captures cross-scale image features, while the latter models the continuous up-scaling function by a weighted combination of multiple local implicit image functions. Accordingly, our A-LIIF can reconstruct the high-frequency textures and structures more accurately. Experiments on multiple benchmark datasets verify the effectiveness of our method. Our code is available at https://github.com/LeeHW-THU/A-LIIF.
Submitted 7 August, 2022;
originally announced August 2022.
-
Highly Efficient Waveform Design and Hybrid Duplex for Joint Communication and Sensing
Authors:
Yihua Ma,
Zhifeng Yuan,
Shuqiang Xia,
Guanghui Yu,
Liujun Hu
Abstract:
Joint communication and sensing (JCAS) is a very promising 6G technology that is attracting growing research attention. Compared with communication, radar has many unique features in terms of waveform design criteria, self-interference cancellation (SIC), aperture-dependent resolution, and virtual aperture. This paper proposes a novel waveform design named max-aperture radar slicing (MaRS) to gain a large time-frequency aperture, which is generated by orthogonal frequency division multiplexing (OFDM) and occupies only a tiny fraction of OFDM resources. The proposed MaRS keeps the radar advantages of constant modulus, zero auto-correlation sequence, and simple SIC. As MaRS consumes far fewer resources, conventional processing methods fail, and novel angle-Doppler map based methods are proposed to obtain the range-velocity-angle information from MaRS echoes and strong clutter. To avoid complex full-duplex communication, this paper proposes a hybrid-duplex JCAS scheme composed of half-duplex communication and full-duplex radar. The half-duplex communication antenna array is reused, and a small sensing-dedicated antenna array is added. Using these two arrays, a large space-domain sensing aperture is virtually formed to greatly improve the angle resolution. The numerical results show that the proposed MaRS and hybrid duplex can achieve a high sensing resolution with only 0.4% OFDM resources, which reduces the overheads of conventional methods to less than one tenth.
Submitted 4 July, 2023; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Attention-Guided Autoencoder for Automated Progression Prediction of Subjective Cognitive Decline with Structural MRI
Authors:
Hao Guan,
Ling Yue,
Pew-Thian Yap,
Shifu Xiao,
Andrea Bozoki,
Mingxia Liu
Abstract:
Subjective cognitive decline (SCD) is a preclinical stage of Alzheimer's disease (AD) which occurs even before mild cognitive impairment (MCI). Progressive SCD will convert to MCI with the potential of further evolving to AD. Therefore, early identification of progressive SCD with neuroimaging techniques (e.g., structural MRI) is of great clinical value for early intervention of AD. However, existing MRI-based machine/deep learning methods usually suffer from the small-sample-size problem, which poses a great challenge to related neuroimaging analysis. The central question we aim to tackle in this paper is how to leverage related domains (e.g., AD/NC) to assist the progression prediction of SCD. Meanwhile, we are concerned with which brain areas are more closely linked to the identification of progressive SCD. To this end, we propose an attention-guided autoencoder model for efficient cross-domain adaptation which facilitates the knowledge transfer from AD to SCD. The proposed model is composed of four key components: 1) a feature encoding module for learning shared subspace representations of different domains, 2) an attention module for automatically locating discriminative brain regions of interest defined in brain atlases, 3) a decoding module for reconstructing the original input, 4) a classification module for identification of brain diseases. Through joint training of these four modules, domain invariant features can be learned. Meanwhile, the brain disease related regions can be highlighted by the attention mechanism. Extensive experiments on the publicly available ADNI dataset and a private CLAS dataset have demonstrated the effectiveness of the proposed method. The proposed model is straightforward to train and test with only 5-10 seconds on CPUs and is suitable for medical tasks with small datasets.
Submitted 16 February, 2023; v1 submitted 24 June, 2022;
originally announced June 2022.
-
SJ-HD^2R: Selective Joint High Dynamic Range and Denoising Imaging for Dynamic Scenes
Authors:
Wei Li,
Shuai Xiao,
Tianhong Dai,
Shanxin Yuan,
Tao Wang,
Cheng Li,
Fenglong Song
Abstract:
Ghosting artifacts, motion blur, and low fidelity in highlight are the main challenges in High Dynamic Range (HDR) imaging from multiple Low Dynamic Range (LDR) images. These issues come from using the medium-exposed image as the reference frame in previous methods. To deal with them, we propose to use the under-exposed image as the reference to avoid these issues. However, the heavy noise in dark regions of the under-exposed image becomes a new problem. Therefore, we propose a joint HDR and denoising pipeline, containing two sub-networks: (i) a pre-denoising network (PreDNNet) to adaptively denoise input LDRs by exploiting exposure priors; (ii) a pyramid cascading fusion network (PCFNet), introducing an attention mechanism and cascading structure in a multi-scale manner. To further leverage these two paradigms, we propose a selective and joint HDR and denoising (SJ-HD$^2$R) imaging framework, utilizing scenario-specific priors to conduct the path selection with an accuracy of more than 93.3$\%$. We create the first joint HDR and denoising benchmark dataset, which contains a variety of challenging HDR and denoising scenes and supports the switching of the reference image. Extensive experiment results show that our method achieves superior performance to previous methods.
Submitted 3 November, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
-
PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework
Authors:
Ning Kang,
Shanzhao Qiu,
Shifeng Zhang,
Zhenguo Li,
Shutao Xia
Abstract:
Generative model based image lossless compression algorithms have seen a great success in improving compression ratio. However, the throughput for most of them is less than 1 MB/s even with the most advanced AI accelerated chips, preventing them from most real-world applications, which often require 100 MB/s. In this paper, we propose PILC, an end-to-end image lossless compression framework that achieves 200 MB/s for both compression and decompression with a single NVIDIA Tesla V100 GPU, 10 times faster than the most efficient one before. To obtain this result, we first develop an AI codec that combines auto-regressive model and VQ-VAE which performs well in lightweight setting, then we design a low complexity entropy coder that works well with our codec. Experiments show that our framework compresses better than PNG by a margin of 30% in multiple datasets. We believe this is an important step to bring AI compression forward to commercial use.
Submitted 9 June, 2022;
originally announced June 2022.
-
Bi-Phasic Quasistatic Brain Communication for Fully Untethered Connected Brain Implants
Authors:
Baibhab Chatterjee,
Mayukh Nath,
Gaurav Kumar K,
Shulan Xiao,
Krishna Jayant,
Shreyas Sen
Abstract:
Wireless communication using electro-magnetic (EM) fields acts as the backbone for information exchange among wearable devices around the human body. However, for implanted devices, EM fields incur a high amount of absorption in the tissue, while alternative modes of transmission including ultrasound, optical and magneto-electric methods result in large transduction losses due to conversion of one form of energy to another, thereby increasing the overall end-to-end energy loss. To solve the challenge of powering and communication in a brain implant with low end-to-end channel loss, we present Bi-Phasic Quasistatic Brain Communication (BP-QBC), achieving < 60dB worst-case end-to-end channel loss at a channel length of 55mm, by avoiding the transduction losses during field-modality conversion. BP-QBC utilizes dipole coupling based signal transmission within the brain tissue using differential excitation in the transmitter and differential signal pick-up at the receiver, and offers 41X lower power w.r.t. traditional Galvanic Human Body Communication at a carrier frequency of 1MHz, by blocking any DC current paths through the brain tissue. Since the electrical signal transfer through the human tissue is electro-quasistatic up to several tens of MHz, BP-QBC allows a scalable (bps-10Mbps) duty-cycled uplink from the implant to an external wearable. The power consumption in the BP-QBC TX is only 0.52uW at 1Mbps (with 1% duty cycling), which is within the range of harvested body-coupled power in the downlink from an external wearable to the brain implant. Furthermore, BP-QBC eliminates the need for sub-cranial repeaters, as it utilizes quasi-static electrical signals, thereby avoiding any transduction losses. Such low end-to-end channel loss with high data rates would find applications in neuroscience, brain-machine interfaces, electroceuticals and connected healthcare.
Submitted 4 July, 2023; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Applications of Reinforcement Learning in Deregulated Power Market: A Comprehensive Review
Authors:
Ziqing Zhu,
Ze Hu,
Ka Wing Chan,
Siqi Bu,
Bin Zhou,
Shiwei Xia
Abstract:
The increasing penetration of renewable generations, along with the deregulation and marketization of power industry, promotes the transformation of power market operation paradigms. The optimal bidding strategy and dispatching methodology under these new paradigms are prioritized concerns for both market participants and power system operators, with obstacles of uncertain characteristics, computational efficiency, as well as requirements of hyperopic decision-making. To tackle these problems, the Reinforcement Learning (RL), as an emerging machine learning technique with advantages compared with conventional optimization tools, is playing an increasingly significant role in both academia and industry. This paper presents a comprehensive review of RL applications in deregulated power market operation including bidding and dispatching strategy optimization, based on more than 150 carefully selected literatures. For each application, apart from a paradigmatic summary of generalized methodology, in-depth discussions of applicability and obstacles while deploying RL techniques are also provided. Finally, some RL techniques that have great potentiality to be deployed in bidding and dispatching problems are recommended and discussed.
Submitted 11 May, 2023; v1 submitted 7 May, 2022;
originally announced May 2022.
-
The xmuspeech system for multi-channel multi-party meeting transcription challenge
Authors:
Jie Wang,
Yuji Liu,
Binling Wang,
Yiming Zhi,
Song Li,
Shipeng Xia,
Jiayang Zhang,
Lin Li,
Qingyang Hong,
Feng Tong
Abstract:
This paper describes the system developed by the XMUSPEECH team for the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). For the speaker diarization task, we propose a multi-channel speaker diarization system that obtains spatial information about the speaker by Direction of Arrival (DOA) technology. Speaker-spatial embedding is generated by x-vector and s-vector derived from Filter-and-Sum Beamforming (FSB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named Discriminative Multi-stream Neural Network (DMSNet), which consists of an Attention Filter-and-Sum block (AFSB) and a Conformer encoder. We explore DMSNet to address the overlapped speech problem on multi-channel audio. Compared with an LSTM-based overlapped speech detection (OSD) module, we achieve a decrease of 10.1% in Detection Error Rate (DetER). With the DMSNet-based OSD module, the DER of the cluster-based diarization system decreases significantly from 13.44% to 7.63%. Our best fusion system achieves 7.09% and 9.80% of the diarization error rate (DER) on the evaluation set and test set.
Submitted 11 February, 2022;
originally announced February 2022.
-
Fast and accurate waveform modeling of long-haul multi-channel optical fiber transmission using a hybrid model-data driven scheme
Authors:
Hang Yang,
Zekun Niu,
Haochen Zhao,
Shilin Xiao,
Weisheng Hu,
Lilin Yi
Abstract:
The modeling of optical wave propagation in optical fiber is a task of fast and accurate solving the nonlinear Schrödinger equation (NLSE), and can enable the optical system design, digital signal processing verification and fast waveform calculation. Traditional waveform modeling of full-time and full-frequency information is the split-step Fourier method (SSFM), which has long been regarded as challenging in long-haul wavelength division multiplexing (WDM) optical fiber communication systems because it is extremely time-consuming. Here we propose a linear-nonlinear feature decoupling distributed (FDD) waveform modeling scheme to model long-haul WDM fiber channel, where the channel linear effects are modelled by the NLSE-derived model-driven methods and the nonlinear effects are modelled by the data-driven deep learning methods. Meanwhile, the proposed scheme only focuses on one-span fiber distance fitting, and then recursively transmits the model to achieve the required transmission distance. The proposed modeling scheme is demonstrated to have high accuracy, high computing speeds, and robust generalization abilities for different optical launch powers, modulation formats, channel numbers and transmission distances. The total running time of FDD waveform modeling scheme for 41-channel 1040-km fiber transmission is only 3 minutes versus more than 2 hours using SSFM for each input condition, which achieves a 98% reduction in computing time. Considering the multi-round optimization by adjusting system parameters, the complexity reduction is significant. The results represent a remarkable improvement in nonlinear fiber modeling and open up novel perspectives for solution of NLSE-like partial differential equations and optical fiber physics problems.
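For context on the SSFM baseline named in this abstract, the following is a minimal sketch of a symmetric split-step Fourier solver for a lossless, single-polarization NLSE. It is not the authors' FDD scheme or their simulation setup; the function name `ssfm`, its parameters, and the normalized units are assumptions chosen for illustration.

```python
import numpy as np


def ssfm(a0, dt, length, n_steps, beta2, gamma):
    """Propagate field samples a0 through a lossless fiber (hypothetical helper).

    Solves dA/dz = -1j*(beta2/2)*d2A/dt2 + 1j*gamma*|A|^2*A with a
    symmetric split step: half linear step, full nonlinear step, half linear step.
    """
    a = np.asarray(a0, dtype=complex)
    omega = 2 * np.pi * np.fft.fftfreq(a.size, d=dt)  # angular frequency grid
    dz = length / n_steps
    # Dispersion acts as a pure phase in the frequency domain (half step).
    half_linear = np.exp(0.25j * beta2 * omega**2 * dz)
    for _ in range(n_steps):
        a = np.fft.ifft(half_linear * np.fft.fft(a))
        # Kerr nonlinearity acts as a pure phase in the time domain (full step).
        a = a * np.exp(1j * gamma * np.abs(a) ** 2 * dz)
        a = np.fft.ifft(half_linear * np.fft.fft(a))
    return a


# Normalized example: a fundamental soliton sech(t) with beta2 = -1, gamma = 1.
t = np.linspace(-20.0, 20.0, 1024, endpoint=False)
a_in = 1.0 / np.cosh(t)
a_out = ssfm(a_in, dt=t[1] - t[0], length=1.0, n_steps=200, beta2=-1.0, gamma=1.0)
```

Both sub-steps are pure phase rotations, so the sketch conserves pulse energy. Each step costs four FFTs over the full waveform, which suggests why a long-haul, many-channel simulation with small step sizes becomes slow.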
Submitted 16 May, 2022; v1 submitted 12 January, 2022;
originally announced January 2022.
-
Waveform Design Using Half-duplex Devices for 6G Joint Communications and Sensing
Authors:
Yihua Ma,
Zhifeng Yuan,
Guanghui Yu,
Shuqiang Xia,
Liujun Hu
Abstract:
Joint communications and sensing (JCS) is a promising 6G technology, and the challenge is how to integrate them efficiently. Existing frequency-division and time-division coexistence can hardly bring a gain of integration. Directly using orthogonal frequency-division multiplexing (OFDM) to sense requires complex in-band full-duplex to cancel the self-interference (SI). To solve these problems, this paper proposes novel coexistence schemes to gain super sensing range (SSR) and simple SI cancellation. SSR enables JCS to gain the sensing range of a sensing-only scheme while sharing the resources with communications. Random time-division is proposed to gain a super Doppler range. Flexible sensing implanted OFDM (FSI-OFDM) is also proposed. FSI-OFDM uses random sensing occasions to gain a super Doppler range, and utilizes the fixed tail sensing occasions to achieve a super distance range. The simulation results show that the proposed schemes can gain SSR with limited resources.
Submitted 3 January, 2022;
originally announced January 2022.
-
Graph Neural Networks for Communication Networks: Context, Use Cases and Opportunities
Authors:
José Suárez-Varela,
Paul Almasan,
Miquel Ferriol-Galmés,
Krzysztof Rusek,
Fabien Geyer,
Xiangle Cheng,
Xiang Shi,
Shihan Xiao,
Franco Scarselli,
Albert Cabellos-Aparicio,
Pere Barlet-Ros
Abstract:
Graph neural networks (GNN) have shown outstanding applications in many fields where data is fundamentally represented as graphs (e.g., chemistry, biology, recommendation systems). In this vein, communication networks comprise many fundamental components that are naturally represented in a graph-structured manner (e.g., topology, configurations, traffic flows). This position article presents GNNs as a fundamental tool for modeling, control and management of communication networks. GNNs represent a new generation of data-driven models that can accurately learn and reproduce the complex behaviors behind real networks. As a result, such models can be applied to a wide variety of networking use cases, such as planning, online optimization, or troubleshooting. The main advantage of GNNs over traditional neural networks lies in its unprecedented generalization capabilities when applied to other networks and configurations unseen during training, which is a critical feature for achieving practical data-driven solutions for networking. This article comprises a brief tutorial on GNNs and their possible applications to communication networks. To showcase the potential of this technology, we present two use cases with state-of-the-art GNN models respectively applied to wired and wireless networks. Lastly, we delve into the key open challenges and opportunities yet to be explored in this novel research area.
Submitted 27 July, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Integrated Sensing and Communications for V2I Networks: Dynamic Predictive Beamforming for Extended Vehicle Targets
Authors:
Zhen Du,
Fan Liu,
Weijie Yuan,
Christos Masouros,
Zenghui Zhang,
Shuqiang Xia,
Giuseppe Caire
Abstract:
We investigate sensing-assisted predictive beamforming schemes for vehicle-to-infrastructure (V2I) communication by exploiting the integrated sensing and communication (ISAC) functionalities at the roadside unit (RSU). The RSU deploys a massive multi-input-multi-output (mMIMO) array and operates at millimeter wave (mmWave) frequencies. The pencil-sharp mMIMO beams and fine range resolution achieved at mmWave imply that the point target assumption is impractical in such V2I networks, as the volume and shape of the vehicles become essential for beamforming. Simply pointing a beam to the vehicle may result in the communication receiver (CR) never lying in the beam, even when the vehicle's trajectory is accurately tracked. To tackle this problem, we consider the extended vehicle target with two novel beam tracking schemes. For the first scheme, the beamwidth is adjusted in real-time to cover the entire vehicle, followed by an extended Kalman filtering (EKF) algorithm to predict and track the position of the CR according to the resolved high-resolution scatterers. An upgraded scheme is further proposed by splitting each transmission block into two stages. The first stage is exploited for ISAC transmission, where a wide beam is adopted for both communication and sensing. Based on the sensed results at the first stage, the second stage is dedicated to communication by adopting a pencil-sharp beam, yielding a significant improvement of the achievable rate. We further reveal the inherent tradeoff between the two stages in terms of their durations, and develop an optimal time allocation strategy that maximizes the average achievable rate. Finally, numerical results are provided to verify the superiority of the proposed schemes over the state-of-the-art methods.
Submitted 25 November, 2021; v1 submitted 19 November, 2021;
originally announced November 2021.
-
Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances
Authors:
Shibo Zhang,
Yaxuan Li,
Shen Zhang,
Farzad Shahabi,
Stephen Xia,
Yu Deng,
Nabil Alshurafa
Abstract:
Mobile and wearable devices have enabled numerous applications, including activity tracking, wellness monitoring, and human-computer interaction, that measure and improve our daily lives. Many of these applications are made possible by leveraging the rich collection of low-power sensors found in many mobile and wearable devices to perform human activity recognition (HAR). Recently, deep learning has greatly pushed the boundaries of HAR on mobile and wearable devices. This paper systematically categorizes and summarizes existing work that introduces deep learning methods for wearables-based HAR and provides a comprehensive analysis of the current advancements, developing trends, and major challenges. We also present cutting-edge frontiers and future directions for deep learning-based HAR.
Submitted 3 March, 2022; v1 submitted 31 October, 2021;
originally announced November 2021.
-
IGNNITION: Bridging the Gap Between Graph Neural Networks and Networking Systems
Authors:
David Pujol-Perich,
José Suárez-Varela,
Miquel Ferriol,
Shihan Xiao,
Bo Wu,
Albert Cabellos-Aparicio,
Pere Barlet-Ros
Abstract:
Recent years have seen the vast potential of Graph Neural Networks (GNN) in many fields where data is structured as graphs (e.g., chemistry, recommender systems). In particular, GNNs are becoming increasingly popular in the field of networking, as graphs are intrinsically present at many levels (e.g., topology, routing). The main novelty of GNNs is their ability to generalize to other networks unseen during training, which is an essential feature for developing practical Machine Learning (ML) solutions for networking. However, implementing a functional GNN prototype is currently a cumbersome task that requires strong skills in neural network programming. This poses an important barrier to network engineers that often do not have the necessary ML expertise. In this article, we present IGNNITION, a novel open-source framework that enables fast prototyping of GNNs for networking systems. IGNNITION is based on an intuitive high-level abstraction that hides the complexity behind GNNs, while still offering great flexibility to build custom GNN architectures. To showcase the versatility and performance of this framework, we implement two state-of-the-art GNN models applied to different networking use cases. Our results show that the GNN models produced by IGNNITION are equivalent in terms of accuracy and performance to their native implementations in TensorFlow.
Submitted 2 February, 2022; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Primary-Auxiliary Model Scheduling Based Estimation of the Vertical Wheel Force in a Full Vehicle System
Authors:
Xueke Zheng,
Runze Cai,
Shuixin Xiao,
Yu Qiu,
Jun Zhang,
Mian Li
Abstract:
In this work, we study estimation problems in nonlinear mechanical systems subject to non-stationary and unknown excitation, which are common and critical problems in design and health management of mechanical systems.
A primary-auxiliary model scheduling procedure based on time-domain transmissibilities is proposed and performed under switching linear dynamics: In addition to constructing a primary transmissibility family from the pseudo-inputs to the output during the offline stage, an auxiliary transmissibility family is constructed by further decomposing the pseudo-input vector into two parts. The auxiliary family enables to determine the unknown working condition at which the system is currently running at, and then an appropriate transmissibility from the primary transmissibility family for estimating the unknown output can be selected during the online estimation stage. As a result, the proposed approach offers a generalizable and explainable solution to the signal estimation problems in nonlinear mechanical systems in the context of switching linear dynamics with unknown inputs.
A real-world application to the estimation of the vertical wheel force in a full vehicle system is conducted to demonstrate the effectiveness of the proposed method. During the vehicle design phase, the vertical wheel force is the most important one among Wheel Center Loads (WCLs), and it is often measured directly with expensive, intrusive, and hard-to-install measurement devices during full vehicle testing campaigns. Meanwhile, the estimation problem of the vertical wheel force has not been solved well and is still of great interest. The experimental results show that the proposed method estimates the vertical wheel force with good accuracy.
Submitted 23 July, 2021;
originally announced July 2021.
-
Maximizing Revenue with Adaptive Modulation and Multiple FECs in Flexible Optical Networks
Authors:
Cao Chen,
Fen Zhou,
Massimo Tornatore,
Shilin Xiao
Abstract:
Flexible optical networks (FONs) are being adopted to accommodate the increasingly heterogeneous traffic in today's Internet. However, in presence of high traffic load, not all offered traffic can be satisfied at all time. As carried traffic load brings revenues to operators, traffic blocking due to limited spectrum resource leads to revenue losses. In this study, given a set of traffic requests to be provisioned, we consider the problem of maximizing operator's revenue, subject to limited spectrum resource and physical layer impairments (PLIs), namely amplified spontaneous emission noise (ASE), self-channel interference (SCI), cross-channel interference (XCI), and node crosstalk. In FONs, adaptive modulation, multiple FEC, and the tuning of power spectrum density (PSD) can be effectively employed to mitigate the impact of PLIs. Hence, in our study, we propose a universal bandwidth-related impairment evaluation model based on channel bandwidth, which allows a performance analysis for different PSD, FEC and modulations. Leveraging this PLI model and a piecewise linear fitting function, we succeed to formulate the revenue maximization problem as a mixed integer linear program. Then, to solve the problem on larger network instances, a fast two-phase heuristic algorithm is also proposed, which is shown to be near-optimal for revenue maximization. Through simulations, we demonstrate that using adaptive modulation enables to significantly increase revenues in the scenario of high signal-to-noise ratio (SNR), where the revenue can even be doubled for high traffic load, while using multiple FECs is more profitable for scenarios with low SNR.
Submitted 14 June, 2021;
originally announced June 2021.
-
Throughput Maximization Leveraging Just-Enough SNR Margin and Channel Spacing Optimization
Authors:
Cao Chen,
Fen Zhou,
Yuanhao Liu,
Shilin Xiao
Abstract:
Flexible optical network is a promising technology to accommodate high-capacity demands in next-generation networks. To ensure uninterrupted communication, existing lightpath provisioning schemes are mainly done with the assumption of worst-case resource under-provisioning and fixed channel spacing, which preserves an excessive signal-to-noise ratio (SNR) margin. However, under a resource over-provisioning scenario, the excessive SNR margin restricts the transmission bit-rate or transmission reach, leading to physical layer resource waste and stranded transmission capacity. To tackle this challenging problem, we leverage an iterative feedback tuning algorithm to provide a just-enough SNR margin, so as to maximize the network throughput. Specifically, the proposed algorithm is implemented in three steps. First, starting from the high SNR margin setup, we establish an integer linear programming model as well as a heuristic algorithm to maximize the network throughput by solving the problem of routing, modulation format, forward error correction, baud-rate selection, and spectrum assignment. Second, we optimize the channel spacing of the lightpaths obtained from the previous step, thereby increasing the available physical layer resources. Finally, we iteratively reduce the SNR margin of each lightpath until the network throughput cannot be increased. Through numerical simulations, we confirm the throughput improvement in different networks and with different baud-rates. In particular, we find that our algorithm enables over 20% relative gain when network resource is over-provisioned, compared to the traditional method preserving an excessive SNR margin.
Submitted 16 July, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Robust Resource Allocation for Multi-Antenna URLLC-OFDMA Systems in a Smart Factory
Authors:
Jing Cheng,
Chao Shen,
Shuqiang Xia
Abstract:
In this paper, we investigate the worst-case robust beamforming design and resource block (RB) assignment problem for total transmit power minimization of the central controller while guaranteeing each robot's transmission with target number of data bits and within required ultra-low latency and extremely high reliability. By using the property of the independence of each robot's beamformer design, we can obtain the equivalent power control design form of the original beamforming design. The binary RB mapping indicators are transformed into continuous ones with additional $\ell_0$-norm constraints to promote sparsity on each RB. A novel non-convex penalty (NCP) approach is applied to solve such $\ell_0$-norm constraints. Numerical results demonstrate the superiority of the NCP approach to the well-known reweighted $\ell_1$ method in terms of the optimized power consumption, convergence rate and robustness to channel realizations. Also, the impacts of latency, reliability, number of transmit antennas and channel uncertainty on the system performance are revealed.
Submitted 4 June, 2021;
originally announced June 2021.
-
Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Kim Byeoung-su,
Radu Timofte,
Angeline Pouget,
Fenglong Song,
Cheng Li,
Shuai Xiao,
Zhongqian Fu,
Matteo Maggioni,
Yibin Huang,
Shen Cheng,
Xin Lu,
Yifeng Zhou,
Liangyu Chen,
Donghao Liu,
Xiangyu Zhang,
Haoqiang Fan,
Jian Sun,
Shuaicheng Liu,
Minsu Kwon,
Myungje Lee,
Jaeyoon Yoo,
Changbeom Kang,
Shinjo Wang,
Bin Huang
, et al. (7 additional authors not shown)
Abstract:
Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images under 40-80 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
Submitted 17 May, 2021;
originally announced May 2021.
-
NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results
Authors:
Ren Yang,
Radu Timofte,
Jing Liu,
Yi Xu,
Xinjian Zhang,
Minyi Zhao,
Shuigeng Zhou,
Kelvin C. K. Chan,
Shangchen Zhou,
Xiangyu Xu,
Chen Change Loy,
Xin Li,
Fanglong Liu,
He Zheng,
Lielin Jiang,
Qi Zhang,
Dongliang He,
Fu Li,
Qingqing Dang,
Yibin Huang,
Matteo Maggioni,
Zhongqian Fu,
Shuai Xiao,
Cheng Li,
Thomas Tanay
, et al. (47 additional authors not shown)
Abstract:
This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh
Submitted 31 August, 2022; v1 submitted 21 April, 2021;
originally announced April 2021.
-
Efficient Multi-Stage Video Denoising with Recurrent Spatio-Temporal Fusion
Authors:
Matteo Maggioni,
Yibin Huang,
Cheng Li,
Shuai Xiao,
Zhongqian Fu,
Fenglong Song
Abstract:
In recent years, denoising methods based on deep learning have achieved unparalleled performance at the cost of large computational complexity. In this work, we propose an Efficient Multi-stage Video Denoising algorithm, called EMVD, to drastically reduce the complexity while maintaining or even improving the performance. First, a fusion stage reduces the noise through a recursive combination of all past frames in the video. Then, a denoising stage removes the noise in the fused frame. Finally, a refinement stage restores the missing high frequency in the denoised frame. All stages operate on a transform-domain representation obtained by learnable and invertible linear operators which simultaneously increase accuracy and decrease complexity of the model. A single loss on the final output is sufficient for successful convergence, hence making EMVD easy to train. Experiments on real raw data demonstrate that EMVD outperforms the state of the art when complexity is constrained, and even remains competitive against methods whose complexities are several orders of magnitude higher. Further, the low complexity and memory requirements of EMVD enable real-time video denoising on commercial SoC in mobile devices.
Submitted 30 March, 2023; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Reconfigurable Intelligent Surface for Massive Connectivity
Authors:
Shuhao Xia,
Yuanming Shi,
Yong Zhou,
Xiaojun Yuan
Abstract:
With the rapid development of the Internet of Things (IoT), massive machine-type communication has become a promising application scenario, where a large number of devices transmit sporadically to a base station (BS). Reconfigurable intelligent surface (RIS) has recently been proposed as an innovative technology to achieve energy efficiency and coverage enhancement by establishing favorable signal propagation environments, thereby improving data transmission in massive connectivity. Nevertheless, the BS needs to detect active devices and estimate channels to support data transmission in RIS-assisted massive access systems, which poses unique challenges. This paper considers an RIS-assisted uplink IoT network and aims to solve the RIS-related activity detection and channel estimation problem, where the BS detects the active devices and estimates the separated channels of the RIS-to-device link and the RIS-to-BS link. Due to limited scattering between the RIS and the BS, we model the RIS-to-BS channel as a sparse channel. As a result, by simultaneously exploiting both the sparsity of sporadic transmission in massive connectivity and the RIS-to-BS channels, we formulate the RIS-related activity detection and channel estimation problem as a sparse matrix factorization problem. Furthermore, we develop an approximate message passing (AMP) based algorithm to solve the problem based on a Bayesian inference framework and reduce the computational complexity by approximating the algorithm with the central limit theorem and Taylor series arguments. Finally, extensive numerical experiments are conducted to verify the effectiveness and improvements of the proposed algorithm.
Submitted 13 January, 2021;
originally announced January 2021.
-
UAV-Enabled Cooperative Jamming for Covert Communications
Authors:
Hangmei Rao,
Sa Xiao,
Shihao Yan,
Jianquan Wang,
Wanbin Tang
Abstract:
This work employs an unmanned aerial vehicle (UAV) as a jammer to aid a covert communication from a transmitter Alice to a receiver Bob, where the UAV transmits artificial noise (AN) with random power to deliberately create interference to a warden Willie. In the considered system, the UAV's trajectory is critical to the covert communication performance, since the AN transmitted by the UAV also generates interference to Bob. To maximize the system performance, we formulate an optimization problem to jointly design the UAV's trajectory and Alice's transmit power. The formulated optimization problem is non-convex and is normally solved by a conventional iterative (CI) method, which requires multiple approximations based on Taylor expansions and an initialization of the UAV's trajectory. In order to eliminate these requirements, this work, for the first time, develops a geometric (GM) method to solve the optimization problem. By analyzing the covertness constraint, the GM method decouples the joint optimization into optimizing the UAV's trajectory and Alice's transmit power separately. Our examination shows that the GM method can significantly outperform the CI method in terms of average covert rate, and that the complexity of the GM method is lower than that of the CI method.
Submitted 25 February, 2022; v1 submitted 19 January, 2021;
originally announced January 2021.