arXiv:2406.19072 [pdf, other]

Scatterer Recognition from LiDAR Point Clouds for Environment-Embedded Vehicular Channel Modeling via Synesthesia of Machines

Authors: Ziwei Huang, Lu Bai, Zengrui Han, Xiang Cheng

Abstract: In this paper, a novel environment-embedded vehicular channel model is proposed by scatterer recognition from light detection and ranging (LiDAR) point clouds via Synesthesia of Machines (SoM). To provide a robust data foundation, a new intelligent sensing-communication integration dataset in vehicular urban scenarios is constructed. Based on the constructed dataset, the complex SoM mechanism, i.e., the mapping relationship between scatterers in electromagnetic space and LiDAR point clouds in the physical environment, is explored via a multilayer perceptron (MLP) with electromagnetic propagation mechanism. By using LiDAR point clouds to implement scatterer recognition, channel non-stationarity and consistency are modeled in an environment-embedded manner. Using ray-tracing (RT)-based results as the ground truth, the scatterer recognition accuracy exceeds 90%. The accuracy of the proposed model is further verified by the close fit between simulation results and RT results.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.13705 [pdf, other]

EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

Authors: Long Bai, Qiaozhi Tan, Tong Chen, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DFT) model. In our work, the illumination prompt module guides the model to adapt to different exposure levels and perform targeted image enhancement, while the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules further boost the concurrent representation learning between the prompt parameters and features. In addition, the U-shaped restoration DFT model captures long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and the additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance.

Submitted 19 June, 2024; originally announced June 2024.

Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

arXiv:2405.10948 [pdf, other]

Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recognizing long-range dependencies and aligning multimodal information. In this paper, we introduce Surgical-LVLM, a novel personalized large vision-language model tailored for complex surgical scenarios. Leveraging the pre-trained large vision-language model and specialized Visual Perception LoRA (VP-LoRA) blocks, our model excels in understanding complex visual-language tasks within surgical contexts. In addressing the visual grounding task, we propose the Token-Interaction (TIT) module, which strengthens the interaction between the grounding module and the language responses of the Large Visual Language Model (LVLM) after projecting them into the latent space. We demonstrate the effectiveness of Surgical-LVLM on several benchmarks, including EndoVis-17-VQLA, EndoVis-18-VQLA, and a newly introduced EndoVis Conversations dataset, which sets new performance standards. Our work contributes to advancing the field of automated surgical mentorship by providing a context-aware solution.

Submitted 22 March, 2024; originally announced May 2024.

arXiv:2405.10550 [pdf, other]

LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It adopts a T-shape model architecture to capture global structural information using low-resolution images and gradually recover the details in subsequent denoising steps. We further prune the model to significantly reduce the model size while retaining performance. Because discarding certain downsampling operations to save parameters leads to instability and low efficiency in convergence during training, we introduce a Temporal Light Unit (TLU), a plug-and-play module, for more stable training and better performance. TLU associates time steps with denoised image features, establishing temporal dependencies of the denoising steps and improving denoising outcomes. Moreover, while recovering images using the diffusion model, potential spectral shifts were noted. We further introduce a Chroma Balancer (CB) to mitigate this issue. Our LighTDiff outperforms many competitive LLIE methods with exceptional computational efficiency.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.08672 [pdf, other]

EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient methods to adapt these models to endoscopic depth estimation. We propose Endoscopic Depth Any Camera (EndoDAC), an efficient self-supervised depth estimation framework that adapts foundation models to endoscopic scenes. Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks to tailor the foundation model to the surgical domain, utilizing remarkably few trainable parameters. Given that camera information is not always accessible, we also introduce a self-supervised adaptation strategy that estimates camera intrinsics using the pose encoder. Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs. Experiments demonstrate that our approach obtains superior performance even with fewer training epochs and without knowledge of the ground-truth camera intrinsics. Code is available at https://github.com/BeileiCui/EndoDAC.

Submitted 14 May, 2024; originally announced May 2024.

Comments: early accepted by MICCAI 2024

arXiv:2404.10640 [pdf, other]

Adapting SAM for Surgical Instrument Tracking and Segmentation in Endoscopic Submucosal Dissection Videos

Authors: Jieming Yu, Long Bai, Guankun Wang, An Wang, Xiaoxiao Yang, Huxin Gao, Hongliang Ren

Abstract: The precise tracking and segmentation of surgical instruments have led to a remarkable enhancement in the efficiency of surgical procedures. However, the challenge lies in achieving accurate segmentation of surgical instruments while minimizing the need for manual annotation and reducing the time required for the segmentation process. To tackle this, we propose a novel framework for surgical instrument segmentation and tracking. Specifically, with a tiny subset of frames for segmentation, we ensure accurate segmentation across the entire surgical video. Our method adopts a two-stage approach to efficiently segment videos. Initially, we utilize the Segment-Anything (SAM) model, which has been fine-tuned using Low-Rank Adaptation (LoRA) on the EndoVis17 dataset. The fine-tuned SAM model is applied to segment the initial frames of the video accurately. Subsequently, we deploy the XMem++ tracking algorithm to follow the annotated frames, thereby facilitating the segmentation of the entire video sequence. This workflow enables us to precisely segment and track objects within the video. Through extensive evaluation on the in-distribution dataset (EndoVis17) and the out-of-distribution datasets (EndoVis18 and the endoscopic submucosal dissection surgery (ESD) dataset), our framework demonstrates exceptional accuracy and robustness, thus showcasing its potential to advance automated robotic-assisted surgery.

Submitted 16 April, 2024; originally announced April 2024.

Comments: To appear in IEEE ICRA 2024 C4SR+ Workshop

arXiv:2403.14185 [pdf, other]

A LiDAR-Aided Channel Model for Vehicular Intelligent Sensing-Communication Integration

Authors: Ziwei Huang, Lu Bai, Mingran Sun, Xiang Cheng

Abstract: In this paper, a novel channel modeling approach, named light detection and ranging (LiDAR)-aided geometry-based stochastic modeling (LA-GBSM), is developed. Based on the developed LA-GBSM approach, a new millimeter wave (mmWave) channel model for sixth-generation (6G) vehicular intelligent sensing-communication integration is proposed, which can support the design of intelligent transportation systems (ITSs). The proposed LA-GBSM is accurately parameterized under high, medium, and low vehicular traffic density (VTD) conditions via a sensing-communication simulation dataset with LiDAR point clouds and scatterer information for the first time. Specifically, by detecting dynamic vehicles and static buildings/trees through LiDAR point clouds via machine learning, scatterers are divided into static and dynamic scatterers. Furthermore, statistical distributions of parameters, e.g., distance, angle, number, and power, related to static and dynamic scatterers are quantified under high, medium, and low VTD conditions. To mimic channel non-stationarity and consistency, based on the quantified statistical distributions, a new visibility region (VR)-based algorithm in consideration of newly generated static/dynamic scatterers is developed. Key channel statistics are derived and simulated. By comparing simulation results and ray-tracing (RT)-based results, the utility of the proposed LA-GBSM is verified.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2401.11960 [pdf, other]

Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

Abstract: Downscaling (DS) of meteorological variables involves obtaining high-resolution states from low-resolution meteorological fields and is an important task in weather forecasting. Previous methods based on deep learning treat downscaling as a super-resolution task in computer vision and utilize high-resolution gridded meteorological fields as supervision to improve resolution at specific grid scales. However, this approach has struggled to align with the continuous distribution characteristics of meteorological fields, leading to an inherent systematic bias between the downscaled results and the actual observations at meteorological stations. In this paper, we extend meteorological downscaling to arbitrary scattered station scales, establish a brand new benchmark and dataset, and retrieve meteorological states at any given station location from a coarse-resolution meteorological field. Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors. Building on this foundation, we propose a new downscaling model based on hypernetwork architecture, namely HyperDS, which efficiently integrates different observational information into the model training, achieving continuous scale modeling of the meteorological field. Through extensive experiments, our proposed method outperforms other specially designed baseline models on multiple surface variables. Notably, the mean squared error (MSE) for wind speed and surface pressure improved by 67% and 19.5% compared to other methods. We will release the dataset and code subsequently.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.04148 [pdf, other]

Online Test-Time Adaptation of Spatial-Temporal Traffic Flow Forecasting

Authors: Pengxin Guo, Pengrong Jin, Ziyue Li, Lei Bai, Yu Zhang

Abstract: Accurate spatial-temporal traffic flow forecasting is crucial in aiding traffic managers in implementing control measures and assisting drivers in selecting optimal travel routes. Traditional deep-learning based methods for traffic flow forecasting typically rely on historical data to train their models, which are then used to make predictions on future data. However, the performance of the trained model usually degrades due to the temporal drift between the historical and future data. To make the model trained on historical data better adapt to future data in a fully online manner, this paper conducts the first study of online test-time adaptation techniques for spatial-temporal traffic flow forecasting problems. To this end, we propose an Adaptive Double Correction by Series Decomposition (ADCSD) method, which first decomposes the output of the trained model into seasonal and trend-cyclical parts and then corrects them by two separate modules during the testing phase using the latest observed data entry by entry. In the proposed ADCSD method, instead of fine-tuning the whole trained model during the testing phase, a lite network is attached after the trained model, and only the lite network is fine-tuned in the testing process each time a data entry is observed. Moreover, because different time series variables may have different levels of temporal drift, two adaptive vectors are adopted to provide different weights for different time series variables. Extensive experiments on four real-world traffic flow forecasting datasets demonstrate the effectiveness of the proposed ADCSD method. The code is available at https://github.com/Pengxin-Guo/ADCSD.

Submitted 8 January, 2024; originally announced January 2024.
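
Illustrative note on the entry above: the ADCSD abstract describes splitting a model's output into seasonal and trend-cyclical parts before correcting each one. The abstract does not say how the split is done, so the sketch below assumes a centered moving average as the trend estimate (a common choice in series decomposition) and treats the residual as the seasonal part. The function names and the `window` parameter are hypothetical and are not taken from the ADCSD code.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks at the edges (an assumption)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(series)
    trend = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def decompose(series, window=3):
    """Split a series into (seasonal, trend) so that seasonal + trend == series."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend


if __name__ == "__main__":
    seasonal, trend = decompose([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
    print("trend:", trend)
    print("seasonal:", seasonal)
```

In a test-time adaptation setting, each part would then be passed to its own small correction module; that step is omitted here because the abstract gives no detail on the module design.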

arXiv:2401.01117 [pdf, other]

Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

Authors: Chunyi Li, Haoning Wu, Zicheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

Abstract: With the rapid evolution of Text-to-Image (T2I) models in recent years, their unsatisfactory generation results have become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limits optimization capabilities for low-quality AIGIs but also brings negative optimization to high-quality AIGIs. To address this issue, a quality-aware refiner named Q-Refine is proposed. Based on the preference of the Human Visual System (HVS), Q-Refine uses the Image Quality Assessment (IQA) metric to guide the refining process for the first time, and modifies images of different qualities through three adaptive pipelines. Experiments show that for mainstream T2I models, Q-Refine can perform effective optimization on AIGIs of different qualities. It can serve as a general refiner to optimize AIGIs at both the fidelity and aesthetic quality levels, thus expanding the application of T2I generation models.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 6 pages, 5 figures

arXiv:2312.09576 [pdf, other]

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OARs and GTVs segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68% to 86.70%, and 70.42% to 73.44% for OARs and GTVs, respectively. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org

Submitted 15 December, 2023; originally announced December 2023.

Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)
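
Illustrative note on the entry above: the SegRap2023 abstract reports results as Dice similarity coefficient scores. As a reminder of what that metric measures, here is a minimal sketch for two binary masks given as flat sequences of 0/1 values, using the standard definition 2|A ∩ B| / (|A| + |B|). The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example; the challenge's own evaluation code may handle multi-class labels and empty masks differently.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(prediction, reference))
```

A score of 1.0 means the two masks overlap exactly and 0.0 means they share no foreground voxels; the percentages in the abstract correspond to this value multiplied by 100.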

arXiv:2310.09937 [pdf, other]

Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion

Authors: Long Bai, Shilong Yao, Kun Gao, Yanjun Huang, Ruijie Tang, Hong Yan, Max Q. -H. Meng, Hongliang Ren

Abstract: Considering that the Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between source images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture the correlation between the pre-processed image pairs based on the dictionaries generated from the source images via enforced joint sparse coding. Afterward, the joint sparse representation in the pair of dictionaries is utilized to construct an image mask by calculating the reconstruction errors, which is then used to generate the final fusion image. The experimental verification results of the SAR images from the Sentinel-1 satellite and the multispectral images from the Landsat-8 satellite show that the proposed method can achieve superior visual effects and excellent quantitative performance in terms of spectral distortion, correlation coefficient, MSE, NIQE, BRISQUE, and PIQE.

Submitted 15 October, 2023; originally announced October 2023.

Comments: To appear in IEEE Sensors Journal
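
Illustrative note on the entry above: the abstract names the Brovey transform as its pre-processing step. In its textbook form, the transform rescales each multispectral band by the ratio of a high-resolution intensity value to the sum of the bands. The sketch below applies that form to a single pixel and assumes the SAR intensity plays the role of the high-resolution channel; the function name, the `eps` guard, and the choice of three bands are hypothetical, and the paper's exact formulation may differ.

```python
def brovey_pixel(bands, intensity, eps=1e-12):
    """Textbook Brovey transform for one pixel.

    bands     -- multispectral band values at this pixel (e.g. R, G, B)
    intensity -- high-resolution intensity at this pixel (assumed here to be SAR)
    eps       -- small constant guarding against division by zero (assumption)
    """
    total = sum(bands)
    # Each band keeps its share of the spectral total, rescaled by the intensity.
    return [b / (total + eps) * intensity for b in bands]


if __name__ == "__main__":
    fused = brovey_pixel([1.0, 2.0, 1.0], intensity=8.0)
    print(fused)
```

Because every band is multiplied by the same factor, the ratios between bands are preserved while the overall brightness follows the intensity channel.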

arXiv:2308.16376 [pdf, other]

Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training

Authors: Lei Bai, Dongang Wang, Michael Barnett, Mariano Cabezas, Weidong Cai, Fernando Calamante, Kain Kyle, Dongnan Liu, Linda Ly, Aria Nguyen, Chun-Chien Shieh, Ryan Sullivan, Hengrui Wang, Geng Zhan, Wanli Ouyang, Chenyu Wang

Abstract: Accurately measuring the evolution of Multiple Sclerosis (MS) with magnetic resonance imaging (MRI) critically informs understanding of disease progression and helps to direct therapeutic strategy. Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. Obtaining sufficient data from a single clinical site is challenging and does not address the heterogeneous need for model robustness. Conversely, the collection of data from multiple sites introduces data privacy concerns and potential label noise due to varying annotation standards. To address this dilemma, we explore the use of the federated learning framework while considering label noise. Our approach enables collaboration among multiple clinical sites without compromising data privacy under a federated learning paradigm that incorporates a noise-robust training strategy based on label correction. Specifically, we introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions, enabling the correction of false annotations based on prediction confidence. We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites, enhancing the reliability of the correction process. Extensive experiments conducted on two multi-site datasets demonstrate the effectiveness and robustness of our proposed methods, indicating their potential for clinical applications in multi-site collaborations.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures, journal submission

arXiv:2308.15717 [pdf]

Risk-aware Flexible Resource Utilization in an Unbalanced Three-Phase Distribution Network using SDP-based Distributionally Robust Optimal Power Flow

Authors: Zelong Lu, Jianxue Wang, Mohammad Shahidehpour, Linquan Bai, Zuyi Li, Lei Yan, Xianlong Chen

Abstract: The variability caused by the proliferation of distributed energy resources (DERs) and the significant growth in unbalanced three-phase loads pose unprecedented challenges to distribution network operations. This paper focuses on how a distribution system operator (DSO), taking over the distribution grid and market operations, would develop a risk-aware flexibility market to mitigate uncertainties in an unbalanced three-phase power distribution network. First, a distributionally robust chance constraint (DRCC) method is devised to solve the unbalanced three-phase optimal power flow using a semidefinite programming (SDP) model. The DSO can apply the proposed solution to jointly clear energy and flexibility markets. Then, the DRCC model accuracy is improved by an information-sharing mechanism characterized by spatially-correlated uncertainties in the distribution grid. Further, a novel system-wide response function is derived to make the DRCC model tractable. Using the duality theory, the paper further investigates the physical composition of the DSO's cleared flexibility prices to guide the unbalanced distribution network operation. Finally, the effectiveness of the risk-aware flexibility market is verified in a modified three-phase IEEE 34-node system. Results demonstrate that the flexibility market can quantify the impact of spatially correlated uncertainties and facilitate the utilization of flexible resources to mitigate uncertainties across the network.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.02845 [pdf, other]

Landmark Detection using Transformer Toward Robot-assisted Nasal Airway Intubation

Authors: Tianhang Liu, Hechen Li, Long Bai, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

Abstract: Robot-assisted airway intubation requires high accuracy in locating targets and organs. Two vital landmarks, nostrils and glottis, can be detected during the intubation to accommodate the stages of nasal intubation. Automated landmark detection can provide accurate localization and quantitative evaluation. The Detection Transformer (DeTR) leads object detectors to a new paradigm with long-range dependence. However, current DeTR requires long iterations to converge, and does not perform well in detecting small objects. This paper proposes a transformer-based landmark detection solution with deformable DeTR and the semantic-aligned-matching module for detecting landmarks in robot-assisted intubation. The semantics aligner can effectively align the semantics of object queries and image features in the same embedding space using the most discriminative features. To evaluate the performance of our solution, we utilize a publicly accessible glottis dataset and automatically annotate a nostril detection dataset. The experimental results demonstrate our competitive performance in detection accuracy. Our code is publicly accessible.

Submitted 5 August, 2023; originally announced August 2023.

Comments: ICBIR 2023 (Best Student Paper Award). Code availability: https://github.com/ConorLTH/airway_intubation_landmarks_detection

arXiv:2307.02452 [pdf, other]

LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion

Authors: Long Bai, Tong Chen, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

Abstract: Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases. However, due to GI anatomical constraints and hardware manufacturing limitations, WCE vision signals may suffer from insufficient illumination, leading to a complicated screening and examination procedure. Deep learning-based low-light image enhancement (LLIE) in the medical field gradually attracts researchers. Given the exuberant development of the denoising diffusion probabilistic model (DDPM) in computer vision, we introduce a WCE LLIE framework based on the multi-scale convolutional neural network (CNN) and reverse diffusion process. The multi-scale design allows models to preserve high-resolution representation and context information from low-resolution, while the curved wavelet attention (CWA) block is proposed for high-frequency and local feature learning. Furthermore, we combine the reverse diffusion procedure to further optimize the shallow output and generate the most realistic image. The proposed method is compared with ten state-of-the-art (SOTA) LLIE methods and significantly outperforms quantitatively and qualitatively. The superior performance on GI disease segmentation further demonstrates the clinical potential of our proposed model. Our code is publicly accessible.

Submitted 22 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/LLCaps

arXiv:2306.14143 [pdf, other]

Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines

Authors: Xiang Cheng, Haotian Zhang, Jianan Zhang, Shijian Gao, Sijiang Li, Ziwei Huang, Lu Bai, Zonghui Yang, Xinhu Zheng, Liuqing Yang

Abstract: In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrade the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency (RF) sensors being the primary participants, thus lacking a comprehensive environment feature characterization and facing a severe performance bottleneck in dynamic environments. To date, extensive surveys on ISAC have been conducted but are limited to summarizing RF-based radar sensing. Currently, some research efforts have been devoted to exploring multi-modal sensing-communication integration but still lack a comprehensive review. Therefore, we generalize the concept of ISAC inspired by human synesthesia to establish a unified framework of intelligent multi-modal sensing-communication integration and provide a comprehensive review under such a framework in this paper. The so-termed Synesthesia of Machines (SoM) gives the clearest cognition of such intelligent integration and details its paradigm for the first time. We commence by justifying the necessity of the new paradigm. Subsequently, we offer a definition of SoM and zoom into the detailed paradigm, which is summarized as three operation modes. To facilitate SoM research, we overview the prerequisite of SoM research, i.e., mixed multi-modal (MMM) datasets. Then, we introduce the mapping relationships between multi-modal sensing and communications. Afterward, we cover the technological review on SoM-enhance-based and SoM-concert-based applications. To corroborate the superiority of SoM, we also present simulation results related to dual-function waveform and predictive beamforming design. Finally, we propose some potential directions to inspire future research efforts.

Submitted 20 November, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: This paper has been accepted by IEEE Communications Surveys & Tutorials

arXiv:2306.14125 [pdf, other]

M$^3$SC: A Generic Dataset for Mixed Multi-Modal (MMM) Sensing and Communication Integration

Authors: Xiang Cheng, Ziwei Huang, Lu Bai, Haotian Zhang, Mingran Sun, Boxun Liu, Sijiang Li, Jianan Zhang, Minson Lee

Abstract: The sixth generation (6G) of mobile communication system is witnessing a new paradigm shift, i.e., integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and the generation framework of the M3SC dataset is further given. To obtain multi-modal sensory data in physical space and communication data in electromagnetic space, we utilize AirSim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data. Furthermore, the in-depth integration and precise alignment of AirSim, WaveFarer, and Wireless InSite are achieved. The M3SC dataset covers various weather conditions, various frequency bands, and different times of the day. Currently, the M3SC dataset contains 1500 snapshots, including 80 RGB images, 160 depth maps, 80 LiDAR point clouds, 256 sets of mmWave waveforms with 8 radar point clouds, and 72 channel impulse response (CIR) matrices per snapshot, thus totaling 120,000 RGB images, 240,000 depth maps, 120,000 LiDAR point clouds, 384,000 sets of mmWave waveforms with 12,000 radar point clouds, and 108,000 CIR matrices. The data processing result presents the multi-modal sensory information and communication channel statistical properties. Finally, the MMM sensing-communication application, which can be supported by the M3SC dataset, is discussed.

Submitted 25 June, 2023; originally announced June 2023.

Comments: 12 pages, 12 figures
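
The dataset totals quoted in the M3SC abstract follow from multiplying each per-snapshot count by the 1500 snapshots. The short Python sketch below reproduces that arithmetic; the dictionary keys and the function name `dataset_totals` are illustrative names chosen here, not identifiers from the dataset or its tooling.

```python
# Snapshot count and per-snapshot item counts as quoted in the M3SC abstract.
# The key names below are hypothetical labels used only for this check.
SNAPSHOTS = 1500
PER_SNAPSHOT = {
    "rgb_images": 80,
    "depth_maps": 160,
    "lidar_point_clouds": 80,
    "mmwave_waveform_sets": 256,
    "radar_point_clouds": 8,
    "cir_matrices": 72,
}


def dataset_totals(snapshots, per_snapshot):
    """Multiply every per-snapshot count by the number of snapshots."""
    return {name: snapshots * count for name, count in per_snapshot.items()}


totals = dataset_totals(SNAPSHOTS, PER_SNAPSHOT)
for name, total in totals.items():
    print(f"{name}: {total:,}")
```

Each printed total can be compared directly against the figures listed in the abstract.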

arXiv:2306.07019 [pdf, other]

Dynamic Causal Graph Convolutional Network for Traffic Prediction

Authors: Junpeng Lin, Ziyue Li, Zhishuai Li, Lei Bai, Rui Zhao, Chen Zhang

Abstract: Modeling complex spatiotemporal dependencies in correlated traffic series is essential for traffic prediction. While recent works have shown improved prediction performance by using neural networks to extract spatiotemporal correlations, their effectiveness depends on the quality of the graph structures used to represent the spatial topology of the traffic network. In this work, we propose a novel approach for traffic prediction that embeds time-varying dynamic Bayesian network to capture the fine spatiotemporal topology of traffic data. We then use graph convolutional networks to generate traffic forecasts. To enable our method to efficiently model nonlinear traffic propagation patterns, we develop a deep learning-based module as a hyper-network to generate stepwise dynamic causal graphs. Our experimental results on a real traffic dataset demonstrate the superior prediction performance of the proposed method. The code is available at https://github.com/MonBG/DCGCN.

Submitted 7 September, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: Accepted to IEEE CASE 2023; Peter Luh Best Memorial Award for Young Researcher (Finalist)

arXiv:2305.11686 [pdf, other]

Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation

Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

Abstract: Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual environment to assist the training process. Specifically, this work introduces a virtual dataset generated by the Simulation Open Framework Architecture (SOFA) framework to overcome the limited availability of actual endoscopic images. We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer techniques to address discrepancies between datasets. Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models, improving segmentation accuracy and training stability. In the practical application, the trained segmentation model holds great promise for robot-assisted intubation surgery and intelligent surgical navigation.

Submitted 27 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms). arXiv admin note: text overlap with arXiv:2305.10883

arXiv:2305.10883 [pdf, other]

Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

Abstract: Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real datasets of oropharyngeal organs are often inaccessible due to limited open-source data and patient privacy. In this work, we propose a domain adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework includes an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer method ArtFlow. Here, IRB alleviates the problem of poor segmentation performance caused by significant datasets domain differences; while ArtFlow is introduced to reduce the discrepancies between datasets further. A virtual oropharynx image dataset generated by the SOFA framework is used as the learning subject for semantic segmentation to deal with the limited availability of actual endoscopic images. We adapted IRB-AF with the state-of-the-art domain adaptive segmentation models. The results demonstrate the superior performance of our approach in further improving the segmentation accuracy and training stability.

Submitted 27 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Release

arXiv:2301.03377 [pdf, other]

Machine Learning for Large-Scale Optimization in 6G Wireless Networks

Authors: Yandong Shi, Lixiang Lian, Yuanming Shi, Zixin Wang, Yong Zhou, Liqun Fu, Lin Bai, Jun Zhang, Wei Zhang

Abstract: The sixth generation (6G) wireless systems are envisioned to enable the paradigm shift from "connected things" to "connected intelligence", featured by ultra high density, large-scale, dynamic heterogeneity, diversified functional requirements and machine learning capabilities, which leads to a growing need for highly efficient intelligent algorithms. The classic optimization-based algorithms usually require highly precise mathematical model of data links and suffer from poor performance with high computational cost in realistic 6G applications. Based on domain knowledge (e.g., optimization models and theoretical tools), machine learning (ML) stands out as a promising and viable methodology for many complex large-scale optimization problems in 6G, due to its superior performance, generalizability, computational efficiency and robustness. In this paper, we systematically review the most representative "learning to optimize" techniques in diverse domains of 6G wireless networks by identifying the inherent feature of the underlying optimization problem and investigating the specifically designed ML frameworks from the perspective of optimization. In particular, we will cover algorithm unrolling, learning to branch-and-bound, graph neural network for structured optimization, deep reinforcement learning for stochastic optimization, end-to-end learning for semantic optimization, as well as federated learning for distributed optimization, for solving challenging large-scale optimization problems arising from various important wireless applications. Through the in-depth discussion, we shed light on the excellent performance of ML-based optimization algorithms with respect to the classical methods, and provide insightful guidance to develop advanced ML techniques in 6G networks.

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2212.07651 [pdf, other]

Two-stage Contextual Transformer-based Convolutional Neural Network for Airway Extraction from CT Images

Authors: Yanan Wu, Shuiqing Zhao, Shouliang Qi, Jie Feng, Haowen Pang, Runsheng Chang, Long Bai, Mengqi Li, Shuyue Xia, Wei Qian, Hongliang Ren

Abstract: Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). The existing methods are challenging to sufficiently segment the airway, especially the high-generation airway, with the constraint of the limited label and cannot meet the clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation using CT images. The method consists of two stages, performing initial and refined airway segmentation. The two-stage model shares the same subnetwork with different airway masks as input. Contextual transformer block is performed both in the encoder and decoder path of the subnetwork to finish high-quality airway segmentation effectively. In the first stage, the total airway mask and CT images are provided to the subnetwork, and the intrapulmonary airway mask and corresponding CT scans to the subnetwork in the second stage. Then the predictions of the two-stage method are merged as the final prediction. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analysis demonstrate that our proposed method extracted much more branches and lengths of the tree while accomplishing state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2210.12736 [pdf, other]

Achievable Error Exponents for Two-Phase Multiple Classification

Authors: Lin Zhou, Jun Diao, Lin Bai

Abstract: We revisit $M$-ary classification of Gutman (TIT 1989), where one is tasked to determine whether a testing sequence is generated with the same distribution as one of the $M$ training sequences or not. Our main result is a two-phase test, its theoretical analysis and its optimality guarantee. Specifically, our two-phase test is a special case of a sequential test with only two decision time points: the first phase of our test is a fixed-length test with a reject option, the second-phase of our test proceeds only if a reject option is decided in the first phase and the second phase of our test does \emph{not} allow a reject option. To provide theoretical guarantee for our test, we derive achievable error exponents using the method of types and derive a converse result for the optimal sequential test using the techniques recently proposed by Hsu, Li and Wang (ITW, 2022) for binary classification. Analytically and numerically, we show that our two phase test achieves the performance of an optimal sequential test with proper choice of test parameters. In particular, similarly as the optimal sequential test, our test does not need a final reject option to achieve the optimal error exponent region while an optimal fixed-length test needs a reject option to achieve the same region. Finally, we specialize our results to binary classification when $M=2$ and to $M$-ary hypothesis testing when the ratio of the lengths of training sequences and testing sequences tends to infinity so that generating distributions can be estimated perfectly.

Submitted 26 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: submitted to IEEE Trans. Inf. Theory

arXiv:2210.06747 [pdf, other]

DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation

Authors: Lizhi Bai, Jun Yang, Chunqi Tian, Yaoru Sun, Maoyu Mao, Yanjun Xu, Weirong Xu

Abstract: Combining RGB images and the corresponding depth maps in semantic segmentation proves the effectiveness in the past few years. Existing RGB-D modal fusion methods either lack the non-linear feature fusion ability or treat both modal images equally, regardless of the intrinsic distribution gap or information loss. Here we find that depth maps are suitable to provide intrinsic fine-grained patterns of objects due to their local depth continuity, while RGB images effectively provide a global view. Based on this, we propose a pixel differential convolution attention (DCA) module to consider geometric information and local-range correlations for depth data. Furthermore, we extend DCA to ensemble differential convolution attention (EDCA) which propagates long-range contextual dependencies and seamlessly incorporates spatial distribution for RGB data. DCA and EDCA dynamically adjust convolutional weights by pixel difference to enable self-adaptive in local and long range, respectively. A two-branch network built with DCA and EDCA, called Differential Convolutional Network (DCANet), is proposed to fuse local and global information of two-modal data. Consequently, the individual advantage of RGB and depth data are emphasized. Our DCANet is shown to set a new state-of-the-art performance for RGB-D semantic segmentation on two challenging benchmark datasets, i.e., NYUDv2 and SUN-RGBD.

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2210.00770 [pdf, other]

Accelerate Reinforcement Learning with PID Controllers in the Pendulum Simulations

Authors: Liping Bai

Abstract: We propose a Proportional Integral Derivative (PID) controller-based coaching scheme to expedite reinforcement learning (RL).

Submitted 3 October, 2022; originally announced October 2022.

arXiv:2206.10255 [pdf, other]

GNN-PMB: A Simple but Effective Online 3D Multi-Object Tracker without Bells and Whistles

Authors: Jianan Liu, Liping Bai, Yuxuan Xia, Tao Huang, Bing Zhu, Qing-Long Han

Abstract: Multi-object tracking (MOT) is among crucial applications in modern advanced driver assistance systems (ADAS) and autonomous driving (AD) systems. The global nearest neighbor (GNN) filter, as the earliest random vector-based Bayesian tracking framework, has been adopted in most of state-of-the-arts trackers in the automotive industry. The development of random finite set (RFS) theory facilitates a mathematically rigorous treatment of the MOT problem, and different variants of RFS-based Bayesian filters have then been proposed. However, their effectiveness in the real ADAS and AD application is still an open problem. In this paper, it is demonstrated that the latest RFS-based Bayesian tracking framework could be superior to typical random vector-based Bayesian tracking framework via a systematic comparative study of both traditional random vector-based Bayesian filters with rule-based heuristic track maintenance and RFS-based Bayesian filters on the nuScenes validation dataset. An RFS-based tracker, namely Poisson multi-Bernoulli filter using the global nearest neighbor (GNN-PMB), is proposed to LiDAR-based MOT tasks. This GNN-PMB tracker is simple to use, and it achieves competitive results on the nuScenes dataset. Specifically, the proposed GNN-PMB tracker outperforms most state-of-the-art LiDAR-only trackers and LiDAR and camera fusion-based trackers, ranking the $3^{rd}$ among all LiDAR-only trackers on nuScenes 3D tracking challenge leader board at the time of submission.

Submitted 8 February, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: accepted by IEEE Transactions on Intelligent Vehicles

arXiv:2205.13974 [pdf]

doi 10.1109/JPROC.2022.3177230

DLMP of Competitive Markets in Active Distribution Networks: Models, Solutions, Applications, and Visions

Authors: Xiaofei Wang, Fangxing Li, Linquan Bai, Xin Fang

Abstract: Traditionally, the electric distribution system operates with uniform energy prices across all system nodes. However, as the adoption of distributed energy resources (DERs) propels a shift from passive to active distribution network (ADN) operation, a distribution-level electricity market has been proposed to manage new complexities efficiently. In addition, distribution locational marginal price (DLMP) has been established in the literature as the primary pricing mechanism. The DLMP inherits the LMP concept in the transmission-level wholesale market, but incorporates characteristics of the distribution system, such as high R/X ratios and power losses, system imbalance, and voltage regulation needs. The DLMP provides a solution that can be essential for competitive market operation in future distribution systems. This paper first provides an overview of the current distribution-level market architectures and their early implementations. Next, the general clearing model, model relaxations, and DLMP formulation are comprehensively reviewed. The state-of-the-art solution methods for distribution market clearing are summarized and categorized into centralized, distributed, and decentralized methods. Then, DLMP applications for the operation and planning of DERs and distribution system operators (DSOs) are discussed in detail. Finally, visions of future research directions and possible barriers and challenges are presented.

Submitted 27 May, 2022; originally announced May 2022.

Journal ref: Proceedings of the IEEE, vol. 111, no. 7, pp. 725-743, July 2023

arXiv:2205.01509 [pdf, other]

MS Lesion Segmentation: Revisiting Weighting Mechanisms for Federated Learning

Authors: Dongnan Liu, Mariano Cabezas, Dongang Wang, Zihao Tang, Lei Bai, Geng Zhan, Yuling Luo, Kain Kyle, Linda Ly, James Yu, Chun-Chien Shieh, Aria Nguyen, Ettikan Kandasamy Karuppiah, Ryan Sullivan, Fernando Calamante, Michael Barnett, Wanli Ouyang, Weidong Cai, Chenyu Wang

Abstract: Federated learning (FL) has been widely employed for medical image analysis to facilitate multi-client collaborative learning without sharing raw data. Despite great success, FL's performance is limited for multiple sclerosis (MS) lesion segmentation tasks, due to variance in lesion characteristics imparted by different scanners and acquisition parameters. In this work, we propose the first FL MS lesion segmentation framework via two effective re-weighting mechanisms. Specifically, a learnable weight is assigned to each local node during the aggregation process, based on its segmentation performance. In addition, the segmentation loss function in each client is also re-weighted according to the lesion volume for the data during training. Comparison experiments on two FL MS segmentation scenarios using public and clinical datasets have demonstrated the effectiveness of the proposed method by outperforming other FL methods significantly. Furthermore, the segmentation performance of FL incorporating our proposed aggregation mechanism can exceed centralised training with all the raw data. The extensive evaluation also indicated the superiority of our method when estimating brain volume differences estimation after lesion inpainting.

Submitted 3 May, 2022; originally announced May 2022.

Comments: 10 pages, 3 figures, and 7 tables

arXiv:2203.02384 [pdf, other]

AutoMO-Mixer: An automated multi-objective Mixer model for balanced, safe and robust prediction in medicine

Authors: Xi Chen, Jiahuan Lv, Dehua Feng, Xuanqin Mou, Ling Bai, Shu Zhang, Zhiguo Zhou

Abstract: Accurately identifying patient's status through medical images plays an important role in diagnosis and treatment. Artificial intelligence (AI), especially the deep learning, has achieved great success in many fields. However, more reliable AI model is needed in image guided diagnosis and therapy. To achieve this goal, developing a balanced, safe and robust model with a unified framework is desirable. In this study, a new unified model termed as automated multi-objective Mixer (AutoMO-Mixer) model was developed, which utilized a recent developed multiple layer perceptron Mixer (MLP-Mixer) as base. To build a balanced model, sensitivity and specificity were considered as the objective functions simultaneously in training stage. Meanwhile, a new evidential reasoning based on entropy was developed to achieve a safe and robust model in testing stage. The experiment on an optical coherence tomography dataset demonstrated that AutoMO-Mixer can obtain safer, more balanced, and robust results compared with MLP-Mixer and other available models.

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2110.07391 [pdf]

Distribution Locational Marginal Pricing Under Uncertainty Considering Coordination of Distribution and Wholesale Markets

Authors: Zongzheng Zhao, Yixin Liu, Li Guo, Linquan Bai, Chengshan Wang

Abstract: An effective distribution electricity market (DEM) is required to manage the rapidly growing small-scale distributed energy resources (DERs) in distribution systems (DSs). This paper proposes a day-ahead DEM clearing and pricing mechanism to account for the uncertainty of DERs and the coordination with the wholesale electricity market (WEM) through a bi-level model. The upper-level model clears the WEM in the transmission system (TS) and forms the locational marginal price (LMP) and uncertainty LMP (ULMP) for energy and uncertainty/reserve, respectively. In the lower level, a robust scheduling model considering WEM-DEM coordination and uncertainties is proposed to clear the DEM. Accordingly, the distribution LMPs (DLMPs) for active power, reactive power and uncertainty/reserve are derived to reward the energy/reserve provision and charge uncertain resources in the DEM, which provide effective price signals for managing not only the voltage and congestion, but also the uncertainty in DSs. A heterogeneous decomposition (HGD) algorithm is utilized to solve the bi-level model in a decentralized manner with limited information interaction between TS and DSs, which guarantees the solution efficiency and information privacy. The effectiveness of the proposed method is verified via numerous case studies.

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2110.01775 [pdf, other]

doi 10.1109/TIV.2022.3168899

Deep Instance Segmentation with Automotive Radar Detection Points

Authors: Jianan Liu, Weiyi Xiong, Liping Bai, Yuxuan Xia, Tao Huang, Wanli Ouyang, Bing Zhu

Abstract: Automotive radar provides reliable environmental perception in all-weather conditions with affordable cost, but it hardly supplies semantic and geometry information due to the sparsity of radar detection points. With the development of automotive radar technologies in recent years, instance segmentation becomes possible by using automotive radar. Its data contain contexts such as radar cross section and micro-Doppler effects, and sometimes can provide detection when the field of view is obscured. The outcome from instance segmentation could be potentially used as the input of trackers for tracking targets. The existing methods often utilize a clustering-based classification framework, which fits the need of real-time processing but has limited performance due to minimum information provided by sparse radar detection points. In this paper, we propose an efficient method based on clustering of estimated semantic information to achieve instance segmentation for the sparse radar detection points. In addition, we show that the performance of the proposed approach can be further enhanced by incorporating the visual multi-layer perceptron. The effectiveness of the proposed method is verified by experimental results on the popular RadarScenes dataset, achieving 89.53% mean coverage and 86.97% mean average precision with the IoU threshold of 0.5, which is superior to other approaches in the literature. More significantly, the consumed memory is around 1MB, and the inference time is less than 40ms, indicating that our proposed algorithm is storage and time efficient. These two criteria ensure the practicality of the proposed method in real-world systems.

Submitted 5 February, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

Comments: 11 pages, 9 figures, 3 tables, accepted by IEEE Transactions on Intelligent Vehicles

arXiv:2109.02643 [pdf]

doi 10.1016/j.optlaseng.2022.107023

Dual camera snapshot hyperspectral imaging system via physics informed learning

Authors: Hui Xie, Zhuang Zhao, Jing Han, Yi Zhang, Lianfa Bai, Jun Lu

Abstract: We consider using the system's optical imaging process with convolutional neural networks (CNNs) to solve the snapshot hyperspectral imaging reconstruction problem, which uses a dual-camera system to capture the three-dimensional hyperspectral images (HSIs) in a compressed way. Various methods using CNNs have been developed in recent years to reconstruct HSIs, but most of the supervised deep learning methods aimed to fit a brute-force mapping relationship between the captured compressed image and standard HSIs. Thus, the learned mapping would be invalid when the observation data deviate from the training data. Especially, we usually don't have ground truth in real-life scenarios. In this paper, we present a self-supervised dual-camera equipment with an untrained physics-informed CNNs framework. Extensive simulation and experimental results show that our method without training can be adapted to a wide imaging environment with good performance. Furthermore, compared with the training-based methods, our system can be constantly fine-tuned and self-improved in real-life scenarios.

Submitted 17 November, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

arXiv:2108.03026 [pdf]

The Influence of Age and Gender Information on the Diagnosis of Diabetic Retinopathy: Based on Neural Networks

Authors: Long Bai, Sihang Chen, Mingyang Gao, Leila Abdelrahman, Manal Al Ghamdi, Mohamed Abdel-Mottaleb

Abstract: This paper proposes the importance of age and gender information in the diagnosis of diabetic retinopathy. We utilized Deep Residual Neural Networks (ResNet) and Densely Connected Convolutional Networks (DenseNet), which are proven effective on image classification problems and the diagnosis of diabetic retinopathy using the retinal fundus images. We used the ensemble of several classical networks and decentralized the training so that the network was simple and avoided overfitting. To observe whether the age and gender information could help enhance the performance, we added the information before the dense layer and compared the results with the results that did not add age and gender information. We found that the test accuracy of the network with age and gender information was 2.67% higher than that of the network without age and gender information. Meanwhile, compared with gender information, age information had a better help for the results.

Submitted 6 August, 2021; originally announced August 2021.

Comments: 4 pages, 4 figures, Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE EMBC 2021

arXiv:2012.08091 [pdf]

A Locational Marginal Pricing Mechanism for Uncertainty Management Based on Improved Multi-Ellipsoidal Uncertainty Set

Authors: Zongzheng Zhao, Yixin Liu, Li Guo, Linquan Bai, Chengshan Wang

Abstract: Large-scale integration of renewable energy sources (RES) brings huge challenges to the power system. A cost-effective reserve deployment and uncertainty pricing mechanism are critical to deal with the uncertainty and variability of RES. To this end, this paper proposes a novel locational marginal pricing mechanism in day-ahead market for managing uncertainties from RES. Firstly, an improved multi-ellipsoidal uncertainty set (IMEUS) considering the temporal correlation and conditional correlation of wind power forecast is formulated to better capture the uncertainty of wind power. The dimension of each ellipsoidal subset is optimized based on a comprehensive evaluation index to reduce the invalid region without large loss of modeling accuracy, so as to reduce the conservatism. Then, an IMEUS-based robust unit commitment (RUC) model and a robust economic dispatch (RED) model are established for the day-ahead market clearing. Both the reserve cost and ramping constraints are considered in the overall dispatch process. Furthermore, based on the Lagrangian function of the RED model, a new locational marginal pricing mechanism is developed. The uncertainty locational marginal price (ULMP) is introduced to charge the RES for its uncertainties and reward the generators who provide reserve to mitigate uncertainties. The new pricing mechanism can provide effective price signals to incentivize the uncertainty management in the day-ahead market. Finally, the effectiveness of the proposed methods is verified via numerous simulations on the PJM 5-bus system and IEEE 118-bus system.

Submitted 15 December, 2020; originally announced December 2020.

arXiv:2009.12320 [pdf, ps, other]

Novel Visible Light Communication Assisted Perspective-Four-Line Algorithm for Indoor Localization

Authors: Lin Bai

Abstract: In this paper, we propose a novel visible light communication (VLC) assisted Perspective-four-Line algorithm (V-P4L) for practical indoor localization. The basic idea of V-P4L is to jointly use VLC and computer vision to achieve high accuracy regardless of LED height differences. In particular, we first exploit the space-domain information to estimate the orientation and coordinate of a single rectangular LED luminaire in the camera coordinate system based on the plane and solid geometry. Then, based on the time-domain information transmitted by VLC and the estimated luminaire information, V-P4L can estimate the position and pose of the camera by the single-view geometry theory and the linear least square (LLS) method. To further mitigate the effect of height differences among LEDs on localization accuracy, we then propose a correction algorithm of V-P4L based on the LLS method and a simple optimization method. Due to the combination of time- and space-domain information, V-P4L only requires a single luminaire for localization without limitation on the correspondences between the features and their projections. Simulation results show that for V-P4L the position error is always less than 15 cm and the orientation error is always less than 3° using popular indoor luminaires.

Submitted 25 September, 2020; originally announced September 2020.

arXiv:2007.02438 [pdf, other]

DepthNet: Real-Time LiDAR Point Cloud Depth Completion for Autonomous Vehicles

Authors: Lin Bai, Yiming Zhao, Mahdi Elhousni, Xinming Huang

Abstract: Autonomous vehicles rely heavily on sensors such as camera and LiDAR, which provide real-time information about their surroundings for the tasks of perception, planning and control. Typically a LiDAR can only provide sparse point cloud owing to a limited number of scanning lines. By employing depth completion, a dense depth map can be generated by assigning each camera pixel a corresponding depth value. However, the existing depth completion convolutional neural networks are so complex that they require high-end GPUs for processing, and thus they are not applicable to real-time autonomous driving. In this paper, a light-weight network is proposed for the task of LiDAR point cloud depth completion. With an astonishing 96.2% reduction in the number of parameters, it still achieves comparable performance (9.3% better in MAE but 3.9% worse in RMSE) to the state-of-the-art network. For real-time embedded platforms, depthwise separable technique is applied to both convolution and deconvolution operations and the number of parameters decreases further by a factor of 7.3, with only a small percentage increase in RMSE and MAE performance. Moreover, a system-on-chip architecture for depth completion is developed on a PYNQ-based FPGA platform that achieves real-time processing for HDL-64E LiDAR at a speed of 11.1 frames per second.

Submitted 5 July, 2020; originally announced July 2020.

arXiv:2006.08829 [pdf, other]

Multiagent Reinforcement Learning based Energy Beamforming Control

Authors: Liping Bai, Zhongqiang Pang

Abstract: Ultra low power devices make far-field wireless power transfer a viable option for energy delivery despite the exponential attenuation. Electromagnetic beams are constructed from the stations such that wireless energy is directionally concentrated around the ultra low power devices. Energy beamforming faces different challenges compared to information beamforming due to the lack of feedback on channel state. Various methods have been proposed such as one-bit channel feedback to enhance energy beamforming capacity, yet it still has considerable computation overhead and needs to be computed centrally. Valuable resources and time are wasted on transferring control information back and forth. In this paper, we propose a novel multiagent reinforcement learning (MARL) formulation for codebook based beamforming control. It takes advantage of the inherently distributed structure in a wirelessly powered network and lays the groundwork for fully locally computed beam control algorithms. Source code can be found at https://github.com/BaiLi**/WirelessPowerTransfer.

Submitted 15 June, 2020; originally announced June 2020.

Comments: 5 Pages, 3 Figures

arXiv:2006.07644 [pdf, other]

RoadNet-RT: High Throughput CNN Architecture and SoC Design for Real-Time Road Segmentation

Authors: Lin Bai, Yecheng Lyu, Xinming Huang

Abstract: In recent years, convolutional neural network has gained popularity in many engineering applications especially for computer vision. In order to achieve better performance, often more complex structures and advanced operations are incorporated into the neural networks, which results in very long inference time. For time-critical tasks such as autonomous driving and virtual reality, real-time processing is fundamental. In order to reach real-time process speed, a light-weight, high-throughput CNN architecture namely RoadNet-RT is proposed for road segmentation in this paper. It achieves 90.33% MaxF score on test set of KITTI road segmentation task and 8 ms per frame when running on GTX 1080 GPU. Compared to the state-of-the-art network, RoadNet-RT speeds up the inference time by a factor of 20 at the cost of only 6.2% accuracy loss. For hardware design optimization, several techniques such as depthwise separable convolution and non-uniformed kernel size convolution are custom designed to further reduce the processing time. The proposed CNN architecture has been successfully implemented on an FPGA ZCU102 MPSoC platform that achieves the computation capability of 83.05 GOPS. The system throughput reaches 327.9 frames per second with image size 1216x176.

Submitted 17 May, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

Journal ref: in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 68, no. 2, pp. 704-714, Feb. 2021

arXiv:2006.00053 [pdf, other]

A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN

Authors: Lin Bai, Yecheng Lyu, Xinming Huang

Abstract: In this paper, a scalable neural network hardware architecture for image segmentation is proposed. By sharing the same computing resources, both convolution and deconvolution operations are handled by the same process element array. In addition, access to on-chip and off-chip memories is optimized to alleviate the burden introduced by partial sum. As an example, SegNet-Basic has been implemented using the proposed unified architecture by targeting on Xilinx ZC706 FPGA, which achieves the performance of 151.5 GOPS and 94.3 GOPS for convolution and deconvolution respectively. This unified convolution/deconvolution design is applicable to other CNNs with deconvolution.

Submitted 29 May, 2020; originally announced June 2020.

Comments: This paper has been accepted by ISCAS 2020

arXiv:2006.00049 [pdf, other]

PointNet on FPGA for Real-Time LiDAR Point Cloud Processing

Authors: Lin Bai, Yecheng Lyu, Xin Xu, Xinming Huang

Abstract: LiDAR sensors have been widely used in many autonomous vehicle modalities, such as perception, mapping, and localization. This paper presents an FPGA-based deep learning platform for real-time point cloud processing targeted on autonomous vehicles. The software driver for the Velodyne LiDAR sensor is modified and moved into the on-chip processor system, while the programmable logic is designed as a customized hardware accelerator. As the state-of-the-art deep learning algorithm for point cloud processing, PointNet is successfully implemented on the proposed FPGA platform. Targeted on a Xilinx Zynq UltraScale+ MPSoC ZCU104 development board, the FPGA implementations of PointNet achieve the computing performance of 182.1 GOPS and 280.0 GOPS for classification and segmentation respectively. The proposed design can support an input up to 4096 points per frame. The processing time is 19.8 ms for classification and 34.6 ms for segmentation, which meets the real-time requirement for most of the existing LiDAR sensors.

Submitted 29 May, 2020; originally announced June 2020.

Comments: This paper has been accepted by ISCAS 2020

arXiv:2005.13529 [pdf, ps, other]

doi 10.1109/LCOMM.2020.2993961

Angle-Dependent Phase Shifter Model for Reconfigurable Intelligent Surfaces: Does the Angle-Reciprocity Hold?

Authors: Weicong Chen, Lin Bai, Wankai Tang, Shi Jin, Wei Xiang Jiang, Tie Jun Cui

Abstract: The existing phase shifter models adopted for reconfigurable intelligent surfaces (RISs) have ignored the electromagnetic (EM) waves propagation behavior, thus cannot reveal practical effects of RIS on wireless communication systems. Based on the equivalent circuit, this paper introduces an angle-dependent phase shifter model for varactor-based RISs. To the best of our knowledge, this is the first phase shifter model which reveals that the incident angle of EM waves has influence on the reflection coefficient of RIS. In addition, the angle-reciprocity on RIS is investigated and further proved to be tenable when the reflection phase difference of adjacent RIS unit cells is invariant for an impinging EM wave and its reverse incident one. The angle-dependent characteristic of RIS is verified through full-wave simulation. According to our analysis and the simulation results, we find that the angle-reciprocity of varactor-based RIS only holds under small incident angles of both forward and reverse incident EM waves, thus limits the channel reciprocity in RIS-assisted TDD systems.

Submitted 8 May, 2020; originally announced May 2020.

Comments: Accepted by IEEE Communications Letters

arXiv:2005.11162 [pdf, ps, other]

doi 10.1364/OE.400992

A Novel Received Signal Strength Assisted Perspective-three-Point Algorithm for Indoor Visible Light Positioning

Authors: Lin Bai, Yang Yang, Chunyan Feng, Caili Guo

Abstract: In this paper, a received signal strength assisted Perspective-three-Point positioning algorithm (R-P3P) is proposed for visible light positioning (VLP) systems. The basic idea of R-P3P is to jointly use visual and strength information to estimate the receiver position using 3 LEDs regardless of the LEDs' orientations. R-P3P first utilizes visual information captured by the camera to estimate the incidence angles of visible lights. Then, R-P3P calculates the candidate distances between the LEDs and the receiver based on the law of cosines and the Wu-Ritt's zero decomposition method. Based on the incidence angles, the candidate distances and the physical characteristics of the LEDs, R-P3P can select the exact distances from all the candidate distances. Finally, the linear least square (LLS) method is employed to estimate the position of the receiver. Due to the combination of visual and strength information of visible light signals, R-P3P can achieve high accuracy using 3 LEDs regardless of the LEDs' orientations. Simulation results show that R-P3P can achieve positioning accuracy within 10 cm over 70% indoor area with low complexity regardless of the LEDs' orientations.

Submitted 21 May, 2020; originally announced May 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2004.06294

arXiv:2005.07784 [pdf, other]

A Learning-from-noise Dilated Wide Activation Network for denoising Arterial Spin Labeling (ASL) Perfusion Images

Authors: Danfeng Xie, Yiran Li, Hanlu Yang, Li Bai, Lei Zhang, Ze Wang

Abstract: Arterial spin labeling (ASL) perfusion MRI provides a non-invasive way to quantify cerebral blood flow (CBF) but it still suffers from a low signal-to-noise-ratio (SNR). Using deep machine learning (DL), several groups have shown encouraging denoising results. Interestingly, the improvement was obtained when the deep neural network was trained using noise-contaminated surrogate reference because of the lack of golden standard high quality ASL CBF images. More strikingly, the output of these DL ASL networks (ASLDN) showed even higher SNR than the surrogate reference. This phenomenon indicates a learning-from-noise capability of deep networks for ASL CBF image denoising, which can be further enhanced by network optimization. In this study, we proposed a new ASLDN to test whether similar or even better ASL CBF image quality can be achieved in the case of highly noisy training reference. Different experiments were performed to validate the learning-from-noise hypothesis. The results showed that the learning-from-noise strategy produced better output quality than ASLDN trained with relatively high SNR reference.

Submitted 15 May, 2020; originally announced May 2020.

arXiv:2004.06294 [pdf, ps, other]

A High Coverage Camera Assisted Received Signal Strength Ratio Algorithm for Indoor Visible Light Positioning

Authors: Lin Bai, Yang Yang, Chunyan Feng, Caili Guo, Julian Cheng

Abstract: In this paper, a high coverage algorithm termed enhanced camera assisted received signal strength ratio (eCA-RSSR) positioning algorithm is proposed for visible light positioning (VLP) systems. The basic idea of eCA-RSSR is to utilize visual information captured by the camera to estimate the incidence angles of visible lights first. Based on the incidence angles, eCA-RSSR utilizes the received signal strength ratio (RSSR) calculated by the photodiode (PD) to estimate the ratios of the distances between the LEDs and the receiver. Based on a Euclidean plane geometry theorem, eCA-RSSR transforms the ratios of the distances into the absolute values. In this way, eCA-RSSR only requires 3 LEDs for both orientation-free 2D and 3D positioning, implying that eCA-RSSR can achieve high coverage. Based on the absolute values of the distances, the linear least square method is employed to estimate the position of the receiver. Therefore, for the receiver having a small distance between the PD and the camera, the accuracy of eCA-RSSR does not depend on the starting values of the non-linear least square method and the complexity of eCA-RSSR is low. Furthermore, since the distance between the PD and camera can significantly affect the performance of eCA-RSSR, we further propose a compensation algorithm for eCA-RSSR based on the single-view geometry. Simulation results show that eCA-RSSR can achieve centimeter-level accuracy over 80% indoor area for both the receivers having a small and a large distance between the PD and the camera.

Submitted 13 April, 2020; originally announced April 2020.

arXiv:2002.12561 [pdf, ps, other]

A Big Data Enabled Channel Model for 5G Wireless Communication Systems

Authors: Jie Huang, Cheng-Xiang Wang, Lu Bai, Jian Sun, Yang Yang, Jie Li, Olav Tirkkonen, Ming-Tuo Zhou

Abstract: The standardization process of the fifth generation (5G) wireless communications has recently been accelerated and the first commercial 5G services would be provided as early as in 2018. The increasing of enormous smartphones, new complex scenarios, large frequency bands, massive antenna elements, and dense small cells will generate big datasets and bring 5G communications to the era of big data. This paper investigates various applications of big data analytics, especially machine learning algorithms in wireless communications and channel modeling. We propose a big data and machine learning enabled wireless channel model framework. The proposed channel model is based on artificial neural networks (ANNs), including feed-forward neural network (FNN) and radial basis function neural network (RBF-NN). The input parameters are transmitter (Tx) and receiver (Rx) coordinates, Tx-Rx distance, and carrier frequency, while the output parameters are channel statistical properties, including the received power, root mean square (RMS) delay spread (DS), and RMS angle spreads (ASs). Datasets used to train and test the ANNs are collected from both real channel measurements and a geometry based stochastic model (GBSM). Simulation results show good performance and indicate that machine learning algorithms can be powerful analytical tools for future measurement-based wireless channel modeling.

Submitted 28 February, 2020; originally announced February 2020.

arXiv:1910.12637 [pdf]

Synchronous locating and imaging behind scattering medium in a large depth based on deep learning

Authors: Shuo Zhu, Enlai Guo, Qianying Cui, Dongliang Zheng, Lianfa Bai, Jing Han

Abstract: Scattering medium brings great difficulties to locate and image planar objects especially when the object has a large depth. In this letter, a novel learning-based method is presented to locate and image the object hidden behind a thin scattering diffuser. A multi-task network, named DINet, is constructed to predict the depth and the image of the hidden object from the captured speckle patterns. The provided experiments verify that the proposed method can locate the object with a depth mean error less than 0.05 mm, and image the object with an average PSNR above 24 dB, in a large depth ranging from 350 mm to 1150 mm. The constructed DINet can obtain multiple physical information via a single speckle pattern, including both the depth and image. Compared with the traditional methods, it paves the way to the practical applications requiring large imaging depth of field behind scattering media.

Submitted 29 May, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

arXiv:1910.11272 [pdf]

doi 10.1364/OE.383911

Learning-based real-time method to looking through scattering medium beyond the memory effect

Authors: Enlai Guo, Shuo Zhu, Yan Sun, Lianfa Bai, Jing Han

Abstract: Strong scattering medium brings great difficulties to optical imaging, which is also a problem in medical imaging and many other fields. Optical memory effect makes it possible to image through strong random scattering medium. However, this method also has the limitation of limited angle field-of-view (FOV), which prevents it from being applied in practice. In this paper, a kind of practical convolutional neural network called PDSNet is proposed, which effectively breaks through the limitation of optical memory effect on FOV. Experiments are conducted to prove that the scattered pattern can be reconstructed accurately in real-time by PDSNet, and it is widely applicable to retrieve complex objects of random scales and different scattering media.

Submitted 4 November, 2019; v1 submitted 19 October, 2019; originally announced October 2019.

Comments: 15 pages with 9 figures

arXiv:1907.00209 [pdf]

High Sensitivity Snapshot Spectrometer Based on Deep Network Unmixing

Authors: XiaoYu Chen, Xu Wang, Lianfa Bai, Jing Han, Zhuang Zhao

Abstract: In this paper, we present a convolution neural network based method to recover the light intensity distribution from the overlapped dispersive spectra instead of adding an extra light path to capture it directly for the first time. Then, we construct a single-path sub-Hadamard snapshot spectrometer based on our previous dual-path snapshot spectrometer. In the proposed single-path spectrometer, we use the reconstructed light intensity as the original light intensity and recover high signal-to-noise ratio spectra successfully. Compared with dual-path snapshot spectrometer, the network based single-path spectrometer has a more compact structure and maintains snapshot and high sensitivity. Abundant simulated and experimental results have demonstrated that the proposed method can obtain a better reconstructed signal-to-noise ratio spectrum than the dual-path sub-Hadamard spectrometer because of its higher light throughput.

Submitted 29 June, 2019; originally announced July 2019.

Comments: 16 pages, 13 figures and 2 tables

arXiv:1906.11070 [pdf, other]

Preference-based Energy Exchange in a Network of Microgrids

Authors: Li Bai, Dimitri Thomopulos, Emanuele Crisostomi

Abstract: Peer-to-peer energy trading is emerging as a new paradigm that in the near future may disrupt conventional electricity markets and heavily affect energy exchanges in networks of microgrids. In this paper, a preference mechanism is considered to compute optimal energy exchanges in a network of microgrids with or without the supervision of the distribution system operator, and the alternating direction method of multipliers is adopted for its distributed solution. The effect of the preference mechanism on the resulting power flow in the network is further studied and discussed for realistic case studies. Results show that a desired power flow in the network of interconnected microgrids can be achieved with different preference values locally chosen or imposed by the system operator. In particular, appropriate preferences may be used to give rise to different clusters of microgrids and reduce energy exchanges between different clusters.

Submitted 16 September, 2019; v1 submitted 26 June, 2019; originally announced June 2019.

Showing 1–50 of 55 results for author: Bai, L