
Physics-Informed AI Inverter

Authors: Qing Shen, Yifan Zhou, Peng Zhang, Yacov A. Shamash, Xiaochuan Luo, Bin Wang, Huanfeng Zhao, Roshan Sharma, Bo Chen

Abstract: This letter devises an AI-Inverter that pilots the use of a physics-informed neural network (PINN) to enable AI-based electromagnetic transient simulations (EMT) of grid-forming inverters. The contributions are threefold: (1) A PINN-enabled AI-Inverter is formulated; (2) An enhanced learning strategy, balanced-adaptive PINN, is devised; (3) extensive validations and comparative analysis of the accuracy and efficiency of AI-Inverter are made to show its superiority over the classical electromagnetic transient programs (EMTP).

Submitted 1 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.13674 [pdf, other]

Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

Authors: Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

Abstract: Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annotations) from 413 patients each with 17 (female) or 19 (male) labelled organs, manually delineated by oncologists. We grouped scans based on clinical information into 1) diagnosis/radiotherapy (317 volumes), 2) partial excision without the whole organ missing (22 volumes), and 3) excision with the whole organ missing (74 volumes). RAOS provides a potential benchmark for evaluating model robustness including organ hallucination. It also includes some organs that can be very hard to access on public datasets like the rectum, colon, intestine, prostate and seminal vesicles. We benchmarked several state-of-the-art methods in these three clinical groups to evaluate performance and robustness. We also assessed cross-generalization between RAOS and three public datasets. This dataset and comprehensive analysis establish a potential baseline for future robustness research: https://github.com/Luoxd1996/RAOS.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 10 pages, 1 figure, 6 tables, Early Accept to MICCAI 2024

arXiv:2406.13645 [pdf, other]

Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging, time-consuming and expensive task. In response, this study introduces a pioneering framework that leverages a patch-based active domain adaptation approach. By actively recommending a few valuable image patches by the devised Cascade Uncertainty-Predominance (CUP) selection strategy for labeling and model-finetuning, our method significantly improves the accuracy of UWF-SLO vessel segmentation across diverse medical centers. In addition, we annotate and construct the first Multi-center UWF-SLO Vessel Segmentation (MU-VS) dataset to promote this topic research, comprising data from multiple institutions. This dataset serves as a valuable resource for cross-center evaluation, verifying the effectiveness and robustness of our approach. Experimental results demonstrate that our approach surpasses existing domain adaptation and active learning methods, considerably reducing the gap between the Upper and Lower bounds with minimal annotations, highlighting our method's practical clinical value. We will release our dataset and code to facilitate relevant research: https://github.com/whq-xxh/SFADA-UWF-SLO.

Submitted 19 June, 2024; originally announced June 2024.

Comments: MICCAI 2024 Early Accept

arXiv:2405.19053 [pdf, other]

doi 10.1109/AEEES61147.2024.10544962

Multiscale Spatio-Temporal Enhanced Short-term Load Forecasting of Electric Vehicle Charging Stations

Authors: Zongbao Zhang, Jiao Hao, Wenmeng Zhao, Yan Liu, Yaohui Huang, Xinhang Luo

Abstract: The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinearity of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address these challenges, we propose a Multiscale Spatio-Temporal Enhanced Model (MSTEM) for effective load forecasting at EVCS. MSTEM incorporates a multiscale graph neural network to discern hierarchical nonlinear temporal dependencies across various time scales. It also integrates a recurrent learning component and a residual fusion mechanism, enhancing its capability to accurately capture spatial and temporal variations in charging patterns. The effectiveness of the proposed MSTEM has been validated through comparative analysis with six baseline models using three evaluation metrics. The case studies utilize real-world datasets for both fast and slow charging loads at EVCS in Perth, UK. The experimental results demonstrate the superiority of MSTEM in short-term continuous load forecasting for EVCS.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 5 pages, 1 figure, AEEES 2024

arXiv:2404.16522 [pdf, other]

A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography

Authors: Bo Peng, Xiaofeng Li, Xinyu Li, Zhenghan Wang, Hui Deng, Xiaoxian Luo, Lixue Yin, Hongmei Zhang

Abstract: Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classifying 2D echocardiography data into five distinct echocardiographic views: apical 4-chamber, parasternal long axis of left ventricle, parasternal short axis at levels of the mitral valve, papillary muscle, and apex. It then extracts features of each view separately and combines the five features for disease classification. A total of 212 patients diagnosed with HCM and 30 patients diagnosed with CA, along with 200 individuals with normal cardiac function (Normal), were enrolled in this study from 2018 to 2022. This approach achieved a precision and recall of 0.905 and a micro-F1 score of 0.904, demonstrating its effectiveness in accurately identifying HCM and CA using a multi-view analysis.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.07622 [pdf, other]

Multiple Latent Space Mapping for Compressed Dark Image Enhancement

Authors: Yi Zeng, Zhengning Wang, Yuxuan Liu, Tianjiao Zeng, Xuhang Liu, Xinglong Luo, Shuaicheng Liu, Shuyuan Zhu, Bing Zeng

Abstract: Dark image enhancement aims at converting dark images to normal-light images. Existing dark image enhancement methods take uncompressed dark images as inputs and achieve great performance. However, in practice, dark images are often compressed before storage or transmission over the Internet. Current methods perform poorly when processing compressed dark images. Artifacts hidden in the dark regions are amplified by current methods, which results in uncomfortable visual effects for observers. Based on this observation, this study aims at enhancing compressed dark images while avoiding compression artifacts amplification. Since texture details intertwine with compression artifacts in compressed dark images, detail enhancement and blocking artifacts suppression contradict each other in image space. Therefore, we handle the task in latent space. To this end, we propose a novel latent mapping network based on variational auto-encoder (VAE). Firstly, different from previous VAE-based methods with single-resolution features only, we exploit multiple latent spaces with multi-resolution features, to reduce the detail blur and improve image fidelity. Specifically, we train two multi-level VAEs to project compressed dark images and normal-light images into their latent spaces respectively. Secondly, we leverage a latent mapping network to transform features from compressed dark space to normal-light space. Specifically, since the degradation models of darkness and compression are different from each other, the latent mapping process is divided into an enlightening branch and a deblocking branch. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art performance in compressed dark image enhancement.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.18076 [pdf, other]

Online Ecological Gearshift Strategy via Neural Network with Soft-Argmax Operator

Authors: Xi Luo, Shiying Dong, Jinlong Hong, Bingzhao Gao, Hong Chen

Abstract: This paper presents a neural network optimizer with soft-argmax operator to achieve an ecological gearshift strategy in real-time. The strategy is reformulated as the mixed-integer model predictive control (MIMPC) problem to minimize energy consumption. Then the outer convexification is introduced to transform integer variables into relaxed binary controls. To approximate binary solutions properly within training, the soft-argmax operator is applied to the neural network with the fact that all the operations of this scheme are differentiable. Moreover, this operator can help push the relaxed binary variables close to 0 or 1. To evaluate the strategy effect, we deployed it to a 2-speed electric vehicle (EV). In contrast to the mature solver Bonmin, our proposed method not only achieves similar energy-saving effects but also significantly reduces the solution time to meet real-time requirements. This results in a notable energy savings of 6.02% compared to the rule-based method.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 6 pages, 5 figures, submitted to 8th IFAC Conference on Nonlinear Model Predictive Control
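
The soft-argmax operator mentioned in the abstract above can be illustrated with a minimal sketch. This is the common textbook form (a softmax-weighted average of indices), not necessarily the formulation used in the paper; the function name `soft_argmax` and the sharpness parameter `beta` are assumptions introduced here for illustration.

```python
import math


def soft_argmax(scores, beta=10.0):
    """Differentiable surrogate for argmax (illustrative, not the paper's code).

    Returns the softmax probabilities and the probability-weighted index.
    `beta` is an assumed sharpness parameter: larger values push the
    probabilities toward a one-hot (0 or 1) vector.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(beta * (s - top)) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Expected index under the softmax distribution.
    expected_index = sum(i * p for i, p in enumerate(probs))
    return probs, expected_index


# Two candidate gears with hypothetical scores: a small beta gives a soft
# choice, a large beta gives a nearly binary one.
soft_probs, _ = soft_argmax([0.2, 0.9], beta=1.0)
sharp_probs, sharp_index = soft_argmax([0.2, 0.9], beta=50.0)
```

With a large `beta` the relaxed weights sit close to 0 or 1, which matches the behaviour the abstract describes, while every operation stays differentiable.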

arXiv:2401.15990 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447267

Gland Segmentation Via Dual Encoders and Boundary-Enhanced Attention

Authors: Huadeng Wang, Jiejiang Yu, Bingbing Li, Xipeng Pan, Zhenbing Liu, Rushi Lan, Xiaonan Luo

Abstract: Accurate and automated gland segmentation on pathological images can assist pathologists in diagnosing the malignancy of colorectal adenocarcinoma. However, due to various gland shapes, severe deformation of malignant glands, and overlapping adhesions between glands, gland segmentation has always been very challenging. To address these problems, we propose a DEA model. This model consists of two branches: the backbone encoding and decoding network and the local semantic extraction network. The backbone encoding and decoding network extracts advanced semantic features, uses the proposed feature decoder to restore feature space information, and then enhances the boundary features of the gland through boundary enhancement attention. The local semantic extraction network uses the pre-trained DeepLabv3+ as a local semantic-guided encoder to realize the extraction of edge features. Experimental results on two public datasets, GlaS and CRAG, confirm that the performance of our method is better than other gland segmentation methods.

Submitted 9 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Published in: ICASSP 2024

Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 2345-2349

arXiv:2312.12789 [pdf, other]

SLP-Net: An efficient lightweight network for segmentation of skin lesions

Authors: Bo Yang, Hong Peng, Chenggang Guo, Xiaohui Luo, Jun Wang, Xianzhong Long

Abstract: Prompt treatment for melanoma is crucial. To assist physicians in identifying lesion areas precisely and quickly, we propose a novel skin lesion segmentation technique, namely SLP-Net, an ultra-lightweight segmentation network based on the spiking neural P (SNP) systems type mechanism. Most existing convolutional neural networks achieve high segmentation accuracy while neglecting the high hardware cost. SLP-Net, on the contrary, has a very small number of parameters and a high computation speed. We design a lightweight multi-scale feature extractor without the usual encoder-decoder structure. In place of a decoder, a feature adaptation module is designed to implement multi-scale information decoding. Experiments at the ISIC2018 challenge demonstrate that the proposed model has the highest Acc and DSC among the state-of-the-art methods, while experiments on the PH2 dataset also demonstrate a favorable generalization ability. Finally, we compare the computational complexity as well as the computational speed of the models in experiments, where SLP-Net has the highest overall superiority.

Submitted 4 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.09576 [pdf, other]

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OARs and GTVs segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68% to 86.70%, and 70.42% to 73.44% for OARs and GTVs, respectively. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org

Submitted 15 December, 2023; originally announced December 2023.

Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)
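
The Dice similarity coefficient reported in the abstract above is defined as 2|A ∩ B| / (|A| + |B|) for a predicted region A and a reference region B. The sketch below computes it on sets of voxel indices; it is an illustration only, not the challenge's evaluation code, and the function name `dice_coefficient` and the empty-mask convention are assumptions made here.

```python
def dice_coefficient(pred_voxels, target_voxels):
    """Dice similarity coefficient between two sets of voxel indices.

    Illustrative sketch. Returning 1.0 when both masks are empty is an
    assumed convention; evaluation toolkits differ on this edge case.
    """
    pred, target = set(pred_voxels), set(target_voxels)
    if not pred and not target:
        return 1.0
    overlap = len(pred & target)
    return 2.0 * overlap / (len(pred) + len(target))


# Hypothetical toy masks: three voxels each, two of them shared.
score = dice_coefficient({(0, 0, 1), (0, 0, 2), (0, 0, 3)},
                         {(0, 0, 2), (0, 0, 3), (0, 0, 4)})
```

Because the denominator is the sum of the two region sizes, a few misplaced voxels cost far more on a small or thin structure than on a large organ, which is consistent with the abstract's observation that small-size and thin-structure OARs remain harder.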

arXiv:2312.00535 [pdf, other]

RIS-Based On-the-Air Semantic Communications -- a Diffractional Deep Neural Network Approach

Authors: Shuyi Chen, Yingzhe Hui, Yifan Qin, Yueyi Yuan, Weixiao Meng, Xuewen Luo, Hsiao-Hwa Chen

Abstract: Semantic communication has gained significant attention recently due to its advantages in achieving higher transmission efficiency by focusing on semantic information instead of bit-level information. However, current AI-based semantic communication methods require digital hardware for implementation. With the rapid advancement on reconfigurable intelligence surfaces (RISs), a new approach called on-the-air diffractional deep neural networks (D$^2$NN) can be utilized to enable semantic communications on the wave domain. This paper proposes a new paradigm of RIS-based on-the-air semantic communications, where the computational process occurs inherently as wireless signals pass through RISs. We present the system model and discuss the data and control flows of this scheme, followed by a performance analysis using image transmission as an example. In comparison to traditional hardware-based approaches, RIS-based semantic communications offer appealing features, such as light-speed computation, low computational power requirements, and the ability to handle multiple tasks simultaneously.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 17 pages, 5 figures, accepted by IEEE WCM

arXiv:2310.14515 [pdf]

First realization of macroscopic Fourier ptychography for hundred-meter distance sub-diffraction imaging

Authors: Qi Zhang, Yuran Lu, Yinghui Guo, Yingjie Shang, Mingbo Pu, Yulong Fan, Rui Zhou, Xiaoyin Li, Fei Zhang, Mingfeng Xu, Xiangang Luo

Abstract: Fourier ptychography (FP) imaging, drawing on the idea of synthetic aperture, has been demonstrated as a potential approach for remote sub-diffraction-limited imaging. Nevertheless, the farthest imaging distance is still limited to around 10 m even though there has been a significant improvement in macroscopic FP. The most severe issue in increasing the imaging distance is the FoV limitation caused by the far-field condition for diffraction. Here, we propose to modify the Fourier far-field condition for rough reflective objects, aiming to overcome the small FoV limitation by using a divergent beam to illuminate objects. A joint optimization of pupil function and target image is utilized to attain the aberration-free image while estimating the pupil function simultaneously. Benefiting from the optimized reconstruction algorithm which effectively expands the camera's effective aperture, we experimentally implement several FP systems suited for imaging distances of 12 m, 90 m, and 170 m with a maximum synthetic aperture of 200 mm. The maximum imaging distance and synthetic aperture are thus improved by more than one order of magnitude over state-of-the-art works, with a fourfold improvement in the resolution. Our findings demonstrate significant potential for advancing the field of macroscopic FP, propelling it into a new stage of development.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.08080 [pdf]

RT-SRTS: Angle-Agnostic Real-Time Simultaneous 3D Reconstruction and Tumor Segmentation from Single X-Ray Projection

Authors: Miao Zhu, Qiming Fu, Bo Liu, Mengxi Zhang, Bojian Li, Xiaoyan Luo, Fugen Zhou

Abstract: Radiotherapy is one of the primary treatment methods for tumors, but the organ movement caused by respiration limits its accuracy. Recently, 3D imaging from a single X-ray projection has received extensive attention as a promising approach to address this issue. However, current methods can only reconstruct 3D images without directly locating the tumor and are only validated for fixed-angle imaging, which fails to fully meet the requirements of motion control in radiotherapy. In this study, a novel imaging method RT-SRTS is proposed which integrates 3D imaging and tumor segmentation into one network based on multi-task learning (MTL) and achieves real-time simultaneous 3D reconstruction and tumor segmentation from a single X-ray projection at any angle. Furthermore, the attention enhanced calibrator (AEC) and uncertain-region elaboration (URE) modules have been proposed to aid feature extraction and improve segmentation accuracy. The proposed method was evaluated on fifteen patient cases and compared with three state-of-the-art methods. It not only delivers superior 3D reconstruction but also demonstrates commendable tumor segmentation results. Simultaneous reconstruction and segmentation can be completed in approximately 70 ms, significantly faster than the required time threshold for real-time tumor tracking. The efficacies of both AEC and URE have also been validated in ablation studies. The code of work is available at https://github.com/ZywooSimple/RT-SRTS.

Submitted 28 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.16950 [pdf]

doi 10.1109/ACCESS.2024.3415478

Scalable Neural Dynamic Equivalence for Power Systems

Authors: Qing Shen, Yifan Zhou, Huanfeng Zhao, Peng Zhang, Qiang Zhang, Slava Maslennikov, Xiaochuan Luo

Abstract: Traditional grid analytics are model-based, relying strongly on accurate models of power systems, especially the dynamic models of generators, controllers, loads and other dynamic components. However, acquiring thorough power system models can be impractical in real operation due to inaccessible system parameters and privacy of consumers, which necessitate data-driven dynamic equivalencing of unknown subsystems. Learning reliable dynamic equivalent models for the external systems from SCADA and PMU data, however, is a long-standing intractable problem in power system analysis due to complicated nonlinearity and unforeseeable dynamic modes of power systems. This paper advances a practical application of neural dynamic equivalence (NeuDyE) called Driving Port NeuDyE (DP-NeuDyE), which exploits physics-informed machine learning and neural-ordinary-differential-equations (ODE-NET) to discover a dynamic equivalence of external power grids while preserving its dynamic behaviors after disturbances. The new contributions are threefold: A NeuDyE formulation to enable a continuous-time, data-driven dynamic equivalence of power systems, saving the effort and expense of acquiring inaccessible system; An introduction of a Physics-Informed NeuDyE learning (PI-NeuDyE) to actively control the closed-loop accuracy of NeuDyE; and A DP-NeuDyE to reduce the number of inputs required for the training. We conduct extensive case studies on the NPCC system to validate the generalizability and accuracy of both PI-NeuDyE and DP-NeuDyE, which span a multitude of scenarios, differing in the time required for fault clearance, the specific fault locations, and the limitations of data. Test results have demonstrated the scalability and practicality of NeuDyE, showing its potential to be used in ISO and utility control centers for online transient stability analysis and for planning purposes.

Submitted 21 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Journal ref: IEEE Access, vol. 12, pp. 86513-86522, 2024

arXiv:2309.16934 [pdf, other]

Physics-Aware Neural Dynamic Equivalence of Power Systems

Authors: Qing Shen, Yifan Zhou, Qiang Zhang, Slava Maslennikov, Xiaochuan Luo, Peng Zhang

Abstract: This letter devises Neural Dynamic Equivalence (NeuDyE), which explores physics-aware machine learning and neural-ordinary-differential-equations (ODE-Net) to discover a dynamic equivalence of external power grids while preserving its dynamic behaviors after disturbances. The contributions are threefold: (1) an ODE-Net-enabled NeuDyE formulation to enable a continuous-time, data-driven dynamic equivalence of power systems; (2) a physics-informed NeuDyE learning method (PI-NeuDyE) to actively control the closed-loop accuracy of NeuDyE without an additional verification module; (3) a physics-guided NeuDyE (PG-NeuDyE) to enhance the method's applicability even in the absence of analytical physics models. Extensive case studies in the NPCC system validate the efficacy of NeuDyE, and, in particular, its capability under various contingencies.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2307.12027 [pdf, other]

On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement

Authors: Xin Luo, Yunan Zhu, Shunxin Xu, Dong Liu

Abstract: Several recent studies advocate the use of spectral discriminators, which evaluate the Fourier spectra of images for generative modeling. However, the effectiveness of the spectral discriminators is not well interpreted yet. We tackle this issue by examining the spectral discriminators in the context of perceptual image super-resolution (i.e., GAN-based SR), as SR image quality is susceptible to spectral changes. Our analyses reveal that the spectral discriminator indeed performs better than the ordinary (a.k.a. spatial) discriminator in identifying the differences in the high-frequency range; however, the spatial discriminator holds an advantage in the low-frequency range. Thus, we suggest that the spectral and spatial discriminators shall be used simultaneously. Moreover, we improve the spectral discriminators by first calculating the patch-wise Fourier spectrum and then aggregating the spectra by Transformer. We verify the effectiveness of the proposed method twofold. On the one hand, thanks to the additional spectral discriminator, our obtained SR images have their spectra better aligned to those of the real images, which leads to a better PD tradeoff. On the other hand, our ensembled discriminator predicts the perceptual quality more accurately, as evidenced in the no-reference image quality assessment task.

Submitted 16 August, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

Comments: Accepted to ICCV 2023. Code and Models are publicly available at https://github.com/Luciennnnnnn/DualFormer

arXiv:2307.05382 [pdf, other]

Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling

Authors: Ziyue Li, Yuchen Fang, You Li, Kan Ren, Yansen Wang, Xufang Luo, Juanyong Duan, Congrui Huang, Dongsheng Li, Lili Qiu

Abstract: A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human efforts for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address the exclusive challenges with exquisite designs at the temporal, spatial and model levels. The experiments over the real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.

Submitted 2 July, 2023; originally announced July 2023.

Comments: Accepted in IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2023

arXiv:2306.14471 [pdf]

Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

Authors: Rui Cao, Yilin Luo, **hua Xu, Xiaofei Luo, Ku Geng, Yousuf Aborahama, Manxiu Cui, Samuel Davis, Shuai Na, Xin Tong, Cindy Liu, Karteek Sastry, Konstantin Maslov, Peng Hu, Yide Zhang, Li Lin, Yang Zhang, Lihong V. Wang

Abstract: Photoacoustic computed tomography (PACT) is emerging as a new technique for functional brain imaging, primarily due to its capabilities in label-free hemodynamic imaging. Despite its potential, the transcranial application of PACT has encountered hurdles, such as acoustic attenuations and distortions by the skull and limited light penetration through the skull. To overcome these challenges, we have engineered a PACT system that features a densely packed hemispherical ultrasonic transducer array with 3072 channels, operating at a central frequency of 1 MHz. This system allows for single-shot 3D imaging at a rate equal to the laser repetition rate, such as 20 Hz. We have achieved a single-shot light penetration depth of approximately 9 cm in chicken breast tissue utilizing a 750 nm laser (withstanding 3295-fold light attenuation and still retaining an SNR of 74) and successfully performed transcranial imaging through an ex vivo human skull using a 1064 nm laser. Moreover, we have proven the capacity of our system to perform single-shot 3D PACT imaging in both tissue phantoms and human subjects. These results suggest that our PACT system is poised to unlock potential for real-time, in vivo transcranial functional imaging in humans.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.10826 [pdf]

An Error Correction Mid-term Electricity Load Forecasting Model Based on Seasonal Decomposition

Authors: Li** Zhang, Di Wu, Xin Luo

Abstract: Mid-term electricity load forecasting (LF) plays a critical role in power system planning and operation. To address the issue of error accumulation and transfer during the operation of existing LF models, a novel model called error correction based LF (ECLF) is proposed in this paper, which is designed to provide more accurate and stable LF. Firstly, time series analysis and feature engineering act on the original data to decompose load data into three components and extract relevant features. Then, based on the idea of stacking ensemble, long short-term memory is employed as an error correction module to forecast the components separately, and the forecast results are treated as new features to be fed into extreme gradient boosting for the second-step forecasting. Finally, the component sub-series forecast results are reconstructed to obtain the final LF results. The proposed model is evaluated on real-world electricity load data from two cities in China, and the experimental results demonstrate its superior performance compared to the other benchmark models.

Submitted 19 June, 2023; originally announced June 2023.

Comments: 8 pages, 3 figures

arXiv:2306.01864 [pdf, other]

Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data Using Contrastive Learning with Varying Pre-Training Domains

Authors: **** Cai, Sudip Vhaduri, Xiao Luo

Abstract: Rapid discovery of new diseases, such as COVID-19 can enable a timely epidemic response, preventing the large-scale spread and protecting public health. However, limited research efforts have been taken on this problem. In this paper, we propose a contrastive learning-based modeling approach for COVID-19 coughing and breathing pattern discovery from non-COVID coughs. To validate our models, extensive experiments have been conducted using four large audio datasets and one image dataset. We further explore the effects of different factors, such as domain relevance and augmentation order on the pre-trained models. Our results show that the proposed model can effectively distinguish COVID-19 coughing and breathing from unlabeled data and labeled non-COVID coughs with an accuracy of up to 0.81 and 0.86, respectively. Findings from this work will guide future research to detect an outbreak of a new disease early.

Submitted 2 June, 2023; originally announced June 2023.

Comments: Accepted by Proceedings of INTERSPEECH 2023

Journal ref: Proceedings of INTERSPEECH 2023

arXiv:2305.20006 [pdf, other]

Physics-Informed Ensemble Representation for Light-Field Image Super-Resolution

Authors: Manchang **, Gaosheng Liu, Kunshu Hu, Xin Luo, Kun Li, **gyu Yang

Abstract: Recent learning-based approaches have achieved significant progress in light field (LF) image super-resolution (SR) by exploring convolution-based or transformer-based network structures. However, LF imaging has many intrinsic physical priors that have not been fully exploited. In this paper, we analyze the coordinate transformation of the LF imaging process to reveal the geometric relationship in the LF images. Based on such geometric priors, we introduce a new LF subspace of virtual-slit images (VSI) that provide sub-pixel information complementary to sub-aperture images. To leverage the abundant correlation across the four-dimensional data with manageable complexity, we propose learning ensemble representation of all $C_4^2$ LF subspaces for more effective feature extraction. To super-resolve image structures from undersampled LF data, we propose a geometry-aware decoder, named EPIXformer, which constrains the transformer's operational searching regions with a LF physical prior. Experimental results on both spatial and angular SR tasks demonstrate that the proposed method outperforms other state-of-the-art schemes, especially in handling various disparities.

Submitted 31 May, 2023; originally announced May 2023.
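
Note on the $C_4^2$ count in the abstract above: a 4D light field has four coordinate axes, and choosing two of them gives $C_4^2 = 6$ two-dimensional subspaces. The short sketch below simply enumerates those pairs. The axis names u, v (angular) and x, y (spatial) are an assumed labelling for illustration only; the abstract does not name the axes or say which pair corresponds to the virtual-slit images.

```python
from itertools import combinations

# Assumed axis labels for a 4D light field: (u, v) angular, (x, y) spatial.
# These names are hypothetical; the abstract does not specify them.
LF_AXES = ("u", "v", "x", "y")


def lf_subspaces(axes=LF_AXES):
    """Return every unordered pair of axes, i.e. each 2D slice of the 4D data."""
    return list(combinations(axes, 2))


if __name__ == "__main__":
    pairs = lf_subspaces()
    print(len(pairs))  # number of 2D subspaces
    for pair in pairs:
        print(pair)
```

Under this labelling, the pairs ("x", "y") and ("u", "v") would be the familiar spatial and angular slices, and the four mixed pairs cover the remaining subspaces.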

arXiv:2212.13913 [pdf]

Highly-Accurate Electricity Load Estimation via Knowledge Aggregation

Authors: Yuting Ding, Di Wu, Yi He, Xin Luo, Song Deng

Abstract: Mid-term and long-term electric energy demand prediction is essential for the planning and operation of the smart grid system, mainly in countries where the power system operates in a deregulated environment. Traditional forecasting models fail to incorporate external knowledge, while modern data-driven models ignore model interpretability, and the load series can be influenced by many complex factors, making it difficult to cope with the highly unstable and nonlinear power load series. To address the forecasting problem, we propose a more accurate district-level load prediction model based on domain knowledge and the idea of decomposition and ensemble. Its main idea is three-fold: 1) According to the non-stationary characteristics of load time series with obvious cyclicality and periodicity, the series is decomposed into series with actual economic meaning, and load analysis and forecasting are then carried out. 2) Kernel Principal Component Analysis (KPCA) is applied to extract the principal components of the weather and calendar rule feature sets to realize data dimensionality reduction. 3) The advantages of various models are exploited based on the domain knowledge, and a hybrid model (XASXG) is proposed based on the Autoregressive Integrated Moving Average model (ARIMA), support vector regression (SVR) and the Extreme Gradient Boosting model (XGBoost). With such designs, it accurately forecasts the electricity demand in spite of its highly unstable characteristics.
We compared our method with nine benchmark methods, including classical statistical models as well as state-of-the-art models based on machine learning, on the real time series of monthly electricity demand in four Chinese cities. The empirical study shows that the proposed hybrid model is superior to all competitors in terms of accuracy and prediction bias.

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2211.05309 [pdf]

Generic Cryo-CMOS Device Modeling and EDA-Compatible Platform for Reliable Cryogenic IC Design

Authors: Zhidong Tang, Zewei Wang, Yumeng Yuan, Chang He, Xin Luo, Ao Guo, Renhe Chen, Yongqi Hu, Longfei Yang, Chengwei Cao, Linlin Liu, Liujiang Yu, Ganbing Shang, Yongfeng Cao, Shoumian Chen, Yuhang Zhao, Shaojian Hu, Xufeng Kou

Abstract: This paper outlines the establishment of a generic cryogenic CMOS database in which key electrical parameters and transfer characteristics of the MOSFETs are quantified as functions of device size, temperature/frequency responses. Meanwhile, comprehensive device statistical study is conducted to evaluate the influence of variation and mismatch effects at low temperatures. Furthermore, by incorporating the Cryo-CMOS compact model into the process design kit (PDK), the cryogenic 4 Kb SRAM, 5-bit flash ADC and 8-bit current steering DAC are designed, and their performance is readily investigated and optimized on the EDA-compatible platform, hence laying a solid foundation for large-scale cryogenic IC design.

Submitted 9 February, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2208.09350 [pdf, other]

doi 10.1016/j.cmpb.2023.107398

PyMIC: A deep learning toolkit for annotation-efficient medical image segmentation

Authors: Guotai Wang, Xiangde Luo, Ran Gu, Shuojue Yang, Yijie Qu, Shuwei Zhai, Qianfei Zhao, Kang Li, Shaoting Zhang

Abstract: Background and Objective: Open-source deep learning toolkits are one of the driving forces for developing medical image segmentation models. Existing toolkits mainly focus on fully supervised segmentation and require full and accurate pixel-level annotations that are time-consuming and difficult to acquire for segmentation tasks, which makes learning from imperfect labels highly desired for reducing the annotation cost. We aim to develop a new deep learning toolkit to support annotation-efficient learning for medical image segmentation. Methods: Our proposed toolkit named PyMIC is a modular deep learning library for medical image segmentation tasks. In addition to basic components that support development of high-performance models for fully supervised segmentation, it contains several advanced components tailored for learning from imperfect annotations, such as loading annotated and unannotated images, loss functions for unannotated, partially or inaccurately annotated images, and training procedures for co-learning between multiple networks, etc. PyMIC supports development of semi-supervised, weakly supervised and noise-robust learning methods for medical image segmentation. Results: We present several illustrative medical image segmentation tasks based on PyMIC: (1) Achieving competitive performance on fully supervised learning; (2) Semi-supervised cardiac structure segmentation with only 10% training images annotated; (3) Weakly supervised segmentation using scribble annotations; and (4) Learning from noisy labels for chest radiograph segmentation.
Conclusions: The PyMIC toolkit is easy to use and facilitates efficient development of medical image segmentation models with imperfect annotations. It is modular and flexible, which enables researchers to develop high-performance models with low annotation cost. The source code is available at: https://github.com/HiLab-git/PyMIC.

Submitted 4 February, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

Comments: 12 pages, 6 figures

Journal ref: Computer Methods and Programs in Biomedicine, Volume 231, April 2023, 107398

arXiv:2208.08868 [pdf]

Physics-Informed Neural Operator for Fast and Scalable Optical Fiber Channel Modelling in Multi-Span Transmission

Authors: Yuchen Song, Danshi Wang, Qirui Fan, Xiaotian Jiang, Xiao Luo, Min Zhang

Abstract: We propose efficient modelling of optical fiber channel via NLSE-constrained physics-informed neural operator without reference solutions. This method can be easily scalable for distance, sequence length, launch power, and signal formats, and is implemented for ultra-fast simulations of 16-QAM signal transmission with ASE noise.

Submitted 11 July, 2022; originally announced August 2022.

Comments: accepted by ECOC2022

arXiv:2208.03524 [pdf]

doi 10.2139/ssrn.4253498

Deep Learning-enabled Spatial Phase Unwrapping for 3D Measurement

Authors: Xiaolong Luo, Wanzhong Song, Songlin Bai, Yu Li, Zhihe Zhao

Abstract: In terms of 3D imaging speed and system cost, the single-camera system projecting single-frequency patterns is the ideal option among all proposed Fringe Projection Profilometry (FPP) systems. This system necessitates a robust spatial phase unwrapping (SPU) algorithm. However, robust SPU remains a challenge in complex scenes. Quality-guided SPU algorithms need more efficient ways to identify the unreliable points in phase maps before unwrapping. End-to-end deep learning SPU methods face generality and interpretability problems. This paper proposes a hybrid method combining deep learning and traditional path-following for robust SPU in FPP. This hybrid SPU scheme demonstrates better robustness than traditional quality-guided SPU methods, better interpretability than end-to-end deep learning scheme, and generality on unseen data. Experiments on the real dataset of multiple illumination conditions and multiple FPP systems differing in image resolution, the number of fringes, fringe direction, and optics wavelength verify the effectiveness of the proposed method.

Submitted 6 August, 2022; originally announced August 2022.

Comments: 26 pages

ACM Class: I.4.5

Journal ref: Optics & Laser Technology, 163 (2023) 109340

arXiv:2206.04684 [pdf, other]

Structure-consistent Restoration Network for Cataract Fundus Image Enhancement

Authors: Heng Li, Haofeng Liu, Huazhu Fu, Hai Shu, Yitian Zhao, Xiaoling Luo, Yan Hu, Jiang Liu

Abstract: Fundus photography is a routine examination in clinics to diagnose and monitor ocular diseases. However, for cataract patients, the fundus image always suffers quality degradation caused by the clouding lens. The degradation prevents reliable diagnosis by ophthalmologists or computer-aided systems. To improve the certainty in clinical diagnosis, restoration algorithms have been proposed to enhance the quality of fundus images. Unfortunately, challenges remain in the deployment of these algorithms, such as collecting sufficient training data and preserving retinal structures. In this paper, to circumvent the strict deployment requirement, a structure-consistent restoration network (SCR-Net) for cataract fundus images is developed from synthesized data that shares an identical structure. A cataract simulation model is firstly designed to collect synthesized cataract sets (SCS) formed by cataract fundus images sharing identical structures. Then high-frequency components (HFCs) are extracted from the SCS to constrain structure consistency such that the structure preservation in SCR-Net is enforced. The experiments demonstrate the effectiveness of SCR-Net in the comparison with state-of-the-art methods and the follow-up clinical applications. The code is available at https://github.com/liamheng/ArcNet-Medical-Image-Enhancement.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2205.04044

Masked Co-attentional Transformer reconstructs 100x ultra-fast/low-dose whole-body PET from longitudinal images and anatomically guided MRI

Authors: Yan-Ran, Wang, Liangqiong Qu, Natasha Diba Sheybani, Xiaolong Luo, Jiangshan Wang, Kristina Elizabeth Hawk, Ashok Joseph Theruvath, Sergios Gatidis, Xuerong Xiao, Allison Pribnow, Daniel Rubin, Heike E. Daldrup-Link

Abstract: Despite its tremendous value for the diagnosis, treatment monitoring and surveillance of children with cancer, whole body staging with positron emission tomography (PET) is time consuming and associated with considerable radiation exposure. 100x (1% of the standard clinical dosage) ultra-low-dose/ultra-fast whole-body PET reconstruction has the potential for cancer imaging with unprecedented speed and improved safety, but it cannot be achieved by the naive use of machine learning techniques. In this study, we utilize the global similarity between baseline and follow-up PET and magnetic resonance (MR) images to develop Masked-LMCTrans, a longitudinal multi-modality co-attentional CNN-Transformer that provides interaction and joint reasoning between serial PET/MRs of the same patient. We mask the tumor area in the referenced baseline PET and reconstruct the follow-up PET scans. In this manner, Masked-LMCTrans reconstructs 100x almost-zero radio-exposure whole-body PET that was not possible before. The technique also opens a new pathway for longitudinal radiology imaging reconstruction, a significantly under-explored area to date. Our model was trained and tested with Stanford PET/MRI scans of pediatric lymphoma patients and evaluated externally on PET/MRI images from Tübingen University.
The high image quality of the reconstructed 100x whole-body PET images resulting from the application of Masked-LMCTrans will substantially advance the development of safer imaging approaches and shorter exam-durations for pediatric patients, as well as expand the possibilities for frequent longitudinal monitoring of these patients by PET.

Submitted 9 May, 2022; originally announced May 2022.

Comments: This submission has been removed by arXiv administrators because the submitter did not have the right to assign the license at the time of submission

arXiv:2203.10395 [pdf, other]

Towards Robust Semantic Segmentation of Accident Scenes via Multi-Source Mixed Sampling and Meta-Learning

Authors: Xinyu Luo, Jiaming Zhang, Kailun Yang, Alina Roitberg, Kunyu Peng, Rainer Stiefelhagen

Abstract: Autonomous vehicles utilize urban scene segmentation to understand the real world like a human and react accordingly. Semantic segmentation of normal scenes has experienced a remarkable rise in accuracy on conventional benchmarks. However, a significant portion of real-life accidents features abnormal scenes, such as those with object deformations, overturns, and unexpected traffic behaviors. Since even small mis-segmentation of driving scenes can lead to serious threats to human lives, the robustness of such models in accident scenarios is an extremely important factor in ensuring safety of intelligent transportation systems. In this paper, we propose a Multi-source Meta-learning Unsupervised Domain Adaptation (MMUDA) framework, to improve the generalization of segmentation transformers to extreme accident scenes. In MMUDA, we make use of Multi-Domain Mixed Sampling to augment the images of multiple-source domains (normal scenes) with the target data appearances (abnormal scenes). To train our model, we intertwine and study a meta-learning strategy in the multi-source setting for robustifying the segmentation results. We further enhance the segmentation backbone (SegFormer) with a HybridASPP decoder design, featuring large window attention spatial pyramid pooling and strip pooling, to efficiently aggregate long-range contextual dependencies. Our approach achieves a mIoU score of 46.97% on the DADA-seg benchmark, surpassing the previous state-of-the-art model by more than 7.50%.
Code will be made publicly available at https://github.com/xinyu-laura/MMUDA.

Submitted 19 March, 2022; originally announced March 2022.

Comments: Code will be made publicly available at https://github.com/xinyu-laura/MMUDA
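
Note on the metric quoted in the abstract above: mIoU is the mean, over classes, of the intersection-over-union between predicted and reference labels. A minimal sketch on flat label lists follows. Skipping classes that appear in neither the prediction nor the reference is one common convention and is an assumption here; the DADA-seg evaluation protocol may differ, and the function name is illustrative only.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for two flat label sequences.

    Classes absent from both pred and target are skipped (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Class 0: IoU 1/2, class 1: IoU 2/3.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Because each class contributes equally to the average, a rare class that is segmented poorly lowers mIoU as much as a frequent one.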

arXiv:2203.04299 [pdf, other]

Plug-and-play Shape Refinement Framework for Multi-site and Lifespan Brain Skull Stripping

Authors: Yunxiang Li, Ruilong Dan, Shuai Wang, Yifan Cao, Xiangde Luo, Chenghao Tan, Gangyong Jia, Huiyu Zhou, You Zhang, Yaqi Wang, Li Wang

Abstract: Skull stripping is a crucial prerequisite step in the analysis of brain magnetic resonance images (MRI). Although many excellent works or tools have been proposed, they suffer from low generalization capability. For instance, the model trained on a dataset with specific imaging parameters cannot be well applied to other datasets with different imaging parameters. Especially, for the lifespan datasets, the model trained on an adult dataset is not applicable to an infant dataset due to the large domain difference. To address this issue, numerous methods have been proposed, where domain adaptation based on feature alignment is the most common. Unfortunately, this method has some inherent shortcomings, which need to be retrained for each new domain and requires concurrent access to the input images of both domains. In this paper, we design a plug-and-play shape refinement (PSR) framework for multi-site and lifespan skull stripping. To deal with the domain shift between multi-site lifespan datasets, we take advantage of the brain shape prior, which is invariant to imaging parameters and ages. Experiments demonstrate that our framework can outperform the state-of-the-art methods on multi-site lifespan datasets.

Submitted 22 December, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 11 page

arXiv:2203.02106 [pdf, other]

Scribble-Supervised Medical Image Segmentation via Dual-Branch Network and Dynamically Mixed Pseudo Labels Supervision

Authors: Xiangde Luo, Minhao Hu, Wenjun Liao, Shuwei Zhai, Tao Song, Guotai Wang, Shaoting Zhang

Abstract: Medical image segmentation plays an irreplaceable role in computer-assisted diagnosis, treatment planning, and following-up. Collecting and annotating a large-scale dataset is crucial to training a powerful segmentation model, but producing high-quality segmentation masks is an expensive and time-consuming procedure. Recently, weakly-supervised learning that uses sparse annotations (points, scribbles, bounding boxes) for network training has achieved encouraging performance and shown the potential for annotation cost reduction. However, due to the limited supervision signal of sparse annotations, it is still challenging to employ them for networks training directly. In this work, we propose a simple yet efficient scribble-supervised image segmentation method and apply it to cardiac MRI segmentation. Specifically, we employ a dual-branch network with one encoder and two slightly different decoders for image segmentation and dynamically mix the two decoders' predictions to generate pseudo labels for auxiliary supervision. By combining the scribble supervision and auxiliary pseudo labels supervision, the dual-branch network can efficiently learn from scribble annotations end-to-end. Experiments on the public ACDC dataset show that our method performs better than current scribble-supervised segmentation methods and also outperforms several semi-supervised segmentation methods.

Submitted 3 March, 2022; originally announced March 2022.

Comments: 11 pages, 4 figures, code is available: https://github.com/HiLab-git/WSL4MIS. This is a comprehensive study about scribble-supervised medical image segmentation based on the ACDC dataset
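
Note on the pseudo-label step described in the abstract above: the abstract says the two decoders' predictions are "dynamically" mixed but does not give the rule. The sketch below assumes one plausible reading, a random convex combination of the two per-pixel class-probability vectors followed by an argmax, with the weight redrawn on every call. The function name and the list-of-lists data layout are illustrative assumptions, not taken from the paper or its code.

```python
import random


def mix_pseudo_labels(probs_a, probs_b, rng=None):
    """Mix two decoders' per-pixel class probabilities into hard pseudo labels.

    probs_a, probs_b: lists of per-pixel probability lists (same shape).
    Assumed rule: label = argmax(alpha * a + (1 - alpha) * b), alpha random.
    """
    rng = rng or random.Random()
    alpha = rng.random()  # mixing weight in [0, 1), redrawn on each call
    labels = []
    for pa, pb in zip(probs_a, probs_b):
        mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(pa, pb)]
        # Index of the largest mixed probability is the pseudo label.
        labels.append(max(range(len(mixed)), key=mixed.__getitem__))
    return labels


if __name__ == "__main__":
    decoder_a = [[0.9, 0.1], [0.2, 0.8]]
    decoder_b = [[0.7, 0.3], [0.4, 0.6]]
    print(mix_pseudo_labels(decoder_a, decoder_b, random.Random(0)))
```

Where the two decoders agree, the pseudo label is the same for any weight; the random weight only matters on pixels where they disagree, which is where the auxiliary supervision varies between iterations.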

arXiv:2201.04726 [pdf, other]

Multi-View Non-negative Matrix Factorization Discriminant Learning via Cross Entropy Loss

Authors: Jian-wei Liu, Yuan-fang Wang, Run-kun Lu, Xionglin Luo

Abstract: Multi-view learning accomplishes the task objectives of classification by leveraging the relationships between different views of the same object. Most existing methods usually focus on consistency and complementarity between multiple views. But not all of this information is useful for classification tasks. Instead, it is the specific discriminating information that plays an important role. Zhong Zhang et al. explore the discriminative and non-discriminative information existing in common and view-specific parts among different views via joint non-negative matrix factorization. In this paper, we improve this algorithm on this basis by using the cross entropy loss function to constrain the objective function better. At last, we implement better classification effect than original on the same data sets and show its superiority over many state-of-the-art algorithms.

Submitted 8 January, 2022; originally announced January 2022.

arXiv:2201.03186 [pdf, other]

MyoPS: A Benchmark of Myocardial Pathology Segmentation Combining Three-Sequence Cardiac Magnetic Resonance Images

Authors: Lei Li, Fu** Wu, Sihan Wang, Xinzhe Luo, Carlos Martin-Isla, Shuwei Zhai, Jianpeng Zhang, Yanfei Liu, Zhen Zhang, Markus J. Ankenbrand, Haochuan Jiang, Xiaoran Zhang, Linhong Wang, Tewodros Weldebirhan Arega, Elif Altunok, Zhou Zhao, Feiyan Li, Jun Ma, ** Yang, Elodie Puybareau, Ilkay Oksuz, Stephanie Bricq, Weisheng Li, Kumaradevan Punithakumar, Sotirios A. Tsaftaris, et al. (7 additional authors not shown)

Abstract: Assessment of myocardial viability is essential in diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge, in conjunction with MICCAI 2020. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles and explore potential of solutions, as well as to provide a benchmark for future research. We conclude that while promising results have been reported, the research is still in the early stage, and more in-depth exploration is needed before a successful application to the clinics. Note that MyoPS data and evaluation tool continue to be publicly available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).

Submitted 10 January, 2022; originally announced January 2022.

arXiv:2112.04894 [pdf, other]

Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer

Authors: Xiangde Luo, Minhao Hu, Tao Song, Guotai Wang, Shaoting Zhang

Abstract: Recently, deep learning with Convolutional Neural Networks (CNNs) and Transformers has shown encouraging results in fully supervised medical image segmentation. However, it is still challenging for them to achieve good performance with limited annotations for training. In this work, we present a very simple yet efficient framework for semi-supervised medical image segmentation by introducing the cross teaching between CNN and Transformer. Specifically, we simplify the classical deep co-training from consistency regularization to cross teaching, where the prediction of a network is used as the pseudo label to supervise the other network directly end-to-end. Considering the difference in learning paradigm between CNN and Transformer, we introduce the Cross Teaching between CNN and Transformer rather than just using CNNs. Experiments on a public benchmark show that our method outperforms eight existing semi-supervised learning methods just with a simpler framework. Notably, this work may be the first attempt to combine CNN and transformer for semi-supervised medical image segmentation and achieve promising results on a public benchmark. The code will be released at: https://github.com/HiLab-git/SSL4MIS.

Submitted 1 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: Accepted to MIDL 2022, code in SSL4MIS: https://github.com/HiLab-git/SSL4MIS
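
The cross-teaching idea in the abstract above, where each network's prediction serves as the pseudo label for the other, can be sketched as follows. This is an illustration only: the cross-entropy loss, the helper names, and the probability-list inputs are assumptions, and the abstract does not specify which loss the authors use.

```python
import math


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the given hard labels."""
    total = sum(-math.log(p[y] + eps) for p, y in zip(probs, labels))
    return total / len(labels)


def cross_teaching_losses(probs_cnn, probs_transformer):
    """Each network is supervised by the other's hard prediction.

    In a real training loop the pseudo labels would be treated as constants
    (no gradient flows through them); here they are plain integers.
    """
    pseudo_from_cnn = [argmax(p) for p in probs_cnn]
    pseudo_from_transformer = [argmax(p) for p in probs_transformer]
    loss_cnn = cross_entropy(probs_cnn, pseudo_from_transformer)
    loss_transformer = cross_entropy(probs_transformer, pseudo_from_cnn)
    return loss_cnn, loss_transformer


# One unlabeled pixel, two classes; both networks favour class 0.
loss_cnn, loss_transformer = cross_teaching_losses([[0.8, 0.2]], [[0.6, 0.4]])
```

The two losses are symmetric in form but not in value: the less confident network receives the larger loss when the networks agree on the class.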

arXiv:2111.02403 [pdf, other]

doi 10.1016/j.media.2022.102642

WORD: A large scale dataset, benchmark and clinical applicable study for abdominal organ segmentation from CT image

Authors: Xiangde Luo, Wenjun Liao, Jianghong Xiao, Jieneng Chen, Tao Song, Xiaofan Zhang, Kang Li, Dimitris N. Metaxas, Guotai Wang, Shaoting Zhang

Abstract: Whole abdominal organ segmentation is important in diagnosing abdomen lesions, radiotherapy, and follow-up. However, oncologists' delineating all abdominal organs from 3D volumes is time-consuming and very expensive. Deep learning-based medical image segmentation has shown the potential to reduce manual delineation efforts, but it still requires a large-scale fine annotated dataset for training, and there is a lack of large-scale datasets covering the whole abdomen region with accurate and detailed annotations for the whole abdominal organ segmentation. In this work, we establish a new large-scale Whole abdominal ORgan Dataset (WORD) for algorithm research and clinical application development. This dataset contains 150 abdominal CT volumes (30495 slices). Each volume has 16 organs with fine pixel-level annotations and scribble-based sparse annotations, which may be the largest dataset with whole abdominal organ annotation. Several state-of-the-art segmentation methods are evaluated on this dataset. And we also invited three experienced oncologists to revise the model predictions to measure the gap between the deep learning method and oncologists. Afterwards, we investigate the inference-efficient learning on the WORD, as the high-resolution image requires large GPU memory and a long inference time in the test stage. We further evaluate the scribble-based annotation-efficient learning on this dataset, as the pixel-wise manual annotation is time-consuming and expensive. The work provided a new benchmark for the abdominal multi-organ segmentation task, and these experiments can serve as the baseline for future research and clinical application development.

Submitted 12 February, 2023; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: Accepted to Medical Image Analysis, dataset at: https://github.com/HiLab-git/WORD (we corrected the results or description in this version.)

arXiv:2110.08327 [pdf, other]

Solving Image PDEs with a Shallow Network

Authors: Pascal Tom Getreuer, Peyman Milanfar, Xiyang Luo

Abstract: Partial differential equations (PDEs) are typically used as models of physical processes but are also of great interest in PDE-based image processing. However, when it comes to their use in imaging, conventional numerical methods for solving PDEs tend to require very fine grid resolution for stability, and as a result have impractically high computational cost. This work applies BLADE (Best Linear Adaptive Enhancement), a shallow learnable filtering framework, to PDE solving, and shows that the resulting approach is efficient and accurate, operating more reliably at coarse grid resolutions than classical methods. As such, the model can be flexibly used for a wide variety of problems in imaging.

Submitted 15 October, 2021; originally announced October 2021.

Comments: 21 pages, 22 figures, references arXiv:1802.06130, arXiv:1711.10700, arXiv:1606.01299

arXiv:2109.08909 [pdf, other]

doi 10.1103/PhysRevE.105.054202

Measuring the rogue wave pattern triggered from Gaussian perturbations by deep learning

Authors: Liwen Zou, XinHang Luo, Delu Zeng, Liming Ling, Li-Chen Zhao

Abstract: Weak Gaussian perturbations on a plane wave background could trigger lots of rogue waves, due to modulational instability. Numerical simulations showed that these rogue waves seemed to have similar unit structure. However, to the best of our knowledge, there is no relative result to prove that these rogue waves have the similar patterns for different perturbations, partly due to that it is hard to measure the rogue wave pattern automatically. In this work, we address these problems from the perspective of computer vision via using deep neural networks. We propose a Rogue Wave Detection Network (RWD-Net) model to automatically and accurately detect RWs on the images, which directly indicates they have the similar computer vision patterns. For this purpose, we herein meanwhile have designed the related dataset, termed as Rogue Wave Dataset-$10$K (RWD-$10$K), which has $10,191$ RW images with bounding box annotations for each RW unit. In our detection experiments, we get $99.29\%$ average precision on the test splits of the RWD-$10$K dataset. Finally, we derive our novel metric, the density of RW units (DRW), to characterize the evolution of Gaussian perturbations and obtain the statistical results on them.

Submitted 9 October, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: 8 pages, 6 figures

arXiv:2108.07007 [pdf, other]

Flying Guide Dog: Walkable Path Discovery for the Visually Impaired Utilizing Drones and Transformer-based Semantic Segmentation

Authors: Haobin Tan, Chang Chen, Xinyu Luo, Jiaming Zhang, Constantin Seibold, Kailun Yang, Rainer Stiefelhagen

Abstract: Lacking the ability to sense ambient environments effectively, blind and visually impaired people (BVIP) face difficulty in walking outdoors, especially in urban areas. Therefore, tools for assisting BVIP are of great importance. In this paper, we propose a novel "flying guide dog" prototype for BVIP assistance using drone and street view semantic segmentation. Based on the walkable areas extracted from the segmentation prediction, the drone can adjust its movement automatically and thus lead the user to walk along the walkable path. By recognizing the color of pedestrian traffic lights, our prototype can help the user to cross a street safely. Furthermore, we introduce a new dataset named Pedestrian and Vehicle Traffic Lights (PVTL), which is dedicated to traffic light recognition. The result of our user study in real-world scenarios shows that our prototype is effective and easy to use, providing new insight into BVIP assistance.

Submitted 16 August, 2021; originally announced August 2021.

Comments: Code, dataset, and video demo will be made publicly available at https://github.com/EckoTan0804/flying-guide-dog

arXiv:2108.00303 [pdf]

doi 10.1109/TSG.2022.3148978

Practical Adoption of Cloud Computing in Power Systems- Drivers, Challenges, Guidance, and Real-world Use Cases

Authors: Song Zhang, Amritanshu Pandey, Xiaochuan Luo, Maggy Powell, Ranjan Banerji, Lei Fan, Abhineet Parchure, Edgardo Luzcando

Abstract: Motivated by The Federal Energy Regulatory Commission's (FERC) recent direction and ever-growing interest in cloud adoption by power utilities, a Task Force was established to assist power system practitioners with secure, reliable and cost-effective adoption of cloud technology to meet various business needs. This paper summarizes the business drivers, challenges, guidance, and best practices for cloud adoption in power systems from the Task Force's perspective, after extensive review and deliberation by its members, including grid operators, utility companies, software vendors, and cloud providers. The paper begins by enumerating various business drivers for cloud adoption in the power industry. It follows with the discussion of the challenges and risks of migrating power grid utility workloads to the cloud. Next, for each corresponding challenge or risk, the paper provides appropriate guidance. Notably, the guidance is directed toward power industry professionals who are considering cloud solutions and are yet hesitant about the practical execution. Finally, to tie all the sections together, the paper documents various real-world use cases of cloud technology in the power system domain, which both the power industry practitioners and software vendors can look toward to design and select their own future cloud solutions. We hope that the information in this paper will serve as helpful guidance for the development of NERC guidelines and standards relevant to cloud adoption in the industry.

Submitted 2 February, 2022; v1 submitted 31 July, 2021; originally announced August 2021.

arXiv:2107.07873 [pdf]

Metasurface-Enabled On-Chip Multiplexed Diffractive Neural Networks in the Visible

Authors: Xuhao Luo, Yueqiang Hu, Xin Li, Xiangnian Ou, Jiajie Lai, Na Liu, Huigao Duan

Abstract: Replacing electrons with photons is a compelling route towards light-speed, highly parallel, and low-power artificial intelligence computing. Recently, all-optical diffractive neural deep neural networks have been demonstrated. However, the existing architectures often comprise bulky components and, most critically, they cannot mimic the human brain for multitasking. Here, we demonstrate a multi-skilled diffractive neural network based on a metasurface device, which can perform on-chip multi-channel sensing and multitasking at the speed of light in the visible. The metasurface is integrated with a complementary metal oxide semiconductor imaging sensor. Polarization multiplexing scheme of the subwavelength nanostructures are applied to construct a multi-channel classifier framework for simultaneous recognition of digital and fashionable items. The areal density of the artificial neurons can reach up to $6.25\times10^{6}$/mm$^{2}$ multiplied by the number of channels. Our platform provides an integrated solution with all-optical on-chip sensing and computing for applications in machine vision, autonomous driving, and precision medicine.

Submitted 13 July, 2021; originally announced July 2021.

arXiv:2106.12743 [pdf, other]

A Simultaneous Denoising and Dereverberation Framework with Target Decoupling

Authors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Guochen Yu, Chengshi Zheng, Xiaodong Li

Abstract: Background noise and room reverberation are regarded as two major factors to degrade the subjective speech quality. In this paper, we propose an integrated framework to address simultaneous denoising and dereverberation under complicated scenario environments. It adopts a chain optimization strategy and designs four sub-stages accordingly. In the first two stages, we decouple the multi-task learning w.r.t. complex spectrum into magnitude and phase, and only implement noise and reverberation removal in the magnitude domain. Based on the estimated priors above, we further polish the spectrum in the third stage, where both magnitude and phase information are explicitly repaired with the residual learning. Due to the data mismatch and nonlinear effect of DNNs, the residual noise often exists in the DNN-processed spectrum. To resolve the problem, we adopt a light-weight algorithm as the post-processing module to capture and suppress the residual noise in the non-active regions. In the Interspeech 2021 Deep Noise Suppression (DNS) Challenge, our submitted system ranked top-1 for the real-time track in terms of Mean Opinion Score (MOS) with ITU-T P.835 framework.

Submitted 23 June, 2021; originally announced June 2021.

Comments: Accepted at Interspeech 2021
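
The magnitude/phase decoupling mentioned in the abstract above can be illustrated on a toy spectrum: each complex bin is split into magnitude and phase, only the magnitude is modified, and the noisy phase is reused. This is a sketch under stated assumptions; the per-bin `gains` stand in for whatever a network would estimate, and the function name is hypothetical.

```python
import cmath


def enhance_magnitude_only(noisy_spectrum, gains):
    """Scale each bin's magnitude by a gain while keeping its original phase."""
    enhanced = []
    for bin_value, gain in zip(noisy_spectrum, gains):
        magnitude, phase = cmath.polar(bin_value)  # split into |X| and angle
        enhanced.append(cmath.rect(gain * magnitude, phase))  # recombine
    return enhanced


# Two frequency bins; the first is attenuated by half, the second is kept.
spectrum = [3 + 4j, -1 + 0j]
out = enhance_magnitude_only(spectrum, [0.5, 1.0])
```

A later stage that also repairs phase, as the abstract describes for the third stage, would operate on the recombined complex values rather than on the magnitude alone.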

arXiv:2106.08554 [pdf, ps, other]

doi 10.1145/3468264.3468568

iBatch: Saving Ethereum Fees via Secure and Cost-Effective Batching of Smart-Contract Invocations

Authors: Yibo Wang, Kai Li, Yuzhe Tang, Jiaqi Chen, Qi Zhang, Xiapu Luo, Ting Chen

Abstract: This paper presents iBatch, a middleware system running on top of an operational Ethereum network to enable secure batching of smart-contract invocations against an untrusted relay server off-chain. iBatch does so at a low overhead by validating the server's batched invocations in smart contracts without additional states. The iBatch mechanism supports a variety of policies, ranging from conservative to aggressive batching, and can be configured adaptively to the current workloads. iBatch automatically rewrites smart contracts to integrate with legacy applications and support large-scale deployment. For cost evaluation, we develop a platform with fast and cost-accurate transaction replaying, build real transaction benchmarks on popular Ethereum applications, and build a functional prototype of iBatch on Ethereum. The evaluation results show that iBatch saves 14.6%-59.1% Gas cost per invocation with a moderate 2-minute delay and 19.06%-31.52% Ether cost per invocation with a delay of 0.26-1.66 blocks.

Submitted 24 August, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Extended version from the ESEC/FSE 2021 paper

arXiv:2105.09511 [pdf, other]

Medical Image Segmentation Using Squeeze-and-Expansion Transformers

Authors: Shaohua Li, Xiuchao Sui, Xiangde Luo, Xinxing Xu, Yong Liu, Rick Goh

Abstract: Medical image segmentation is important for computer-aided diagnosis. Good segmentation demands the model to see the big picture and fine details simultaneously, i.e., to learn image features that incorporate large context while keeping high spatial resolutions. To approach this goal, the most widely used methods, U-Net and its variants, extract and fuse multi-scale features. However, the fused features still have small "effective receptive fields" with a focus on local image cues, limiting their performance. In this work, we propose Segtran, an alternative segmentation framework based on transformers, which have unlimited "effective receptive fields" even at high feature resolutions. The core of Segtran is a novel Squeeze-and-Expansion transformer: a squeezed attention block regularizes the self attention of transformers, and an expansion block learns diversified representations. Additionally, we propose a new positional encoding scheme for transformers, imposing a continuity inductive bias for images. Experiments were performed on 2D and 3D medical image segmentation tasks: optic disc/cup segmentation in fundus images (REFUGE'20 challenge), polyp segmentation in colonoscopy images, and brain tumor segmentation in MRI scans (BraTS'19 challenge). Compared with representative existing methods, Segtran consistently achieved the highest segmentation accuracy, and exhibited good cross-domain generalization capabilities. The source code of Segtran is released at https://github.com/askerlee/segtran.

Submitted 1 June, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: Camera ready for IJCAI'2021

arXiv:2104.13450 [pdf, other]

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

Authors: Innfarn Yoo, Huiwen Chang, Xiyang Luo, Ondrej Stava, Ce Liu, Peyman Milanfar, Feng Yang

Abstract: Digital watermarking is widely used for copyright protection. Traditional 3D watermarking approaches or commercial software are typically designed to embed messages into 3D meshes, and later retrieve the messages directly from distorted/undistorted watermarked 3D meshes. However, in many cases, users only have access to rendered 2D images instead of 3D meshes. Unfortunately, retrieving messages from 2D renderings of 3D meshes is still challenging and underexplored. We introduce a novel end-to-end learning framework to solve this problem through: 1) an encoder to covertly embed messages in both mesh geometry and textures; 2) a differentiable renderer to render watermarked 3D objects from different camera angles and under varied lighting conditions; 3) a decoder to recover the messages from 2D rendered images. From our experiments, we show that our model can learn to embed information visually imperceptible to humans, and to retrieve the embedded information from 2D renderings that undergo 3D distortions. In addition, we demonstrate that our method can also work with other renderers, such as ray tracers and real-time renderers with and without fine-tuning.

Submitted 29 March, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: Accepted by CVPR 2022

arXiv:2103.05142 [pdf, other]

Formal Verification of Stochastic Systems with ReLU Neural Network Controllers

Authors: Shiqi Sun, Yan Zhang, Xusheng Luo, Panagiotis Vlantis, Miroslav Pajic, Michael M. Zavlanos

Abstract: In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from where, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussian noise, which we abstract by a suitable graph. Then, we formulate a Satisfiability Modulo Convex (SMC) problem to estimate upper bounds on the transition probabilities between nodes in the graph. Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes. Additionally, using the proposed SMC formula, we devise a heuristic method to refine the abstraction of the system in order to further improve the estimated safety bounds. Finally, we corroborate the efficacy of the proposed method with simulation results considering a robot navigation example and comparison against a state-of-the-art verification scheme.

Submitted 8 March, 2021; originally announced March 2021.

arXiv:2102.04198 [pdf, other]

ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network

Authors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li

Abstract: It remains a tough challenge to recover the speech signals contaminated by various noises under real acoustic environments. To this end, we propose a novel system for denoising in the complicated applications, which is mainly comprised of two pipelines, namely a two-stage network and a post-processing module. The first pipeline is proposed to decouple the optimization problem w.r.t. magnitude and phase, i.e., only the magnitude is estimated in the first stage and both of them are further refined in the second stage. The second pipeline aims to further suppress the remaining unnatural distorted noise, which is demonstrated to sufficiently improve the subjective quality. In the ICASSP 2021 Deep Noise Suppression (DNS) Challenge, our submitted system ranked top-1 for the real-time track 1 in terms of Mean Opinion Score (MOS) with ITU-T P.808 framework.

Submitted 1 March, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: 5 pages, 3 figures, accepted by ICASSP 2021

arXiv:2011.08769 [pdf, other]

Anatomy Prior Based U-net for Pathology Segmentation with Attention

Authors: Yuncheng Zhou, Ke Zhang, Xinzhe Luo, Sihan Wang, Xiahai Zhuang

Abstract: Pathological area segmentation in cardiac magnetic resonance (MR) images plays a vital role in the clinical diagnosis of cardiovascular diseases. Because of the irregular shape and small area, pathological segmentation has always been a challenging task. We propose an anatomy prior based framework, which combines the U-net segmentation network with the attention technique. Leveraging the fact that the pathology is inclusive, we propose a neighborhood penalty strategy to gauge the inclusion relationship between the myocardium and the myocardial infarction and no-reflow areas. This neighborhood penalty strategy can be applied to any two labels with inclusive relationships (such as the whole infarction and myocardium, etc.) to form a neighboring loss. The proposed framework is evaluated on the EMIDEC dataset. Results show that our framework is effective in pathological area segmentation.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 8 pages, 3 figures, to be published in STACOM 2020 (MICCAI Workshop)

ACM Class: I.4.6

arXiv:2011.04988 [pdf, other]

AIM 2020 Challenge on Rendering Realistic Bokeh

Authors: Andrey Ignatov, Radu Timofte, Ming Qian, Congyu Qiao, Jiamin Lin, Zhenyu Guo, Chenghua Li, Cong Leng, Jian Cheng, Juewen Peng, Xianrui Luo, Ke Xian, Zi** Wu, Zhiguo Cao, Densen Puthussery, Jiji C V, Hrishikesh P S, Melvin Kuriakose, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Kuldeep Purohit, Praveen Kandula, Maitreya Suin, A. N. Rajagopalan, et al. (10 additional authors not shown)

Abstract: This paper reviews the second AIM realistic bokeh effect rendering challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world bokeh simulation problem, where the goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR camera. The participants had to render bokeh effect based on only one single frame without any additional data from other cameras or sensors. The target metric used in this challenge combined the runtime and the perceptual quality of the solutions measured in the user study. To ensure the efficiency of the submitted models, we measured their runtime on standard desktop CPUs as well as were running the models on smartphone GPUs. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for practical bokeh effect rendering problem.

Submitted 10 November, 2020; originally announced November 2020.

Comments: Published in ECCV 2020 Workshop (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

arXiv:2011.00526 [pdf, other]

Learning Euler's Elastica Model for Medical Image Segmentation

Authors: Xu Chen, Xiangde Luo, Yitian Zhao, Shaoting Zhang, Guotai Wang, Yalin Zheng

Abstract: Image segmentation is a fundamental topic in image processing and has been studied for many decades. Deep learning-based supervised segmentation models have achieved state-of-the-art performance, but most of them are limited by using pixel-wise loss functions for training without geometrical constraints. Inspired by Euler's Elastica model and recent active contour models introduced into the field of deep learning, we propose a novel active contour with elastica (ACE) loss function incorporating Elastica (curvature and length) and region information as geometrically-natural constraints for image segmentation tasks. We introduce the mean curvature, i.e., the average of all principal curvatures, as a more effective image prior for representing curvature in our ACE loss function. Furthermore, based on the definition of the mean curvature, we propose a fast solution to approximate the ACE loss in three dimensions (3D) by using Laplace operators for 3D image segmentation. We evaluate our ACE loss function on four 2D and 3D natural and biomedical image datasets. Our results show that the proposed loss function outperforms other mainstream loss functions on different segmentation networks. Our source code is available at https://github.com/HiLab-git/ACELoss.

Submitted 1 November, 2020; originally announced November 2020.

Comments: 9 pages, 4 figures

arXiv:2009.06943 [pdf, other]

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

Authors: Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, **gwen He, Yu Qiao, Chao Dong, Xiaotong Luo, Liang Chen, Jiangtao Zhang, Maitreya Suin, et al. (60 additional authors not shown)

Abstract: This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted the final results. They gauge the state-of-the-art in efficient single image super-resolution.

Submitted 15 September, 2020; originally announced September 2020.

Showing 1–50 of 71 results for author: Luo, X