Search | arXiv e-print repository

Towards Realistic Data Generation for Real-World Super-Resolution

Authors: Long Peng, Wenbo Li, Renjing Pei, Jingjing Ren, Xueyang Fu, Yang Wang, Yang Cao, Zheng-Jun Zha

Abstract: Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physical-based degradations or utilized learning-based techniques, yet these approaches remain inadequate for producing large-scale, realistic, and diverse data simultaneously. In this paper, we introduce a novel Realistic Decoupled Data Generator (RealDGen), an unsupervised learning data generation framework designed for real-world super-resolution. We meticulously develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model to create realistic low-resolution images from unpaired real LR and HR images. Extensive experiments demonstrate that RealDGen excels in generating large-scale, high-quality paired data that mirrors real-world degradations, significantly advancing the performance of popular SR models on various real-world benchmarks.

Submitted 11 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.12367 [pdf, other]

Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

Authors: Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, et al. (13 additional authors not shown)

Abstract: Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra- and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data is made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.

Submitted 25 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: under review version
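
The Dice coefficient quoted in the PanSegNet abstract above measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric, assuming binary masks flattened to sequences of 0/1 values; the function name `dice_coefficient` and the empty-mask convention are illustrative assumptions and are not taken from the PaNSegNet repository.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])` compares a two-voxel prediction against a one-voxel reference that share one voxel, giving 2·1/(2+1).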

arXiv:2405.11856 [pdf, other]

Modeling and simulation of a mechanism for suppressing the flipping problem of a jumping robot

Authors: Qi Li, Liang Peng, Zhiyuan Wu, Pengda Ye, Weitao Zhang, Yi Xu, Qing Shi

Abstract: In order to solve the problem of stable jumping of a micro robot, we design a special mechanism: elastic passive joint (EPJ). EPJ can assist in achieving smooth jumping through the opening-closing process when the robot jumps. First, we introduce the composition and operation principle of EPJ, and perform a dynamic modeling of the robot's jumping process. Then, in order to verify the effectiveness of EPJ in controlling the robot's smooth jump, we design a simulation experiment based on MATLAB. Through comparative experiments, it was proved that EPJ can greatly adjust the angular velocity of the robot and increase the jump distance of the robot. Finally, we analyze each parameter in EPJ and perform parameter optimization. After optimization, EPJ achieves a completely flip-free jump of the robot, laying an important foundation for improving the mobility of micro robots.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.07023 [pdf, other]

Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient Convolution

Authors: Long Peng, Yang Cao, Renjing Pei, Wenbo Li, Jiaming Guo, Xueyang Fu, Yang Wang, Zheng-Jun Zha

Abstract: Real-SR endeavors to produce high-resolution images with rich details while mitigating the impact of multiple degradation factors. Although existing methods have achieved impressive results in detail recovery, they still fall short when addressing regions with complex gradient arrangements due to the intensity-based linear weighting feature extraction manner. Moreover, the stochastic artifacts introduced by degradation cues during the imaging process in real LR increase the disorder of the overall image details, further complicating the perception of intrinsic gradient arrangement. To address these challenges, we innovatively introduce kernel-wise differential operations within the convolutional kernel and develop several learnable directional gradient convolutions. These convolutions are integrated in parallel with a novel linear weighting mechanism to form an Adaptive Directional Gradient Convolution (DGConv), which adaptively weights and fuses the basic directional gradients to improve the gradient arrangement perception capability for both regular and irregular textures. Coupled with DGConv, we further devise a novel equivalent parameter fusion method for DGConv that maintains its rich representational capabilities while keeping computational costs consistent with a single Vanilla Convolution (VConv), enabling DGConv to improve the performance of existing super-resolution networks without incurring additional computational expenses. To better leverage the superiority of DGConv, we further develop an Adaptive Information Interaction Block (AIIBlock) to adeptly balance the enhancement of texture and contrast while meticulously investigating the interdependencies, culminating in the creation of a DGPNet for Real-SR through simple stacking. Comparative results with 15 SOTA methods across three public datasets underscore the effectiveness and efficiency of our proposed approach.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024
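
The PSNR threshold mentioned in the NTIRE 2024 abstract above follows the standard definition PSNR = 10·log10(peak² / MSE). Below is a minimal sketch, assuming 8-bit images flattened to lists of pixel values with a peak of 255; the function name `psnr` is illustrative, and the challenge's actual evaluation protocol (color space, border cropping, and so on) is not described in the abstract and is not reproduced here.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher values indicate a restored image closer to the reference; identical inputs give an infinite PSNR under this definition.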

arXiv:2402.11419 [pdf, other]

A Self-Healing Magnetic-Array-Type Current Sensor with Data-Driven Identification of Abnormal Magnetic Measurement Units

Authors: Xiaohu Liu, Wei Zhao, Kang Ma, Jian Liu, Lisha Peng, Songling Huang, Shisong Li

Abstract: Magnetic-array-type current sensors have garnered increasing popularity owing to their notable advantages, including broadband functionality, a large dynamic range, cost-effectiveness, and compact dimensions. However, the susceptibility of the measurement error of one or more magnetic measurement units (MMUs) within the current sensor to drift significantly from the nominal value due to environmental factors poses a potential threat to the measurement accuracy of the current sensor. In light of the need to ensure sustained measurement accuracy over the long term, this paper proposes an innovative self-healing approach rooted in cyber-physics correlation. This approach aims to identify MMUs exhibiting abnormal measurement errors, allowing for the exclusive utilization of the remaining unaffected MMUs in the current measurement process. To achieve this, principal component analysis (PCA) is employed to discern the primary component, arising from fluctuations of the measured current, from the residual component, attributed to the drift in measurement error. This analysis is conducted by scrutinizing the measured data obtained from the MMUs. Subsequently, the squared prediction error (SPE) statistic (also called $Q$ statistic) is deployed to individually identify any MMU displaying abnormal behavior. The experimental results demonstrate the successful online identification of abnormal MMUs without the need for a standard magnetic field sensor. By eliminating the contributions from the identified abnormal MMUs, the accuracy of the current measurement is effectively preserved.

Submitted 17 February, 2024; originally announced February 2024.

Comments: 11 pages, 10 figures
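
The PCA and SPE ($Q$ statistic) steps described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single retained principal component, estimates it by power iteration on the sample covariance, and reports each unit's squared residual as its contribution to the SPE. The function names, the starting vector, and the omission of a control-limit threshold are all assumptions made for illustration.

```python
import math


def fit_first_component(samples, iterations=200):
    """Return (mean, unit vector) of the leading principal direction via power iteration."""
    n, d = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # assumed starting vector; must not be orthogonal to the target
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def spe_contributions(sample, mean, direction):
    """Per-unit squared residuals after removing the principal component; their sum is the SPE."""
    centered = [x - m for x, m in zip(sample, mean)]
    score = sum(c * p for c, p in zip(centered, direction))
    residual = [c - score * p for c, p in zip(centered, direction)]
    return [e * e for e in residual]
```

With reference data in which four hypothetical units all track the same current, a later sample where one unit has drifted yields its largest SPE contribution at that unit's index. Deciding when a contribution counts as abnormal requires a threshold, which this sketch leaves out.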

arXiv:2309.04780 [pdf, other]

Latent Degradation Representation Constraint for Single Image Deraining

Authors: Yuhong He, Long Peng, Lu Wang, Jun Cheng

Abstract: Since rain streaks show a variety of shapes and directions, learning the degradation representation is extremely challenging for single image deraining. Existing methods are mainly targeted at designing complicated modules to implicitly learn latent degradation representation from coupled rainy images. This way, it is hard to decouple the content-independent degradation representation due to the lack of explicit constraint, resulting in over- or under-enhancement problems. To tackle this issue, we propose a novel Latent Degradation Representation Constraint Network (LDRCNet) that consists of Direction-Aware Encoder (DAEncoder), UNet Deraining Network, and Multi-Scale Interaction Block (MSIBlock). Specifically, the DAEncoder is proposed to adaptively extract latent degradation representation by using the deformable convolutions to exploit the direction consistency of rain streaks. Next, a constraint loss is introduced to explicitly constrain the degradation representation learning during training. Last, we propose an MSIBlock to fuse with the learned degradation representation and decoder features of the deraining network for adaptive information interaction, which enables the deraining network to remove various complicated rainy patterns and reconstruct image details. Experimental results on synthetic and real datasets demonstrate that our method achieves new state-of-the-art performance.

Submitted 18 January, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

Comments: This paper is accepted to ICASSP 2024

arXiv:2308.14536 [pdf, other]

Spoken Language Intelligence of Large Language Models for Language Learning

Authors: Linkai Peng, Baorian Nuchged, Yingming Gao

Abstract: People have long hoped for a conversational system that can assist in real-life situations, and recent progress on large language models (LLMs) is bringing this idea closer to reality. While LLMs are often impressive in performance, their efficacy in real-world scenarios that demand expert knowledge remains unclear. LLMs are believed to hold the most potential and value in education, especially in the development of artificial intelligence (AI)-based virtual teachers capable of facilitating language learning. Our focus is centered on evaluating the efficacy of LLMs in the realm of education, specifically in the areas of spoken language learning which encompass phonetics, phonology, and second language acquisition. We introduce a new multiple-choice question dataset to evaluate the effectiveness of LLMs in the aforementioned scenarios, including understanding and application of spoken language knowledge. In addition, we investigate the influence of various prompting techniques such as zero- and few-shot methods (prepending the question with question-answer exemplars), chain-of-thought (CoT, think step-by-step), in-domain exemplars and external tools (Google, Wikipedia). We conducted large-scale evaluation on popular LLMs (20 distinct models) using these methods. We achieved significant performance improvements compared to the zero-shot baseline in the practical questions reasoning (GPT-3.5, 49.1% -> 63.1%; LLaMA2-70B-Chat, 42.2% -> 48.6%). We found that models of different sizes have good understanding of concepts in phonetics, phonology, and second language acquisition, but show limitations in reasoning for real-world problems. Additionally, we also explore preliminary findings on conversational communication.

Submitted 28 August, 2023; originally announced August 2023.

Comments: 28 pages, 7 figures, Preprint

arXiv:2307.12266 [pdf, other]

Transformer-based Joint Source Channel Coding for Textual Semantic Communication

Authors: Shicong Liu, Zhen Gao, Gaojie Chen, Yu Su, Lu Peng

Abstract: The Space-Air-Ground-Sea integrated network calls for more robust and secure transmission techniques against jamming. In this paper, we propose a textual semantic transmission framework for robust transmission, which utilizes advanced natural language processing techniques to model and encode sentences. Specifically, the textual sentences are first split into tokens using the WordPiece algorithm, and are embedded into token vectors for semantic extraction by a Transformer-based encoder. The encoded data are quantized to a fixed-length binary sequence for transmission, where binary erasure, symmetric, and deletion channels are considered for transmission. The received binary sequences are further decoded by the Transformer decoders into tokens used for sentence reconstruction. Our proposed approach leverages the power of neural networks and attention mechanisms to provide reliable and efficient communication of textual data in challenging wireless environments, and simulation results on semantic similarity and bilingual evaluation understudy (BLEU) prove the superiority of the proposed model in semantic transmission.

Submitted 23 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures. Accepted by IEEE/CIC ICCC 2023

arXiv:2306.06865 [pdf, other]

Deep denoising autoencoder-based non-invasive blood flow detection for arteriovenous fistula

Authors: Li-Chin Chen, Yi-Heng Lin, Li-Ning Peng, Feng-Ming Wang, Yu-Hsin Chen, Po-Hsun Huang, Shang-Feng Yang, Yu Tsao

Abstract: Clinical guidelines underscore the importance of regularly monitoring and surveilling arteriovenous fistula (AVF) access in hemodialysis patients to promptly detect any dysfunction. Although phono-angiography/sound analysis overcomes the limitations of standardized AVF stenosis diagnosis tools, prior studies have depended on conventional feature extraction methods, restricting their applicability in diverse contexts. In contrast, representation learning captures fundamental underlying factors that can be readily transferred across different contexts. We propose an approach based on deep denoising autoencoders (DAEs) that perform dimensionality reduction and reconstruction tasks using the waveform obtained through one-level discrete wavelet transform, utilizing representation learning. Our results demonstrate that the latent representation generated by the DAE surpasses expectations with an accuracy of 0.93. The incorporation of noise-mixing and the utilization of a noise-to-clean scheme effectively enhance the discriminative capabilities of the latent representation. Moreover, when employed to identify patient-specific characteristics, the latent representation achieved an accuracy above 0.92. Appropriate lightweight methods can restore the detection performance of the excessively reduced dimensionality version and enable operation on devices with less computational power. Our findings suggest that representation learning is a more feasible approach for extracting auscultation features in AVF, leading to improved generalization and applicability across multiple tasks. The manipulation of latent representations holds immense potential for future advancements. Further investigations in this area are promising and warrant continued exploration.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2304.12205 [pdf, other]

doi 10.1109/TIV.2023.3331024

Synthetic Datasets for Autonomous Driving: A Survey

Authors: Zhihang Song, Zimin He, Xingyu Li, Qiming Ma, Ruibo Ming, Zhiqi Mao, Huaxin Pei, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

Abstract: Autonomous driving techniques have been flourishing in recent years while thirsting for huge amounts of high-quality data. However, it is difficult for real-world datasets to keep up with the pace of changing requirements due to their expensive and time-consuming experimental and labeling costs. Therefore, more and more researchers are turning to synthetic datasets to easily generate rich and changeable data as an effective complement to the real world and to improve the performance of algorithms. In this paper, we summarize the evolution of synthetic dataset generation methods and review the work to date on synthetic datasets related to single- and multi-task categories for autonomous driving studies. We also discuss the role that synthetic datasets play in evaluation, gap testing, and positive effects in autonomous-driving-related algorithm testing, especially on trustworthiness and safety aspects. Finally, we discuss general trends and possible development directions. To the best of our knowledge, this is the first survey focusing on the application of synthetic datasets in autonomous driving. This survey also raises awareness of the problems of real-world deployment of autonomous driving technology and provides researchers with a possible solution.

Submitted 27 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 19 pages, 5 figures

Journal ref: in IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 1847-1864, Jan. 2024

arXiv:2303.06811 [pdf, other]

The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge

Authors: Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie

Abstract: This paper describes our NPU-Elevoc personalized speech enhancement system (NAPSE) for the 5th Deep Noise Suppression Challenge at ICASSP 2023. Based on the superior two-stage model TEA-PSE 2.0, our system particularly explores a better strategy for speaker embedding fusion, optimizes the model training pipeline, and leverages adversarial training and multi-scale loss. According to the results, our system is tied for 1st place in the headset track (track 1) and ranked 2nd in the speakerphone track (track 2).

Submitted 15 March, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

arXiv:2212.13654 [pdf]

Large-scale single-photon imaging

Authors: Liheng Bian, Haoze Song, Lintao Peng, Xuyang Chang, Xi Yang, Roarke Horstmeyer, Lin Ye, Tong Qin, Dezhi Zheng, Jun Zhang

Abstract: Benefiting from its single-photon sensitivity, single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacturing process and heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 $\times$ 32 pixels, 90 scenes, 10 different bit depths, 3 different illumination flux levels, 2790 images in total) to calibrate noise model parameters. With this real-world physical noise model, we for the first time synthesized a large-scale realistic single-photon image dataset (image pairs of 5 different resolutions with maximum megapixels, 17250 scenes, 10 different bit depths, 3 different illumination flux levels, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can dig global contextual features to remove multi-source noise and extract full-frequency details. We applied the technique on a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with more than 5 dB superiority on PSNR compared to the existing methods.

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2212.05566 [pdf, other]

YoloCurvSeg: You Only Label One Noisy Skeleton for Vessel-style Curvilinear Structure Segmentation

Authors: Li Lin, Linkai Peng, Huaqing He, Pujin Cheng, Jiewei Wu, Kenneth K. Y. Wong, Xiaoying Tang

Abstract: Weakly-supervised learning (WSL) has been proposed to alleviate the conflict between data annotation cost and model performance through employing sparsely-grained (i.e., point-, box-, scribble-wise) supervision and has shown promising performance, particularly in the image segmentation field. However, it is still a very challenging task due to the limited supervision, especially when only a small number of labeled samples are available. Additionally, almost all existing WSL segmentation methods are designed for star-convex structures which are very different from curvilinear structures such as vessels and nerves. In this paper, we propose a novel sparsely annotated segmentation framework for curvilinear structures, named YoloCurvSeg. An essential component of YoloCurvSeg is image synthesis. Specifically, a background generator delivers image backgrounds that closely match the real distributions through inpainting dilated skeletons. The extracted backgrounds are then combined with randomly emulated curves generated by a Space Colonization Algorithm-based foreground generator and through a multilayer patch-wise contrastive learning synthesizer. In this way, a synthetic dataset with both images and curve segmentation labels is obtained, at the cost of only one or a few noisy skeleton annotations. Finally, a segmenter is trained with the generated dataset and possibly an unlabeled dataset. The proposed YoloCurvSeg is evaluated on four publicly available datasets (OCTA500, CORN, DRIVE and CHASEDB1) and the results show that YoloCurvSeg outperforms state-of-the-art WSL segmentation methods by large margins. With only one noisy skeleton annotation (respectively 0.14%, 0.03%, 1.40%, and 0.65% of the full annotation), YoloCurvSeg achieves more than 97% of the fully-supervised performance on each dataset. Code and datasets will be released at https://github.com/llmir/YoloCurvSeg.

Submitted 18 August, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

Comments: 20 pages, 15 figures, MEDIA accepted

arXiv:2208.11184 [pdf, other]

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, et al. (28 additional authors not shown)

Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, there are 12 teams and 2 teams that submitted the final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Camera-ready version

arXiv:2207.08998 [pdf]

Discovering novel systemic biomarkers in photos of the external eye

Authors: Boris Babenko, Ilana Traynis, Christina Chen, Preeti Singh, Akib Uddin, Jorge Cuadros, Lauren P. Daskivich, April Y. Maa, Ramasamy Kim, Eugene Yu-Chuan Kang, Yossi Matias, Greg S. Corrado, Lily Peng, Dale R. Webster, Christopher Semturs, Jonathan Krause, Avinash V. Varadarajan, Naama Hammel, Yun Liu

Abstract: External eye photos were recently shown to reveal signs of diabetic retinal disease and elevated HbA1c. In this paper, we evaluate if external eye photos contain information about additional systemic medical conditions. We developed a deep learning system (DLS) that takes external eye photos as input and predicts multiple systemic parameters, such as those related to the liver (albumin, AST); kidney (eGFR estimated using the race-free 2021 CKD-EPI creatinine equation, the urine ACR); bone & mineral (calcium); thyroid (TSH); and blood count (Hgb, WBC, platelets). Development leveraged 151,237 images from 49,015 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles County, CA. Evaluation focused on 9 pre-specified systemic parameters and leveraged 3 validation sets (A, B, C) spanning 28,869 patients with and without diabetes undergoing eye screening in 3 independent sites in Los Angeles County, CA, and the greater Atlanta area, GA. We compared against baseline models incorporating available clinicodemographic variables (e.g. age, sex, race/ethnicity, years with diabetes). Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST>36, calcium<8.6, eGFR<60, Hgb<11, platelets<150, ACR>=300, and WBC<4 on validation set A (a patient population similar to the development sets), where the AUC of DLS exceeded that of the baseline by 5.2-19.4%. On validation sets B and C, with substantial patient population differences compared to the development sets, the DLS outperformed the baseline for ACR>=300 and Hgb<11 by 7.3-13.2%. Our findings provide further evidence that external eye photos contain important biomarkers of systemic health spanning multiple organ systems. Further work is needed to investigate whether and how these biomarkers can be translated into clinical impact.

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.07289 [pdf, other]

Text-Aware End-to-end Mispronunciation Detection and Diagnosis

Authors: Linkai Peng, Yingming Gao, Binghuai Lin, Dengfeng Ke, Yanlu Xie, Jinsong Zhang

Abstract: Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training system (CAPT). In the field of assessing the pronunciation quality of constrained speech, the given transcriptions can play the role of a teacher. Conventional methods have fully utilized the prior texts for the model construction or improving the system performance, e.g. forced-alignment and extended recognition networks. Recently, some end-to-end based methods attempt to incorporate the prior texts into model training and preliminarily show the effectiveness. However, previous studies mostly consider applying raw attention mechanism to fuse audio representations with text representations, without taking possible text-pronunciation mismatch into account. In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information. Moreover, given the transcriptions, we design an extra contrastive loss to reduce the gap between the learning objective of phoneme recognition and MDD. We conducted experiments using two publicly available datasets (TIMIT and L2-Arctic) and our best model improved the F1 score from 57.51% to 61.75% compared to the baselines. Besides, we provide a detailed analysis to shed light on the effectiveness of gating mechanism and contrastive learning on MDD.

Submitted 15 June, 2022; originally announced June 2022.

Comments: Rejected by Interspeech2022

arXiv:2206.04948 [pdf, other]

A Holistic Robust Motion Controller Framework for Autonomous Platooning

Authors: Hong Wang, Li-Ming Peng, Zi-Chun Wei, Kai Yang, Xian-Xu Bai, Luo Jiang, Ehsan Hashemi

Abstract: Safety is the foremost concern for autonomous platooning. The vehicle-to-vehicle (V2V) communication delay and the sudden appearance of obstacles will trigger the safety of the intended functionality (SOTIF) issues for autonomous platooning. This research proposes a holistic robust motion controller framework (MCF) for an intelligent and connected vehicle platoon system. The MCF utilizes a hierarchical structure to resolve the longitudinal string stability and the lateral control problem under the complex driving environment and time-varying communication delay. Firstly, the H-infinity feedback controller is developed to ensure the robustness of the platoon under time-varying communication delay in the upper-level coordination layer (UCL). The output from UCL will be delivered to the lower-level motion-planning layer (LML) as reference signals. Secondly, the model predictive control (MPC) algorithm is implemented in the LML to achieve multi-objective control, which comprehensively considers the reference signals, the artificial potential field, and multiple vehicle dynamics constraints. Furthermore, three critical scenarios are co-simulated for case studies, including platooning under time-varying communication delay, merging, and obstacle avoidance scenarios. The simulation results indicate that, compared with single-structure MPC, the proposed MCF can offer a better suppression on position error propagation, and get improvements on maximum position error in the three scenarios by 19.2%, 59.8%, and 15.3%, respectively. Last, the practicability and effectiveness of the proposed MCF are verified via hardware-in-the-loop experiment. The average conducting time of the proposed method on Speedgoat real-time target machine is 1.1 milliseconds, which meets the real-time requirements.

Submitted 10 June, 2022; originally announced June 2022.

Comments: 13 pages, 20 figures

arXiv:2205.04156 [pdf, other]

doi 10.1016/j.sigpro.2022.108728

Towards a median signal detector through the total Bregman divergence and its robustness analysis

Authors: Yusuke Ono, Linyu Peng

Abstract: A novel family of geometric signal detectors are proposed through medians of the total Bregman divergence (TBD), which are shown advantageous over the conventional methods and their mean counterparts. By interpreting the observation data as Hermitian positive-definite matrices, their mean or median play an essential role in signal detection. As is difficult to be solved analytically, we propose numerical solutions through Riemannian gradient descent algorithms or fixed-point algorithms. Beside detection performance, robustness of a detector to outliers is also of vital importance, which can often be analyzed via the influence functions. Introducing an orthogonal basis for Hermitian matrices, we are able to compute the corresponding influence functions analytically and exactly by solving a linear system, which is transformed from the governing matrix equation. Numerical simulations show that the TBD medians are more robust than their mean counterparts.

Submitted 14 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: 15 pages, 3 figures

Journal ref: Signal Processing 201, 108728, 2022

arXiv:2204.11278 [pdf, ps, other]

doi 10.1109/TCOMM.2022.3170988

Unsupervised Learning Discriminative MIG Detectors in Nonhomogeneous Clutter

Authors: Xiaoqiang Hua, Yusuke Ono, Linyu Peng, Yuting Xu

Abstract: Principal component analysis (PCA) is a commonly used pattern analysis method that maps high-dimensional data into a lower-dimensional space maximizing the data variance, that results in the promotion of separability of data. Inspired by the principle of PCA, a novel type of learning discriminative matrix information geometry (MIG) detectors in the unsupervised scenario are developed, and applied to signal detection in nonhomogeneous environments. Hermitian positive-definite (HPD) matrices can be used to model the sample data, while the clutter covariance matrix is estimated by the geometric mean of a set of secondary HPD matrices. We define a projection that maps the HPD matrices in a high-dimensional manifold to a low-dimensional and more discriminative one to increase the degree of separation of HPD matrices by maximizing the data variance. Learning a mapping can be formulated as a two-step mini-max optimization problem in Riemannian manifolds, which can be solved by the Riemannian gradient descent algorithm. Three discriminative MIG detectors are illustrated with respect to different geometric measures, i.e., the Log-Euclidean metric, the Jensen--Bregman LogDet divergence and the symmetrized Kullback--Leibler divergence. Simulation results show that performance improvements of the novel MIG detectors can be achieved compared with the conventional detectors and their state-of-the-art counterparts within nonhomogeneous environments.

Submitted 8 May, 2022; v1 submitted 24 April, 2022; originally announced April 2022.

Comments: 14 pages, 6 figures

Journal ref: IEEE Transactions on Communications 70, 4107-4120, 2022

arXiv:2203.10139 [pdf]

AI system for fetal ultrasound in low-resource settings

Authors: Ryan G. Gomes, Bellington Vwalika, Chace Lee, Angelica Willis, Marcin Sieniek, Joan T. Price, Christina Chen, Margaret P. Kasaro, James A. Taylor, Elizabeth M. Stringer, Scott Mayer McKinney, Ntazana Sindano, George E. Dahl, William Goodnight III, Justin Gilmer, Benjamin H. Chi, Charles Lau, Terry Spitz, T Saensuksopa, Kris Liu, Jonny Wong, Rory Pilgrim, Akib Uddin, Greg Corrado, Lily Peng, et al. (4 additional authors not shown)

Abstract: Despite considerable progress in maternal healthcare, maternal and perinatal deaths remain high in low-to-middle income countries. Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption. We developed and validated an artificial intelligence (AI) system that uses novice-acquired "blind sweep" ultrasound videos to estimate gestational age (GA) and fetal malpresentation. We further addressed obstacles that may be encountered in low-resourced settings. Using a simplified sweep protocol with real-time AI feedback on sweep quality, we have demonstrated the generalization of model performance to minimally trained novice ultrasound operators using low cost ultrasound devices with on-device AI integration. The GA model was non-inferior to standard fetal biometry estimates with as few as two sweeps, and the fetal malpresentation model had high AUC-ROCs across operators and devices. Our AI models have the potential to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2203.09034 [pdf, other]

doi 10.1109/TMI.2022.3201974

GATE: Graph CCA for Temporal SElf-supervised Learning for Label-efficient fMRI Analysis

Authors: Liang Peng, Nan Wang, Jie Xu, Xiaofeng Zhu, Xiaoxiao Li

Abstract: In this work, we focus on the challenging task, neuro-disease classification, using functional magnetic resonance imaging (fMRI). In population graph-based disease analysis, graph convolutional neural networks (GCNs) have achieved remarkable success. However, these achievements are inseparable from abundant labeled data and sensitive to spurious signals. To improve fMRI representation learning and classification under a label-efficient setting, we propose a novel and theory-driven self-supervised learning (SSL) framework on GCNs, namely Graph CCA for Temporal self-supervised learning on fMRI analysis GATE. Concretely, it is demanding to design a suitable and effective SSL strategy to extract formation and robust features for fMRI. To this end, we investigate several new graph augmentation strategies from fMRI dynamic functional connectives (FC) for SSL training. Further, we leverage canonical-correlation analysis (CCA) on different temporal embeddings and present the theoretical implications. Consequently, this yields a novel two-step GCN learning procedure comprised of (i) SSL on an unlabeled fMRI population graph and (ii) fine-tuning on a small labeled fMRI dataset for a classification task. Our method is tested on two independent fMRI datasets, demonstrating superior performance on autism and dementia diagnosis.

Submitted 27 August, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

Journal ref: IEEE Transactions on Medical Imaging 2022

arXiv:2203.04586 [pdf, other]

Multi-modal Brain Tumor Segmentation via Missing Modality Synthesis and Modality-level Attention Fusion

Authors: Ziqi Huang, Li Lin, Pujin Cheng, Linkai Peng, Xiaoying Tang

Abstract: Multi-modal magnetic resonance (MR) imaging provides great potential for diagnosing and analyzing brain gliomas. In clinical scenarios, common MR sequences such as T1, T2 and FLAIR can be obtained simultaneously in a single scanning process. However, acquiring contrast enhanced modalities such as T1ce requires additional time, cost, and injection of contrast agent. As such, it is clinically meaningful to develop a method to synthesize unavailable modalities which can also be used as additional inputs to downstream tasks (e.g., brain tumor segmentation) for performance enhancing. In this work, we propose an end-to-end framework named Modality-Level Attention Fusion Network (MAF-Net), wherein we innovatively conduct patchwise contrastive learning for extracting multi-modal latent features and dynamically assigning attention weights to fuse different modalities. Through extensive experiments on BraTS2020, our proposed MAF-Net is found to yield superior T1ce synthesis performance (SSIM of 0.8879 and PSNR of 22.78) and accurate brain tumor segmentation (mean Dice scores of 67.9%, 41.8% and 88.0% on segmenting the tumor core, enhancing tumor and whole tumor).

Submitted 9 March, 2022; originally announced March 2022.

Comments: 6 pages, 5 figures, submitted to ICPR 2022

arXiv:2203.03631 [pdf, other]

Student Becomes Decathlon Master in Retinal Vessel Segmentation via Dual-teacher Multi-target Domain Adaptation

Authors: Linkai Peng, Li Lin, Pujin Cheng, Huaqing He, Xiaoying Tang

Abstract: Unsupervised domain adaptation has been proposed recently to tackle the so-called domain shift between training data and test data with different distributions. However, most of them only focus on single-target domain adaptation and cannot be applied to the scenario with multiple target domains. In this paper, we propose RVms, a novel unsupervised multi-target domain adaptation approach to segment retinal vessels (RVs) from multimodal and multicenter retinal images. RVms mainly consists of a style augmentation and transfer (SAT) module and a dual-teacher knowledge distillation (DTKD) module. SAT augments and clusters images into source-similar domains and source-dissimilar domains via Bezier and Fourier transformations. DTKD utilizes the augmented and transformed data to train two teachers, one for source-similar domains and the other for source-dissimilar domains. Afterwards, knowledge distillation is performed to iteratively distill different domain knowledge from teachers to a generic student. The local relative intensity transformation is employed to characterize RVs in a domain invariant manner and promote the generalizability of teachers and student models. Moreover, we construct a new multimodal and multicenter vascular segmentation dataset from existing publicly-available datasets, which can be used to benchmark various domain adaptation and domain generalization methods. Through extensive experiments, RVms is found to be very close to the target-trained Oracle in terms of segmenting the RVs, largely outperforming other state-of-the-art methods.

Submitted 11 October, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: To be published in MICCAI-MLMI 2022

arXiv:2201.04812 [pdf, other]

Unsupervised Domain Adaptation for Cross-Modality Retinal Vessel Segmentation via Disentangling Representation Style Transfer and Collaborative Consistency Learning

Authors: Linkai Peng, Li Lin, Pujin Cheng, Ziqi Huang, Xiaoying Tang

Abstract: Various deep learning models have been developed to segment anatomical structures from medical images, but they typically have poor performance when tested on another target domain with different data distribution. Recently, unsupervised domain adaptation methods have been proposed to alleviate this so-called domain shift issue, but most of them are designed for scenarios with relatively small domain shifts and are likely to fail when encountering a large domain gap. In this paper, we propose DCDA, a novel cross-modality unsupervised domain adaptation framework for tasks with large domain shifts, e.g., segmenting retinal vessels from OCTA and OCT images. DCDA mainly consists of a disentangling representation style transfer (DRST) module and a collaborative consistency learning (CCL) module. DRST decomposes images into content components and style codes and performs style transfer and image reconstruction. CCL contains two segmentation models, one for source domain and the other for target domain. The two models use labeled data (together with the corresponding transferred images) for supervised learning and perform collaborative consistency learning on unlabeled data. Each model focuses on the corresponding single domain and aims to yield an expertized domain-specific segmentation model. Through extensive experiments on retinal vessel segmentation, our framework achieves Dice scores close to target-trained oracle both from OCTA to OCT and from OCT to OCTA, significantly outperforming other state-of-the-art methods.

Submitted 20 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

Comments: To be published in ISBI 2022

arXiv:2111.11843 [pdf, other]

doi 10.1109/TIP.2023.3276332

U-shape Transformer for Underwater Image Enhancement

Authors: Lintao Peng, Chunli Zhu, Liheng Bian

Abstract: The light absorption and scattering of underwater impurities lead to poor underwater imaging quality. The existing data-driven based underwater image enhancement (UIE) techniques suffer from the lack of a large-scale dataset containing various underwater scenes and high-fidelity reference images. Besides, the inconsistent attenuation in different color channels and space areas is not fully considered for boosted enhancement. In this work, we constructed a large-scale underwater image (LSUI) dataset including 5004 image pairs, and reported an U-shape Transformer network where the transformer model is for the first time introduced to the UIE task. The U-shape Transformer is integrated with a channel-wise multi-scale feature fusion transformer (CMSFFT) module and a spatial-wise global feature modeling transformer (SGFMT) module, which reinforce the network's attention to the color channels and space areas with more serious attenuation. Meanwhile, in order to further improve the contrast and saturation, a novel loss function combining RGB, LAB and LCH color spaces is designed following the human vision principle. The extensive experiments on available datasets validate the state-of-the-art performance of the reported technique with more than 2dB superiority.

Submitted 12 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: under review

arXiv:2111.01544 [pdf]

Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study

Authors: Dazhou Guo, Jia Ge, Xianghua Ye, Senxiang Yan, Yi Xin, Yuchen Song, Bing-shen Huang, Tsung-Min Hung, Zhuotun Zhu, Ling Peng, Yan** Ren, Rui Liu, Gong Zhang, Mengyuan Mao, Xiaohua Chen, Zhongjie Lu, Wenxiang Li, Yuzhen Chen, Lingyun Huang, Jing Xiao, Adam P. Harrison, Le Lu, Chien-Yu Lin, Dakai Jin, Tsung-Ying Ho

Abstract: Accurate organ at risk (OAR) segmentation is critical to reduce the radiotherapy post-treatment complications. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region, however, due to the predictable prohibitive labor-cost of this task, most institutions choose a substantially simplified protocol by delineating a smaller subset of OARs and neglecting the dose distributions associated with other OARs. In this work we propose a novel, automated and highly effective stratified OAR segmentation (SOARS) system using deep learning to precisely delineate a comprehensive set of 42 H&N OARs. SOARS stratifies 42 OARs into anchor, mid-level, and small & hard subcategories, with specifically derived neural network architectures for each category by neural architecture search (NAS) principles. We built SOARS models using 176 training patients in an internal institution and independently evaluated on 1327 external patients across six different institutions. It consistently outperformed other state-of-the-art methods by at least 3-5% in Dice score for each institutional evaluation (up to 36% relative error reduction in other metrics). More importantly, extensive multi-user studies evidently demonstrated that 98% of the SOARS predictions need only very minor or no revisions for direct clinical acceptance (saving 90% radiation oncologists workload), and their segmentation and dosimetric accuracy are within or smaller than the inter-user variation. These findings confirmed the strong clinical applicability of SOARS for the OAR delineation process in H&N cancer radiotherapy workflows, with improved efficiency, comprehensiveness, and quality.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2110.12271 [pdf, other]

Self-Validation: Early Stopping for Single-Instance Deep Generative Priors

Authors: Taihui Li, Zhong Zhuang, Hengyue Liang, Le Peng, Hengkang Wang, Ju Sun

Abstract: Recent works have shown the surprising effectiveness of deep generative models in solving numerous image reconstruction (IR) tasks, even without training data. We call these models, such as deep image prior and deep decoder, collectively as single-instance deep generative priors (SIDGPs). The successes, however, often hinge on appropriate early stopping (ES), which by far has largely been handled in an ad-hoc manner. In this paper, we propose the first principled method for ES when applying SIDGPs to IR, taking advantage of the typical bell trend of the reconstruction quality. In particular, our method is based on collaborative training and self-validation: the primal reconstruction process is monitored by a deep autoencoder, which is trained online with the historic reconstructed images and used to validate the reconstruction quality constantly. Experimentally, on several IR problems and different SIDGPs, our self-validation method is able to reliably detect near-peak performance and signal good ES points. Our code is available at https://sun-umn.github.io/Self-Validation/.

Submitted 23 October, 2021; originally announced October 2021.

Comments: To appear in British Machine Vision Conference (BMVC) 2021

arXiv:2109.09271 [pdf, ps, other]

DeepStationing: Thoracic Lymph Node Station Parsing in CT Scans using Anatomical Context Encoding and Key Organ Auto-Search

Authors: Dazhou Guo, Xianghua Ye, Jia Ge, Xing Di, Le Lu, Lingyun Huang, Guotong Xie, Jing Xiao, Zhongjie Liu, Ling Peng, Senxiang Yan, Dakai Jin

Abstract: Lymph node station (LNS) delineation from computed tomography (CT) scans is an indispensable step in radiation oncology workflow. High inter-user variabilities across oncologists and prohibitive laboring costs motivated the automated approach. Previous works exploit anatomical priors to infer LNS based on predefined ad-hoc margins. However, without voxel-level supervision, the performance is severely limited. LNS is highly context-dependent - LNS boundaries are constrained by anatomical organs - we formulate it as a deep spatial and contextual parsing problem via encoded anatomical organs. This permits the deep network to better learn from both CT appearance and organ context. We develop a stratified referencing organ segmentation protocol that divides the organs into anchor and non-anchor categories and uses the former's predictions to guide the later segmentation. We further develop an auto-search module to identify the key organs that opt for the optimal LNS parsing performance. Extensive four-fold cross-validation experiments on a dataset of 98 esophageal cancer patients (with the most comprehensive set of 12 LNSs + 22 organs in thoracic region to date) are conducted. Our LNS parsing model produces significant performance improvements, with an average Dice score of 81.1% +/- 6.1%, which is 5.0% and 19.2% higher over the pure CT-based deep model and the previous representative approach, respectively.

Submitted 19 September, 2021; originally announced September 2021.

arXiv:2108.10213 [pdf]

doi 10.1109/TMC.2022.3171312

SALIENCE: An Unsupervised User Adaptation Model for Multiple Wearable Sensors Based Human Activity Recognition

Authors: Ling Chen, Yi Zhang, Shenghuan Miao, Sirou Zhu, Rong Hu, Liangying Peng, Mingqi Lv

Abstract: Unsupervised user adaptation aligns the feature distributions of the data from training users and the new user, so a well-trained wearable human activity recognition (WHAR) model can be well adapted to the new user. With the development of wearable sensors, multiple wearable sensors based WHAR is gaining more and more attention. In order to address the challenge that the transferabilities of different sensors are different, we propose SALIENCE (unsupervised user adaptation model for multiple wearable sensors based human activity recognition) model. It aligns the data of each sensor separately to achieve local alignment, while uniformly aligning the data of all sensors to ensure global alignment. In addition, an attention mechanism is proposed to focus the activity classifier of SALIENCE on the sensors with strong feature discrimination and well distribution alignment. Experiments are conducted on two public WHAR datasets, and the experimental results show that our model can yield a competitive performance.

Submitted 27 April, 2022; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: Accepted by IEEE Transactions on Mobile Computing

arXiv:2108.00911 [pdf, ps, other]

Multi-phase Liver Tumor Segmentation with Spatial Aggregation and Uncertain Region Inpainting

Authors: Yue Zhang, Chengtao Peng, Liying Peng, Huimin Huang, Ruofeng Tong, Lanfen Lin, Jingsong Li, Yen-Wei Chen, Qingqing Chen, Hongjie Hu, Zhiyi Peng

Abstract: Multi-phase computed tomography (CT) images provide crucial complementary information for accurate liver tumor segmentation (LiTS). State-of-the-art multi-phase LiTS methods usually fused cross-phase features through phase-weighted summation or channel-attention based concatenation. However, these methods ignored the spatial (pixel-wise) relationships between different phases, hence leading to insufficient feature integration. In addition, the performance of existing methods remains subject to the uncertainty in segmentation, which is particularly acute in tumor boundary regions. In this work, we propose a novel LiTS method to adequately aggregate multi-phase information and refine uncertain region segmentation. To this end, we introduce a spatial aggregation module (SAM), which encourages per-pixel interactions between different phases, to make full use of cross-phase information. Moreover, we devise an uncertain region inpainting module (URIM) to refine uncertain pixels using neighboring discriminative features. Experiments on an in-house multi-phase CT dataset of focal liver lesions (MPCT-FLLs) demonstrate that our method achieves promising liver tumor segmentation and outperforms state-of-the-arts.

Submitted 5 August, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: To appear in MICCAI 2021

arXiv:2106.09891 [pdf, ps, other]

ICINet: ICI-Aware Neural Network Based Channel Estimation for Rapidly Time-Varying OFDM Systems

Authors: Yi Sun, Hong Shen, Zhenguo Du, Lan Peng, Chunming Zhao

Abstract: A novel intercarrier interference (ICI)-aware orthogonal frequency division multiplexing (OFDM) channel estimation network ICINet is presented for rapidly time-varying channels. ICINet consists of two components: a preprocessing deep neural subnetwork (PreDNN) and a cascaded residual learning-based neural subnetwork (CasResNet). By fully taking into account the impact of ICI, the proposed PreDNN first refines the initial channel estimates in a subcarrier-wise fashion. In addition, the CasResNet is designed to further enhance the estimation accuracy. The proposed cascaded network is compatible with any pilot patterns and robust against mismatched system configurations. Simulation results verify the superiority of ICINet over existing networks in terms of better performance and much less complexity.

Submitted 17 June, 2021; originally announced June 2021.

arXiv:2106.05152 [pdf, other]

Rethinking Transfer Learning for Medical Image Classification

Authors: Le Peng, Hengyue Liang, Gaoxiang Luo, Taihui Li, Ju Sun

Abstract: Transfer learning (TL) from pretrained deep models is a standard practice in modern medical image classification (MIC). However, what levels of features to be reused are problem-dependent, and uniformly finetuning all layers of pretrained models may be suboptimal. This insight has partly motivated the recent differential TL strategies, such as TransFusion (TF) and layer-wise finetuning (LWFT), which treat the layers in the pretrained models differentially. In this paper, we add one more strategy into this family, called TruncatedTL, which reuses and finetunes appropriate bottom layers and directly discards the remaining layers. This yields not only superior MIC performance but also compact models for efficient inference, compared to other differential TL methods. Our code is available at: https://github.com/sun-umn/TTL

Submitted 26 May, 2024; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: Accepted by BMVC2023 (oral)

arXiv:2106.05082 [pdf, other]

doi 10.1364/OE.434805

Agile wide-field imaging with selective high resolution

Authors: Lintao Peng, Liheng Bian, Tiexin Liu, Jun Zhang

Abstract: Wide-field and high-resolution (HR) imaging is essential for various applications such as aviation reconnaissance, topographic mapping and safety monitoring. The existing techniques require a large-scale detector array to capture HR images of the whole field, resulting in high complexity and heavy cost. In this work, we report an agile wide-field imaging framework with selective high resolution that requires only two detectors. It builds on the statistical sparsity prior of natural scenes that the important targets locate only at small regions of interests (ROI), instead of the whole field. Under this assumption, we use a short-focal camera to image wide field with a certain low resolution, and use a long-focal camera to acquire the HR images of ROI. To automatically locate ROI in the wide field in real time, we propose an efficient deep-learning based multiscale registration method that is robust and blind to the large setting differences (focal, white balance, etc) between the two cameras. Using the registered location, the long-focal camera mounted on a gimbal enables real-time tracking of the ROI for continuous HR imaging. We demonstrated the novel imaging framework by building a proof-of-concept setup with only 1181 gram weight, and assembled it on an unmanned aerial vehicle for air-to-ground monitoring. Experiments show that the setup maintains 120$^{\circ}$ wide field-of-view (FOV) with selective 0.45$mrad$ instantaneous FOV.

Submitted 11 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: 12pages,6figures

arXiv:2106.02118 [pdf]

doi 10.1101/2021.06.04.21258316

A Prospective Observational Study to Investigate Performance of a Chest X-ray Artificial Intelligence Diagnostic Support Tool Across 12 U.S. Hospitals

Authors: Ju Sun, Le Peng, Taihui Li, Dyah Adila, Zach Zaiman, Genevieve B. Melton, Nicholas Ingraham, Eric Murray, Daniel Boley, Sean Switzer, John L. Burns, Kun Huang, Tadashi Allen, Scott D. Steenburg, Judy Wawira Gichoya, Erich Kummerfeld, Christopher Tignanelli

Abstract: Importance: An artificial intelligence (AI)-based model to predict COVID-19 likelihood from chest x-ray (CXR) findings can serve as an important adjunct to accelerate immediate clinical decision making and improve clinical decision making. Despite significant efforts, many limitations and biases exist in previously developed AI diagnostic models for COVID-19. Utilizing a large set of local and international CXR images, we developed an AI model with high performance on temporal and external validation. Conclusions and Relevance: AI-based diagnostic tools may serve as an adjunct, but not replacement, for clinical decision support of COVID-19 diagnosis, which largely hinges on exposure history, signs, and symptoms. While AI-based tools have not yet reached full diagnostic potential in COVID-19, they may still offer valuable information to clinicians taken into consideration along with clinical signs and symptoms.

Submitted 6 June, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: Check out the medRxiv version at https://doi.org/10.1101/2021.06.04.21258316 for updates

arXiv:2105.12889 [pdf, other]

doi 10.1016/j.sigpro.2021.108176

MIG Median Detectors with Manifold Filter

Authors: Xiaoqiang Hua, Linyu Peng

Abstract: In this paper, we propose a class of median-based matrix information geometry (MIG) detectors with a manifold filter and apply them to signal detection in nonhomogeneous environments. As customary, the sample data is assumed to be modeled as Hermitian positive-definite (HPD) matrices, and the geometric median of a set of HPD matrices is interpreted as an estimate of the clutter covariance matrix (CCM). Then, the problem of signal detection can be reformulated as discriminating two points on the manifold of HPD matrices, one of which is the HPD matrix in the cell under test while the other represents the CCM. By manifold filter, we map a set of HPD matrices to another set of HPD matrices by weighting them, that consequently improves the discriminative power by reducing the intra-class distances while increasing the inter-class distances. Three MIG median detectors are designed by resorting to three geometric measures on the matrix manifold, and the corresponding geometric medians are shown to be robust to outliers. Numerical simulations show the advantage of the proposed MIG median detectors in comparison with their state-of-the-art counterparts as well as the conventional detectors in nonhomogeneous environments.

Submitted 26 May, 2021; originally announced May 2021.

Comments: 22 pages, 12 figures

Journal ref: Signal Processing 188, 108176, 2021

arXiv:2105.07540 [pdf]

Deep learning for detecting pulmonary tuberculosis via chest radiography: an international study across 10 countries

Authors: Sahar Kazemzadeh, Jin Yu, Shahar Jamshy, Rory Pilgrim, Zaid Nabulsi, Christina Chen, Neeral Beladia, Charles Lau, Scott Mayer McKinney, Thad Hughes, Atilla Kiraly, Sreenivasa Raju Kalidindi, Monde Muyoyeta, Jameson Malemela, Ting Shih, Greg S. Corrado, Lily Peng, Katherine Chou, Po-Hsuan Cameron Chen, Yun Liu, Krish Eswaran, Daniel Tse, Shravya Shetty, Shruthi Prabhakara

Abstract: Tuberculosis (TB) is a top-10 cause of death worldwide. Though the WHO recommends chest radiographs (CXRs) for TB screening, the limited availability of CXR interpretation is a barrier. We trained a deep learning system (DLS) to detect active pulmonary TB using CXRs from 9 countries across Africa, Asia, and Europe, and utilized large-scale CXR pretraining, attention pooling, and noisy student semi-supervised learning. Evaluation was on (1) a combined test set spanning China, India, US, and Zambia, and (2) an independent mining population in South Africa. Given WHO targets of 90% sensitivity and 70% specificity, the DLS's operating point was prespecified to favor sensitivity over specificity. On the combined test set, the DLS's ROC curve was above all 9 India-based radiologists, with an AUC of 0.90 (95%CI 0.87-0.92). The DLS's sensitivity (88%) was higher than the India-based radiologists (75% mean sensitivity), p<0.001 for superiority; and its specificity (79%) was non-inferior to the radiologists (84% mean specificity), p=0.004. Similar trends were observed within HIV positive and sputum smear positive sub-groups, and in the South Africa test set. We found that 5 US-based radiologists (where TB isn't endemic) were more sensitive and less specific than the India-based radiologists (where TB is endemic). The DLS also remained non-inferior to the US-based radiologists. In simulations, using the DLS as a prioritization tool for confirmatory testing reduced the cost per positive case detected by 40-80% compared to using confirmatory testing alone.
To conclude, our DLS generalized to 5 countries, and merits prospective evaluation to assist cost-effective screening efforts in radiologist-limited settings. Operating point flexibility may permit customization of the DLS to account for site-specific factors such as TB prevalence, demographics, clinical resources, and customary practice patterns.

Submitted 29 October, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

arXiv:2105.06270 [pdf, other]

Group Feature Learning and Domain Adversarial Neural Network for aMCI Diagnosis System Based on EEG

Authors: Chen-Chen Fan, Haiqun Xie, Liang Peng, Hongjun Yang, Zhen-Liang Ni, Guan'an Wang, Yan-Jie Zhou, Sheng Chen, Zhijie Fang, Shuyun Huang, Zeng-Guang Hou

Abstract: Medical diagnostic robot systems have been paid more and more attention due to its objectivity and accuracy. The diagnosis of mild cognitive impairment (MCI) is considered an effective means to prevent Alzheimer's disease (AD). Doctors diagnose MCI based on various clinical examinations, which are expensive and the diagnosis results rely on the knowledge of doctors. Therefore, it is necessary to develop a robot diagnostic system to eliminate the influence of human factors and obtain a higher accuracy rate. In this paper, we propose a novel Group Feature Domain Adversarial Neural Network (GF-DANN) for amnestic MCI (aMCI) diagnosis, which involves two important modules. A Group Feature Extraction (GFE) module is proposed to reduce individual differences by learning group-level features through adversarial learning. A Dual Branch Domain Adaptation (DBDA) module is carefully designed to reduce the distribution difference between the source and target domain in a domain adaption way. On three types of data set, GF-DANN achieves the best accuracy compared with classic machine learning and deep learning methods. On the DMS data set, GF-DANN has obtained an accuracy rate of 89.47%, and the sensitivity and specificity are 90% and 89%. In addition, by comparing three EEG data collection paradigms, our results demonstrate that the DMS paradigm has the potential to build an aMCI diagnose robot system.

Submitted 28 April, 2021; originally announced May 2021.

Comments: This paper has been accepted by 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2101.01668 [pdf, other]

Radio Frequency Fingerprint Identification for LoRa Using Spectrogram and CNN

Authors: Guanxiong Shen, Junqing Zhang, Alan Marshall, Linning Peng, Xianbin Wang

Abstract: Radio frequency fingerprint identification (RFFI) is an emerging device authentication technique that relies on intrinsic hardware characteristics of wireless devices. We designed an RFFI scheme for Long Range (LoRa) systems based on spectrogram and convolutional neural network (CNN). Specifically, we used spectrogram to represent the fine-grained time-frequency characteristics of LoRa signals. In addition, we revealed that the instantaneous carrier frequency offset (CFO) is drifting, which will result in misclassification and significantly compromise the system stability; we demonstrated CFO compensation is an effective mitigation. Finally, we designed a hybrid classifier that can adjust CNN outputs with the estimated CFO. The mean value of CFO remains relatively stable, hence it can be used to rule out CNN predictions whose estimated CFO falls out of the range. We performed experiments in real wireless environments using 20 LoRa devices under test (DUTs) and a Universal Software Radio Peripheral (USRP) N210 receiver. By comparing with the IQ-based and FFT-based RFFI schemes, our spectrogram-based scheme can reach the best classification accuracy, i.e., 97.61% for 20 LoRa DUTs.

Submitted 30 December, 2020; originally announced January 2021.

Comments: Accepted for publication in IEEE INFOCOM 2021

arXiv:2012.13861 [pdf, other]

doi 10.1109/TSP.2021.3095725

Target Detection within Nonhomogeneous Clutter via Total Bregman Divergence-Based Matrix Information Geometry Detectors

Authors: Xiaoqiang Hua, Yusuke Ono, Linyu Peng, Yongqiang Cheng, Hongqiang Wang

Abstract: Information divergences are commonly used to measure the dissimilarity of two elements on a statistical manifold. Differentiable manifolds endowed with different divergences may possess different geometric properties, which can result in totally different performances in many practical applications. In this paper, we propose a total Bregman divergence-based matrix information geometry (TBD-MIG) detector and apply it to detect targets emerged into nonhomogeneous clutter. In particular, each sample data is assumed to be modeled as a Hermitian positive-definite (HPD) matrix and the clutter covariance matrix is estimated by the TBD mean of a set of secondary HPD matrices. We then reformulate the problem of signal detection as discriminating two points on the HPD matrix manifold. Three TBD-MIG detectors, referred to as the total square loss, the total log-determinant and the total von Neumann MIG detectors, are proposed, and they can achieve great performances due to their power of discrimination and robustness to interferences. Simulations show the advantage of the proposed TBD-MIG detectors in comparison with the geometric detector using an affine invariant Riemannian metric as well as the adaptive matched filter in nonhomogeneous clutter.

Submitted 7 August, 2021; v1 submitted 26 December, 2020; originally announced December 2020.

Comments: 15 pages, 8 figures

Journal ref: IEEE Transactions on Signal Processing, 69, 4326-4340, 2021

arXiv:2011.11732 [pdf]

doi 10.1038/s41551-022-00867-5

Detecting hidden signs of diabetes in external eye photographs

Authors: Boris Babenko, Akinori Mitani, Ilana Traynis, Naho Kitade, Preeti Singh, April Maa, Jorge Cuadros, Greg S. Corrado, Lily Peng, Dale R. Webster, Avinash Varadarajan, Naama Hammel, Yun Liu

Abstract: Diabetes-related retinal conditions can be detected by examining the posterior of the eye. By contrast, examining the anterior of the eye can reveal conditions affecting the front of the eye, such as changes to the eyelids, cornea, or crystalline lens. In this work, we studied whether external photographs of the front of the eye can reveal insights into both diabetic retinal diseases and blood glucose control. We developed a deep learning system (DLS) using external eye photographs of 145,832 patients with diabetes from 301 diabetic retinopathy (DR) screening sites in one US state, and evaluated the DLS on three validation sets containing images from 198 sites in 18 other US states. In validation set A (n=27,415 patients, all undilated), the DLS detected poor blood glucose control (HbA1c > 9%) with an area under receiver operating characteristic curve (AUC) of 70.2; moderate-or-worse DR with an AUC of 75.3; diabetic macular edema with an AUC of 78.0; and vision-threatening DR with an AUC of 79.4. For all 4 prediction tasks, the DLS's AUC was higher (p<0.001) than using available self-reported baseline characteristics (age, sex, race/ethnicity, years with diabetes). In terms of positive predictive value, the predicted top 5% of patients had a 67% chance of having HbA1c > 9%, and a 20% chance of having vision threatening diabetic retinopathy. The results generalized to dilated pupils (validation set B, 5,058 patients) and to a different screening service (validation set C, 10,402 patients).
Our results indicate that external eye photographs contain information useful for healthcare providers managing patients with diabetes, and may help prioritize patients for in-person screening. Further work is needed to validate these findings on different devices and patient populations (those without diabetes) to evaluate its utility for remote diagnosis and management.

Submitted 23 November, 2020; originally announced November 2020.

Journal ref: Nature Biomedical Engineering 2022

arXiv:2011.08965 [pdf]

doi 10.1038/s41746-021-00427-2

Interpretable Survival Prediction for Colorectal Cancer using Deep Learning

Authors: Ellery Wulczyn, David F. Steiner, Melissa Moran, Markus Plass, Robert Reihs, Fraser Tan, Isabelle Flament-Auvigne, Trissia Brown, Peter Regitnig, Po-Hsuan Cameron Chen, Narayan Hegde, Apaar Sadhwani, Robert MacDonald, Benny Ayalew, Greg S. Corrado, Lily H. Peng, Daniel Tse, Heimo Müller, Zhaoyang Xu, Yun Liu, Martin C. Stumpe, Kurt Zatloukal, Craig H. Mermel

Abstract: Deriving interpretable prognostic features from deep-learning-based prognostic histopathology models remains a challenge. In this study, we developed a deep learning system (DLS) for predicting disease specific survival for stage II and III colorectal cancer using 3,652 cases (27,300 slides). When evaluated on two validation datasets containing 1,239 cases (9,340 slides) and 738 cases (7,140 slides) respectively, the DLS achieved a 5-year disease-specific survival AUC of 0.70 (95%CI 0.66-0.73) and 0.69 (95%CI 0.64-0.72), and added significant predictive value to a set of 9 clinicopathologic features. To interpret the DLS, we explored the ability of different human-interpretable features to explain the variance in DLS scores. We observed that clinicopathologic features such as T-category, N-category, and grade explained a small fraction of the variance in DLS scores (R2=18% in both validation sets). Next, we generated human-interpretable histologic features by clustering embeddings from a deep-learning based image-similarity model and showed that they explain the majority of the variance (R2 of 73% to 80%). Furthermore, the clustering-derived feature most strongly associated with high DLS scores was also highly prognostic in isolation. With a distinct visual appearance (poorly differentiated tumor cell clusters adjacent to adipose tissue), this feature was identified by annotators with 87.0-95.5% accuracy.
Our approach can be used to explain predictions from a prognostic deep learning model and uncover potentially-novel prognostic features that can be reliably identified by people for future validation studies.

Submitted 17 November, 2020; originally announced November 2020.

Journal ref: Nature Partner Journal Digital Medicine (2021)

arXiv:2010.11375 [pdf]

doi 10.1038/s41598-021-93967-2

Deep Learning for Distinguishing Normal versus Abnormal Chest Radiographs and Generalization to Unseen Diseases

Authors: Zaid Nabulsi, Andrew Sellergren, Shahar Jamshy, Charles Lau, Edward Santos, Atilla P. Kiraly, Wenxing Ye, Jie Yang, Rory Pilgrim, Sahar Kazemzadeh, Jin Yu, Sreenivasa Raju Kalidindi, Mozziyar Etemadi, Florencia Garcia-Vicente, David Melnick, Greg S. Corrado, Lily Peng, Krish Eswaran, Daniel Tse, Neeral Beladia, Yun Liu, Po-Hsuan Cameron Chen, Shravya Shetty

Abstract: Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to build specific systems to detect every possible condition. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For development, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system generalizes to new patient populations and abnormalities. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist.

Submitted 29 October, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

Journal ref: Nature Scientific Reports (2021)

arXiv:2008.04370 [pdf]

doi 10.1016/S2589-7500(20)30250-8

Predicting Risk of Developing Diabetic Retinopathy using Deep Learning

Authors: Ashish Bora, Siva Balasubramanian, Boris Babenko, Sunny Virmani, Subhashini Venugopalan, Akinori Mitani, Guilherme de Oliveira Marinho, Jorge Cuadros, Paisan Ruamviboonsuk, Greg S Corrado, Lily Peng, Dale R Webster, Avinash V Varadarajan, Naama Hammel, Yun Liu, Pinal Bavishi

Abstract: Diabetic retinopathy (DR) screening is instrumental in preventing blindness, but faces a scaling challenge as the number of diabetic patients rises. Risk stratification for the development of DR may help optimize screening intervals to reduce costs while improving vision-related outcomes. We created and validated two versions of a deep learning system (DLS) to predict the development of mild-or-worse ("Mild+") DR in diabetic patients undergoing DR screening. The two versions used either three-fields or a single field of color fundus photographs (CFPs) as input. The training set was derived from 575,431 eyes, of which 28,899 had known 2-year outcome, and the remaining were used to augment the training process via multi-task learning. Validation was performed on both an internal validation set (set A; 7,976 eyes; 3,678 with known outcome) and an external validation set (set B; 4,762 eyes; 2,345 with known outcome). For predicting 2-year development of DR, the 3-field DLS had an area under the receiver operating characteristic curve (AUC) of 0.79 (95%CI, 0.78-0.81) on validation set A. On validation set B (which contained only a single field), the 1-field DLS's AUC was 0.70 (95%CI, 0.67-0.74). The DLS was prognostic even after adjusting for available risk factors (p<0.001). When added to the risk factors, the 3-field DLS improved the AUC from 0.72 (95%CI, 0.68-0.76) to 0.81 (95%CI, 0.77-0.84) in validation set A, and the 1-field DLS improved the AUC from 0.62 (95%CI, 0.58-0.66) to 0.71 (95%CI, 0.68-0.75) in validation set B.
The DLSs in this study identified prognostic information for DR development from CFPs. This information is independent of and more informative than the available risk factors.

Submitted 10 August, 2020; originally announced August 2020.

Journal ref: The Lancet Digital Health (2021)

arXiv:2008.01219 [pdf, other]

doi: 10.1109/IGSC48788.2019.8957192

Hardware Accelerator for Adversarial Attacks on Deep Learning Neural Networks

Authors: Haoqiang Guo, Lu Peng, Jian Zhang, Fang Qi, Lide Duan

Abstract: Recent studies identify that Deep learning Neural Networks (DNNs) are vulnerable to subtle perturbations, which are not perceptible to human visual system but can fool the DNN models and lead to wrong outputs. A class of adversarial attack network algorithms has been proposed to generate robust physical perturbations under different circumstances. These algorithms are the first efforts to move forward secure deep learning by providing an avenue to train future defense networks, however, the intrinsic complexity of them prevents their broader usage. In this paper, we propose the first hardware accelerator for adversarial attacks based on memristor crossbar arrays. Our design significantly improves the throughput of a visual adversarial perturbation system, which can further improve the robustness and security of future deep learning systems. Based on the algorithm uniqueness, we propose four implementations for the adversarial attack accelerator ($A^3$) to improve the throughput, energy efficiency, and computational efficiency.

Submitted 3 August, 2020; originally announced August 2020.

Comments: IGSC'2019 (https://shirazi21.wixsite.com/igsc2019archive) Best paper award

MSC Class: 68-06; ACM Class: C.3

Journal ref: 2019 Tenth International Green and Sustainable Computing Conference (IGSC)

arXiv:2007.05500 [pdf, other]

Scientific Discovery by Generating Counterfactuals using Image Translation

Authors: Arunachalam Narayanaswamy, Subhashini Venugopalan, Dale R. Webster, Lily Peng, Greg Corrado, Paisan Ruamviboonsuk, Pinal Bavishi, Rory Sayres, Abigail Huang, Siva Balasubramanian, Michael Brenner, Philip Nelson, Avinash V. Varadarajan

Abstract: Model explanation techniques play a critical role in understanding the source of a model's performance and making its decisions transparent. Here we investigate if explanation techniques can also be used as a mechanism for scientific discovery. We make three contributions: first, we propose a framework to convert predictions from explanation techniques to a mechanism of discovery. Second, we show how generative models in combination with black-box predictors can be used to generate hypotheses (without human priors) that can be critically examined. Third, with these techniques we study classification models for retinal images predicting Diabetic Macular Edema (DME), where recent work showed that a CNN trained on these images is likely learning novel features in the image. We demonstrate that the proposed framework is able to explain the underlying scientific mechanism, thus bridging the gap between the model's performance and human understanding.

Submitted 19 July, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

Comments: Accepted at MICCAI 2020. This version combines camera-ready and supplement

Journal ref: MICCAI 2020

arXiv:2006.11493 [pdf, other]

Real-time LCC-HVDC Maximum Emergency Power Capacity Estimation Based on Local PMU Measurements

Authors: Long Peng, Junbo Zhao, Yong Tang, Lamine Mili, Zhuoyuan Gu, Zongsheng Zheng

Abstract: The adjustable capacity of a line-commutated-converter High Voltage Direct Current (LCC-HVDC) connected to a power system, called the LCC-HVDC maximum emergency power capability or HVDC-MC for short, plays an important role in determining the response of that system to a large disturbance. However, it is a challenging task to obtain an accurate HVDC-MC due to system model uncertainties as well as to contingencies. To address this problem, this paper proposes to estimate the HVDC-MC using a Thevenin equivalent (TE) of the system seen at the HVDC terminal bus of connection with the power system, whose parameters are estimated by processing positive-sequences voltages and currents of local synchrophasor measurements. The impacts of TE potential changes on the impedance estimation under large disturbance have been extensively investigated and an adaptive screening process of current measurements is developed to reduce the error of TE impedance estimation. The uncertainties of phasor measurements have been further taken into account by resorting to the total least square estimation method. The limitations of the HVDC control characteristics, the voltage-dependent current order limit, the converter capacity, and the AC voltage on HVDC-MC estimation are also considered. The simulations show that the proposed method can accurately track the dynamics of the TE parameters and the real-time HVDC-MC after the large disturbances.

Submitted 20 June, 2020; originally announced June 2020.

Comments: 11 pages, 17 figures

arXiv:2006.11423 [pdf, other]

An Adaptive MMC Synchronous Stability Control Method Based on Local PMU measurements

Authors: Long Peng, Yong Tang, Lamine Mili, Yingbiao Li, Bing Zhao, Yijun Xu, Fan Cheng

Abstract: Reducing the current is a common method to ensure the synchronous stability of a modular multilevel converter (MMC) when there is a short-circuit fault at its AC side. However, the uncertainty of the fault location of the AC system leads to a significant difference in the maximum allowable stable operating current during the fault. This paper proposes an adaptive MMC fault-current control method using local phasor measurement unit (PMU) measurements. Based on the estimated Thevenin equivalent (TE) parameters of the system, the current can be directly calculated to ensure the maximum output power of the MMC during the fault. This control method does not rely on off-line simulation and adapts itself to various fault conditions. The effective measurements are firstly selected by the voltage threshold and parameter constraints, which allow us to handle the error due to the change on the system-side. The proposed TE estimation method can fast track the change of the system impedance without depending on the initial value and can deal with the TE potential changes after a large disturbance. The simulation shows that the TE estimation can accurately track the TE parameters after the fault, and the current control instruction during an MMC fault can ensure the maximum output power of the MMC.

Submitted 19 June, 2020; originally announced June 2020.

Comments: 8 pages, 15 figures

arXiv:2004.13761 [pdf]

A Method for Vehicle Collision Risk Assessment through Inferring Driver's Braking Actions in Near-Crash Situations

Authors: Liqun Peng, Miguel Angel Sotelo, Yi He, Yunfei Ai, Zhixiong Li

Abstract: Driving information and data under potential vehicle crashes create opportunities for extensive real-world observations of driver behaviors and relevant factors that significantly influence the driving safety in emergency scenarios. Furthermore, the availability of such data also enhances the collision avoidance systems (CASs) by evaluating driver's actions in near-crash scenarios and providing timely warnings. These applications motivate the need for heuristic tools capable of inferring relationship among driving risk, driver/vehicle characteristics, and road environment. In this paper, we acquired amount of real-world driving data and built a comprehensive dataset, which contains multiple "driver-vehicle-road" attributes. The proposed method works in two steps. In the first step, a variable precision rough set (VPRS) based classification technique is applied to draw a reduced core subset from field driving dataset, which presents the essential attributes set most relevant to driving safety assessment. In the second step, we design a decision strategy by introducing mutual information entropy to quantify the significance of each attribute, then a representative index through accumulation of weighted "driver-vehicle-road" factors is calculated to reflect the driving risk for actual situation. The performance of the proposed method is demonstrated in an offline analysis of the driving data collected in field trials, where the aim is to infer the emergency braking actions in next short term.
The results indicate that our proposed model is a good alternative for providing improved warnings in real-time because of its high prediction accuracy and stability.

Submitted 28 April, 2020; originally announced April 2020.

Comments: 14 pages

Showing 1–50 of 52 results for author: Peng, L