Search | arXiv e-print repository

arXiv:2406.15716 [pdf, other]

Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the fluorescently labeled images from label-free microscopy. In this paper, we propose a deep learning-based in silico labeling method for the Light My Cells challenge. Built upon pix2pix, our proposed method can be trained using the partially labeled datasets with an adaptive loss. Moreover, we explore the effectiveness of several training strategies to handle different input modalities, such as training them together or separately. The results show that our method achieves promising performance for in silico labeling. Our code is available at https://github.com/MedICL-VU/LightMyCells.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2405.20073 [pdf, other]

Power Allocation for Cell-Free Massive MIMO ISAC Systems with OTFS Signal

Authors: Yifei Fan, Shaochuan Wu, Xixi Bi, Guoyu Li

Abstract: Applying integrated sensing and communication (ISAC) to a cell-free massive multiple-input multiple-output (CF mMIMO) architecture has attracted increasing attention. This approach equips CF mMIMO networks with sensing capabilities and resolves the problem of unreliable service at cell edges in conventional cellular networks. However, existing studies on CF-ISAC systems have focused on the application of traditional integrated signals. To address this limitation, this study explores the employment of the orthogonal time frequency space (OTFS) signal as a representative of innovative signals in the CF-ISAC system, and the system's overall performance is optimized and evaluated. A universal downlink spectral efficiency (SE) expression is derived regarding multi-antenna access points (APs) and optional sensing beams. To streamline the analysis and optimization of the CF-ISAC system with the OTFS signal, we introduce a lower bound on the achievable SE that is applicable to OTFS-signal-based systems. Based on this, a power allocation algorithm is proposed to maximize the minimum communication signal-to-interference-plus-noise ratio (SINR) of users while guaranteeing a specified sensing SINR value and meeting the per-AP power constraints. The results demonstrate the tightness of the proposed lower bound and the efficiency of the proposed algorithm. Finally, the superiority of using the OTFS signals is verified by a 13-fold expansion of the SE performance gap over the application of orthogonal frequency division multiplexing signals. These findings could guide the future deployment of the CF-ISAC systems, particularly in the field of millimeter waves with a large bandwidth.

Submitted 30 May, 2024; originally announced May 2024.

Comments: This work is submitted to IEEE for possible publication

arXiv:2405.16197 [pdf, other]

A 7K Parameter Model for Underwater Image Enhancement based on Transmission Map Prior

Authors: Fuheng Zhou, Dikai Wei, Ye Fan, Yulong Huang, Yonggang Zhang

Abstract: Although deep learning based models for underwater image enhancement have achieved good performance, they face limitations in both lightweight and effectiveness, which prevents their deployment and application on resource-constrained platforms. Moreover, most existing deep learning based models use data compression to get high-level semantic information in latent space instead of using the original information. Therefore, they require decoder blocks to generate the details of the output. This requires additional computational cost. In this paper, a lightweight network named lightweight selective attention network (LSNet) based on the top-k selective attention and transmission maps mechanism is proposed. The proposed model achieves a PSNR of 97% with only 7K parameters compared to a similar attention-based model. Extensive experiments show that the proposed LSNet achieves excellent performance in state-of-the-art models with significantly fewer parameters and computational resources. The code is available at https://github.com/FuhengZhou/LSNet.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2403.13909 [pdf, other]

Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)

Authors: Yimeng Fan, Pedram Agand, Mo Chen, Edward J. Park, Allison Kennedy, Chanwoo Bae

Abstract: The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of a ferry in west coast Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI

Submitted 20 March, 2024; originally announced March 2024.

Comments: 5 pages, 3 figures, AAAI 2024 student abstract

arXiv:2401.10345 [pdf, other]

Attack and Defense Analysis of Learned Image Compression

Authors: Tianyu Zhu, Heming Sun, Xiankui Xiong, Xuanpeng Zhu, Yong Gong, Minge Jing, Yibo Fan

Abstract: Learned image compression (LIC) is becoming more and more popular these years with its high efficiency and outstanding compression quality. Still, the practicality against modified inputs added with specific noise could not be ignored. White-box attacks such as FGSM and PGD use only gradient to compute adversarial images that mislead LIC models to output unexpected results. Our experiments compare the effects of different dimensions such as attack methods, models, qualities, and targets, concluding that in the worst case, there is a 61.55% decrease in PSNR or a 19.15 times increase in bpp under the PGD attack. To improve their robustness, we conduct adversarial training by adding adversarial images into the training datasets, which obtains a 95.52% decrease in the R-D cost of the most vulnerable LIC model. We further test the robustness of H.266, whose better performance on reconstruction quality extends its possibility to defend one-step or iterative adversarial attacks.

Submitted 27 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2312.14239 [pdf, other]

PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

Authors: Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram, Yuchen Fan, Christian Richardt, Ramesh Raskar, Rakesh Ranjan

Abstract: 3D reconstruction from a single-view is challenging because of the ambiguity from monocular cues and lack of information about occluded regions. Neural radiance fields (NeRF), while popular for view synthesis and 3D reconstruction, are typically reliant on multi-view images. Existing methods for single-view 3D reconstruction with NeRF rely on either data priors to hallucinate views of occluded regions, which may not be physically accurate, or shadows observed by RGB cameras, which are difficult to detect in ambient light and low albedo backgrounds. We propose using time-of-flight data captured by a single-photon avalanche diode to overcome these limitations. Our method models two-bounce optical paths with NeRF, using lidar transient data for supervision. By leveraging the advantages of both NeRF and two-bounce light measured by lidar, we demonstrate that we can reconstruct visible and occluded geometry without data priors or reliance on controlled ambient lighting or scene albedo. In addition, we demonstrate improved generalization under practical constraints on sensor spatial- and temporal-resolution. We believe our method is a promising direction as single-photon lidars become ubiquitous on consumer devices, such as phones, tablets, and headsets.

Submitted 5 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: CVPR 2024. Project Page: https://platonerf.github.io/

arXiv:2312.03640 [pdf, other]

Training Neural Networks on RAW and HDR Images for Restoration Tasks

Authors: Lei Luo, Alexandre Chapiro, Xiaoyu Xiang, Yuchen Fan, Rakesh Ranjan, Rafal Mantiuk

Abstract: The vast majority of standard image and video content available online is represented in display-encoded color spaces, in which pixel values are conveniently scaled to a limited range (0-1) and the color distribution is approximately perceptually uniform. In contrast, both camera RAW and high dynamic range (HDR) images are often represented in linear color spaces, in which color values are linearly related to colorimetric quantities of light. While training on commonly available display-encoded images is a well-established practice, there is no consensus on how neural networks should be trained for tasks on RAW and HDR images in linear color spaces. In this work, we test several approaches on three popular image restoration applications: denoising, deblurring, and single-image super-resolution. We examine whether HDR/RAW images need to be display-encoded using popular transfer functions (PQ, PU21, mu-law), or whether it is better to train in linear color spaces, but use loss functions that correct for perceptual non-uniformity. Our results indicate that neural networks train significantly better on HDR and RAW images represented in display-encoded color spaces, which offer better perceptual uniformity than linear spaces. This small change to the training strategy can bring a very substantial gain in performance, up to 10-15 dB.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.11325 [pdf, other]

MoVideo: Motion-Aware Video Generation with Diffusion Models

Authors: Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan

Abstract: While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion. In this paper, we propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow. The former regulates motion by per-frame object distances and spatial layouts, while the latter describes motion by cross-frame correspondences that help in preserving fine details and improving temporal consistency. More specifically, given a key frame that exists or is generated from text prompts, we first design a diffusion model with spatio-temporal modules to generate the video depth and the corresponding optical flows. Then, the video is generated in the latent space by another spatio-temporal diffusion model under the guidance of depth, optical flow-based warped latent video and the calculated occlusion mask. Lastly, we use optical flows again to align and refine different frames for better video decoding from the latent space to the pixel space. In experiments, MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.

Submitted 19 November, 2023; originally announced November 2023.

Comments: project homepage: https://jingyunliang.github.io/MoVideo

arXiv:2311.05477 [pdf, other]

Using ResNet to Utilize 4-class T2-FLAIR Slice Classification Based on the Cholinergic Pathways Hyperintensities Scale for Pathological Aging

Authors: Wei-Chun Kevin Tsai, Yi-Chien Liu, Ming-Chun Yu, Chia-Ju Chou, Sui-Hing Yan, Yang-Teng Fan, Yan-Hsiang Huang, Yen-Ling Chiu, Yi-Fang Chuang, Ran-Zan Wang, Yao-Chia Shih

Abstract: The Cholinergic Pathways Hyperintensities Scale (CHIPS) is a visual rating scale used to assess the extent of cholinergic white matter hyperintensities in T2-FLAIR images, serving as an indicator of dementia severity. However, the manual selection of four specific slices for rating throughout the entire brain is a time-consuming process. Our goal was to develop a deep learning-based model capable of automatically identifying the four slices relevant to CHIPS. To achieve this, we trained a 4-class slice classification model (BSCA) using the ADNI T2-FLAIR dataset (N=150) with the assistance of ResNet. Subsequently, we tested the model's performance on a local dataset (N=30). The results demonstrated the efficacy of our model, with an accuracy of 99.82% and an F1-score of 99.83%. This achievement highlights the potential impact of BSCA as an automatic screening tool, streamlining the selection of four specific T2-FLAIR slices that encompass white matter landmarks along the cholinergic pathways. Clinicians can leverage this tool to assess the risk of clinical dementia development efficiently.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 8 pages, 2 figures, 2 tables

arXiv:2311.01702 [pdf]

Medical Image Segmentation with Domain Adaptation: A Survey

Authors: Yuemeng Li, Yong Fan

Abstract: Deep learning (DL) has shown remarkable success in various medical imaging data analysis applications. However, it remains challenging for DL models to achieve good generalization, especially when the training and testing datasets are collected at sites with different scanners, due to domain shift caused by differences in data distributions. Domain adaptation has emerged as an effective means to address this challenge by mitigating domain gaps in medical imaging applications. In this review, we specifically focus on domain adaptation approaches for DL-based medical image segmentation. We first present the motivation and background knowledge underlying domain adaptations, then provide a comprehensive review of domain adaptation applications in medical image segmentations, and finally discuss the challenges, limitations, and future research trends in the field to promote the methodology development of domain adaptation in the context of medical image segmentation. Our goal was to provide researchers with up-to-date references on the applications of domain adaptation in medical image segmentation studies.

Submitted 3 November, 2023; originally announced November 2023.

Comments: Survey

arXiv:2310.14515 [pdf]

First realization of macroscopic Fourier ptychography for hundred-meter distance sub-diffraction imaging

Authors: Qi Zhang, Yuran Lu, Yinghui Guo, Yingjie Shang, Mingbo Pu, Yulong Fan, Rui Zhou, Xiaoyin Li, Fei Zhang, Mingfeng Xu, Xiangang Luo

Abstract: Fourier ptychography (FP) imaging, drawing on the idea of synthetic aperture, has been demonstrated as a potential approach for remote sub-diffraction-limited imaging. Nevertheless, the farthest imaging distance is still limited to around 10 m even though there has been a significant improvement in macroscopic FP. The most severe issue in increasing the imaging distance is the FoV limitation caused by the far-field condition for diffraction. Here, we propose to modify the Fourier far-field condition for rough reflective objects, aiming to overcome the small FoV limitation by using a divergent beam to illuminate objects. A joint optimization of pupil function and target image is utilized to attain the aberration-free image while estimating the pupil function simultaneously. Benefiting from the optimized reconstruction algorithm which effectively expands the camera's effective aperture, we experimentally implement several FP systems suited for imaging distance of 12 m, 90 m, and 170 m with the maximum synthetic aperture of 200 mm. The maximum imaging distance and synthetic aperture are thus improved by more than one order of magnitude of the state-of-the-art works with a fourfold improvement in the resolution. Our findings demonstrate significant potential for advancing the field of macroscopic FP, propelling it into a new stage of development.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2309.08323 [pdf]

MLP Based Continuous Gait Recognition of a Powered Ankle Prosthesis with Serial Elastic Actuator

Authors: Yanze Li, Feixing Chen, Jingqi Cao, Ruoqi Zhao, Xuan Yang, Xingbang Yang, Yubo Fan

Abstract: Powered ankle prostheses effectively assist people with lower limb amputation to perform daily activities. High performance prostheses with adjustable compliance and capability to predict and implement amputee's intent are crucial for them to be comparable to or better than a real limb. However, current designs fail to provide simple yet effective compliance of the joint with full potential of modification, and lack an accurate gait prediction method in real time. This paper proposes an innovative design of powered ankle prosthesis with serial elastic actuator (SEA), and puts forward an MLP-based gait recognition method that can accurately and continuously predict more gait parameters for motion sensing and control. The prosthesis mimics biological joint with similar weight, torque, and power which can assist walking of up to 4 m/s. A new design of planar torsional spring is proposed for the SEA, which has better stiffness, endurance, and potential of modification than current designs. The gait recognition system simultaneously generates locomotive speed, gait phase, ankle angle and angular velocity only utilizing signals of single IMU, holding advantage in continuity, adaptability for speed range, accuracy, and capability of multi-functions.

Submitted 30 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: Submitted to IROS 2024

arXiv:2309.04154 [pdf, other]

A novel model for layer jamming-based continuum robots

Authors: Bowen Yi, Yeman Fan, Dikai Liu

Abstract: Continuum robots with variable stiffness have gained wide popularity in the last decade. Layer jamming (LJ) has emerged as a simple and efficient technique to achieve tunable stiffness for continuum robots. Despite its merits, the development of a control-oriented dynamical model tailored for this specific class of robots remains an open problem in the literature. This paper aims to present the first solution, to the best of our knowledge, to close the gap. We propose an energy-based model that is integrated with the LuGre frictional model for LJ-based continuum robots. Then, we carry out a comprehensive theoretical analysis of this model, focusing on two fundamental characteristics of LJ-based continuum robots: shape locking and adjustable stiffness. To validate the modeling approach and theoretical results, a series of experiments using our OctRobot-I continuum robotic platform was conducted. The results show that the proposed model is capable of interpreting and predicting the dynamical behaviors in LJ-based continuum robots.

Submitted 11 September, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2308.16551 [pdf]

Object Detection for Caries or Pit and Fissure Sealing Requirement in Children's First Permanent Molars

Authors: Chenyao Jiang, Shiyao Zhai, Hengrui Song, Yuqing Ma, Yachen Fan, Yancheng Fang, Dongmei Yu, Canyang Zhang, Sanyang Han, Runming Wang, Yong Liu, Jianbo Li, Peiwu Qin

Abstract: Dental caries is one of the most common oral diseases that, if left untreated, can lead to a variety of oral problems. It mainly occurs inside the pits and fissures on the occlusal/buccal/palatal surfaces of molars and children are a high-risk group for pit and fissure caries in permanent molars. Pit and fissure sealing is one of the most effective methods that is widely used in prevention of pit and fissure caries. However, current detection of pits and fissures or caries depends primarily on the expertise of experienced dentists, which ordinary parents do not have, and children may miss the remedial treatment without timely detection. To address this issue, we present a method to autodetect caries and pit and fissure sealing requirements using oral photos taken by smartphones. We use the YOLOv5 and YOLOX models and adopt a tiling strategy to reduce information loss during image pre-processing. The best result for YOLOXs model with tiling strategy is 72.3 mAP.5, while the best result without tiling strategy is 71.2. YOLOv5s6 model with/without tiling attains 70.9/67.9 mAP.5, respectively. We deploy the pre-trained network to mobile devices as a WeChat applet, allowing in-home detection by parents or children's guardians.

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.12440 [pdf]

HNAS-reg: hierarchical neural architecture search for deformable medical image registration

Authors: Jiong Wu, Yong Fan

Abstract: Convolutional neural networks (CNNs) have been widely used to build deep learning models for medical image registration, but manually designed network architectures are not necessarily optimal. This paper presents a hierarchical NAS framework (HNAS-Reg), consisting of both convolutional operation search and network topology search, to identify the optimal network architecture for deformable medical image registration. To mitigate the computational overhead and memory constraints, a partial channel strategy is utilized without losing optimization quality. Experiments on three datasets, consisting of 636 T1-weighted magnetic resonance images (MRIs), have demonstrated that the proposed method can build a deep learning model with improved image registration accuracy and reduced model size, compared with state-of-the-art image registration approaches, including one representative traditional approach and two unsupervised learning-based approaches.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2307.05249 [pdf, other]

DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

Authors: Zhiwen Yang, Yang Zhou, Hui Zhang, Bingzheng Wei, Yubo Fan, Yan Xu

Abstract: Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, i.e., the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.

Submitted 11 July, 2023; originally announced July 2023.

Comments: This article has been early accepted by MICCAI 2023, but has not been fully edited. Content may change prior to final publication

arXiv:2306.13101 [pdf, other]

BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning

Authors: Junru Chen, Yang Yang, Tao Yu, Yingying Fan, Xiaolong Mo, Carl Yang

Abstract: Epilepsy is one of the most serious neurological diseases, affecting 1-2% of the world's population. The diagnosis of epilepsy depends heavily on the recognition of epileptic waves, i.e., disordered electrical brainwave activity in the patient's brain. Existing works have begun to employ machine learning models to detect epileptic waves via cortical electroencephalogram (EEG). However, the recently developed stereoelectrocorticography (SEEG) method provides information in stereo that is more precise than conventional EEG, and has been broadly applied in clinical practice. Therefore, we propose the first data-driven study to detect epileptic waves in a real-world SEEG dataset. While offering new opportunities, SEEG also poses several challenges. In clinical practice, epileptic wave activities are considered to propagate between different regions in the brain. These propagation paths, also known as the epileptogenic network, are deemed to be a key factor in the context of epilepsy surgery. However, the question of how to extract an exact epileptogenic network for each patient remains an open problem in the field of neuroscience. To address these challenges, we propose a novel model (BrainNet) that jointly learns the dynamic diffusion graphs and models the brain wave diffusion patterns. In addition, our model effectively aids in resisting label imbalance and severe noise by employing several self-supervised learning tasks and a hierarchical framework. By experimenting with the extensive real SEEG dataset obtained from multiple patients, we find that BrainNet outperforms several latest state-of-the-art baselines derived from time-series analysis.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.03865 [pdf, other]

Simultaneous Position-and-Stiffness Control of Underactuated Antagonistic Tendon-Driven Continuum Robots

Authors: Bowen Yi, Yeman Fan, Dikai Liu, Jose Guadalupe Romero

Abstract: Continuum robots have gained widespread popularity due to their inherent compliance and flexibility, particularly their adjustable levels of stiffness for various application scenarios. Despite efforts in dynamic modeling and control synthesis over the past decade, few studies have incorporated stiffness regulation into their feedback control design; however, this is one of the initial motivations to develop continuum robots. This paper addresses the crucial challenge of controlling both the position and stiffness of underactuated continuum robots actuated by antagonistic tendons. We begin by presenting a rigid-link dynamical model that can analyze the open-loop stiffening of tendon-driven continuum robots. Based on this model, we propose a novel passivity-based position-and-stiffness controller that adheres to the non-negative tension constraint. Comprehensive experiments on our continuum robot validate the theoretical results and demonstrate the efficacy and precision of this approach.

Submitted 13 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.02691 [pdf, other]

doi 10.1109/TMI.2023.3275609

Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Authors: Yang Zhou, Yongjian Wu, Zihua Wang, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

Abstract: Nuclei instance segmentation on histopathology images is of great clinical value for disease analysis. Generally, fully-supervised algorithms for this task require pixel-wise manual annotations, which is especially time-consuming and laborious for the high nuclei density. To alleviate the annotation burden, we seek to solve the problem through image-level weakly supervised learning, which is underexplored for nuclei instance segmentation. Compared with most existing methods using other weak annotations (scribble, point, etc.) for nuclei instance segmentation, our method is more labor-saving. The obstacle to using image-level annotations in nuclei instance segmentation is the lack of adequate location information, leading to severe nuclei omission or overlaps. In this paper, we propose a novel image-level weakly supervised method, called cyclic learning, to solve this problem. Cyclic learning comprises a front-end classification task and a back-end semi-supervised instance segmentation task to benefit from multi-task learning (MTL). We utilize a deep learning classifier with interpretability as the front-end to convert image-level labels to sets of high-confidence pseudo masks and establish a semi-supervised architecture as the back-end to conduct nuclei instance segmentation under the supervision of these pseudo masks. Most importantly, cyclic learning is designed to circularly share knowledge between the front-end classifier and the back-end semi-supervised part, which allows the whole system to fully extract the underlying information from image-level labels and converge to a better optimum. Experiments on three datasets demonstrate the good generality of our method, which outperforms other image-level weakly supervised methods for nuclei instance segmentation, and achieves comparable performance to fully-supervised methods.

Submitted 5 June, 2023; originally announced June 2023.

Comments: This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI https://doi.org/10.1109/TMI.2023.3275609, IEEE Transactions on Medical Imaging. Code: https://github.com/wuyongjianCODE/Cyclic

arXiv:2306.02132 [pdf, ps, other]

Formation Control with Unknown Directions and General Coupling Coefficients

Authors: Zhen Li, Yang Tang, Yongqing Fan, Tingwen Huang

Abstract: Generally, the normal displacement-based formation control has a sensing mode that requires the agent not only to have certain knowledge of its direction, but also to gather its local information characterized by nonnegative coupling coefficients. However, the direction may be unknown in the sensing processes, and the coupling coefficients may also involve negative ones due to some circumstances. This paper introduces these phenomena into a class of displacement-based formation control problems. Then, a geometric approach has been employed to overcome the difficulty of analysis on the introduced phenomena. The purpose of this approach is to construct some convex polytopes for containing the effects caused by the unknown direction, and to analyze the non-convexity by admitting the negative coupling coefficients in a certain range. Under the actions of these phenomena, the constructed polytopes are shown to be invariant in view of the contractive set method. It means that the convergence of formation shape can be guaranteed. Subsequently, an example is given to examine the applicability of the derived result.

Submitted 3 June, 2023; originally announced June 2023.

arXiv:2304.04428 [pdf, other]

SPHR-SAR-Net: Superpixel High-resolution SAR Imaging Network Based on Nonlocal Total Variation

Authors: Guoru Zhou, Zhongqiu Xu, Yizhe Fan, Zhe Zhang, Xiaolan Qiu, Bingchen Zhang, Kun Fu, Yirong Wu

Abstract: High-resolution is a key trend in the development of synthetic aperture radar (SAR), which enables the capture of fine details and accurate representation of backscattering properties. However, traditional high-resolution SAR imaging algorithms face several challenges. Firstly, these algorithms tend to focus on local information, neglecting non-local information between different pixel patches. Secondly, speckle is more pronounced and difficult to filter out in high-resolution SAR images. Thirdly, the process of high-resolution SAR imaging generally involves high time and computational complexity, making real-time imaging difficult to achieve. To address these issues, we propose a Superpixel High-Resolution SAR Imaging Network (SPHR-SAR-Net) for rapid despeckling in high-resolution SAR mode. Based on the concept of superpixel techniques, we initially combine non-convex and non-local total variation as compound regularization. This approach more effectively despeckles and manages the relationship between pixels while reducing bias effects caused by convex constraints. Subsequently, we solve the compound regularization model using the Alternating Direction Method of Multipliers (ADMM) algorithm and unfold it into a Deep Unfolded Network (DUN). The network's parameters are adaptively learned in a data-driven manner, and the learned network significantly increases imaging speed. Additionally, the Deep Unfolded Network is compatible with high-resolution imaging modes such as spotlight, staring spotlight, and sliding spotlight. In this paper, we demonstrate the superiority of SPHR-SAR-Net through experiments in both simulated and real SAR scenarios. The results indicate that SPHR-SAR-Net can rapidly perform high-resolution SAR imaging from raw echo data, producing accurate imaging results.

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.03076 [pdf, other]

Fast QTMT Partition for VVC Intra Coding Using U-Net Framework

Authors: Zhao Zan, Leilei Huang, ShuShi Chen, Xiantao Zhang, Zhenghui Zhao, Haibing Yin, Yibo Fan

Abstract: Versatile Video Coding (VVC) has significantly increased encoding efficiency at the expense of numerous complex coding tools, particularly the flexible Quad-Tree plus Multi-type Tree (QTMT) block partition. This paper proposes a deep learning-based algorithm applied in fast QTMT partition for VVC intra coding. Our solution greatly reduces encoding time by early termination of less-likely intra prediction and partitions with negligible BD-BR increase. Firstly, a redesigned U-Net is recommended as the network's fundamental framework. Next, we design a Quality Parameter (QP) fusion network to regulate the effect of QPs on the partition results. Finally, we adopt a refined post-processing strategy to better balance encoding performance and complexity. Experimental results demonstrate that our solution outperforms the state-of-the-art works with a complexity reduction of 44.74% to 68.76% and a BD-BR increase of 0.60% to 2.33%.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2304.00658 [pdf, other]

Improving Meeting Inclusiveness using Speech Interruption Analysis

Authors: Szu-Wei Fu, Yaran Fan, Yasaman Hosseinkashi, Jayant Gupchup, Ross Cutler

Abstract: Meetings are a pervasive method of communication within all types of companies and organizations, and using remote collaboration systems to conduct meetings has increased dramatically since the COVID-19 pandemic. However, not all meetings are inclusive, especially in terms of the participation rates among attendees. In a recent large-scale survey conducted at Microsoft, the top suggestion given by meeting participants for improving inclusiveness is to improve the ability of remote participants to interrupt and acquire the floor during meetings. We show that the use of the virtual raise hand (VRH) feature can lead to an increase in predicted meeting inclusiveness at Microsoft. One challenge is that VRH is used in less than 1% of all meetings. In order to drive adoption of its usage to improve inclusiveness (and participation), we present a machine learning-based system that predicts when a meeting participant attempts to obtain the floor, but fails to interrupt (termed a `failed interruption'). This prediction can be used to nudge the user to raise their virtual hand within the meeting. We believe this is the first failed speech interruption detector, and the performance on a realistic test set has an area under curve (AUC) of 0.95 with a true positive rate (TPR) of 50% at a false positive rate (FPR) of <1%. To our knowledge, this is also the first dataset of interruption categories (including the failed interruption category) for remote meetings. Finally, we believe this is the first such system designed to improve meeting inclusiveness through speech interruption analysis and active intervention.

Submitted 4 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

arXiv:2303.12270 [pdf, other]

EBSR: Enhanced Binary Neural Network for Image Super-Resolution

Authors: Renjie Wei, Shuwen Zhang, Zechun Liu, Meng Li, Yuchen Fan, Runsheng Wang, Ru Huang

Abstract: While the performance of deep convolutional neural networks for image super-resolution (SR) has improved significantly, the rapid increase of memory and computation requirements hinders their deployment on resource-constrained devices. Quantized networks, especially binary neural networks (BNN) for SR have been proposed to significantly improve the model inference efficiency but suffer from large performance degradation. We observe the activation distribution of SR networks demonstrates very large pixel-to-pixel, channel-to-channel, and image-to-image variation, which is important for high performance SR but gets lost during binarization. To address the problem, we propose two effective methods, including the spatial re-scaling as well as channel-wise shifting and re-scaling, which augments binary convolutions by retaining more spatial and channel-wise information. Our proposed models, dubbed EBSR, demonstrate superior performance over prior art methods both quantitatively and qualitatively across different datasets and different model sizes. Specifically, for x4 SR on Set5 and Urban100, EBSRlight improves the PSNR by 0.31 dB and 0.28 dB compared to SRResNet-E2FIF, respectively, while EBSR outperforms EDSR-E2FIF by 0.29 dB and 0.32 dB PSNR, respectively.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.02922 [pdf, other]

SurfNN: Joint Reconstruction of Multiple Cortical Surfaces from Magnetic Resonance Images

Authors: Hao Zheng, Hongming Li, Yong Fan

Abstract: To achieve fast, robust, and accurate reconstruction of the human cortical surfaces from 3D magnetic resonance images (MRIs), we develop a novel deep learning-based framework, referred to as SurfNN, to reconstruct simultaneously both inner (between white matter and gray matter) and outer (pial) surfaces from MRIs. Different from existing deep learning-based cortical surface reconstruction methods that either reconstruct the cortical surfaces separately or neglect the interdependence between the inner and outer surfaces, SurfNN reconstructs both the inner and outer cortical surfaces jointly by training a single network to predict a midthickness surface that lies at the center of the inner and outer cortical surfaces. The input of SurfNN consists of a 3D MRI and an initialization of the midthickness surface that is represented both implicitly as a 3D distance map and explicitly as a triangular mesh with spherical topology, and its output includes both the inner and outer cortical surfaces, as well as the midthickness surface. The method has been evaluated on a large-scale MRI dataset and demonstrated competitive cortical surface reconstruction performance.

Submitted 6 March, 2023; originally announced March 2023.

Comments: ISBI 2023

arXiv:2302.06167 [pdf]

An Error-Surface-Based Fractional Motion Estimation Algorithm and Hardware Implementation for VVC

Authors: Shushi Chen, Leilei Huang, Jiahao Liu, Chao Liu, Yibo Fan

Abstract: Versatile Video Coding (VVC) introduces more coding tools to improve compression efficiency compared to its predecessor High Efficiency Video Coding (HEVC). For inter-frame coding, Fractional Motion Estimation (FME) still has a high computational effort, which limits the real-time processing capability of the video encoder. In this context, this paper proposes an error-surface-based FME algorithm and the corresponding hardware implementation. The algorithm creates an error surface constructed by the Rate-Distortion (R-D) cost of the integer motion vector (IMV) and its neighbors. This method requires no iteration and interpolation, thus reducing the area and power consumption and increasing the throughput of the hardware. The experimental results show that the corresponding BDBR loss is only 0.47% compared to VTM 16.0 in LD-P configuration. The hardware implementation was synthesized using GF 28nm process. It can support 13 different sizes of CU varying from 128x128 to 8x8. The measured throughput can reach 4K@30fps at 400MHz, with a gate count of 192k and power consumption of 12.64 mW. The throughput can reach 8K@30fps at 631MHz when only quadtree is searched. To the best of our knowledge, this work is the first hardware architecture for VVC FME with interpolation-free strategies.

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2302.04948 [pdf]

NR Conformance Testing of Analog Radio-over-LWIR FSO Fronthaul link for 6G Distributed MIMO Networks

Authors: Rafael Puerta, Mengyao Han, Mahdieh Joharifar, Richard Schatz, Yan-Ting Sun, Yuchuan Fan, Anders Djupsjöbacka, Grégory Maisons, Johan Abautret, Roland Teissier, Lu Zhang, Sandis Spolitis, Muguang Wang, Vjaceslavs Bobrovs, Sebastian Lourdudoss, Xianbin Yu, Sergei Popov, Oskars Ozolins, Xiaodan Pang

Abstract: We experimentally test the compliance with 5G/NR 3GPP technical specifications of an analog radio-over-FSO link at 9 μm. The ACLR and EVM transmitter requirements are fulfilled validating the suitability of LWIR FSO for 6G fronthaul.

Submitted 9 February, 2023; originally announced February 2023.

Comments: Accepted in Optical Fiber Communication Conference (OFC) 2023, 3 pages, 2 figures

arXiv:2209.01257 [pdf, other]

doi 10.1109/TSIPN.2023.3302658

Decentralized Eigendecomposition for Online Learning over Graphs with Applications

Authors: Yufan Fan, Minh Trinh-Hoang, Cemil Emre Ardic, Marius Pesavento

Abstract: In this paper, the problem of decentralized eigenvalue decomposition of a general symmetric matrix that is important, e.g., in Principal Component Analysis, is studied, and a decentralized online learning algorithm is proposed. Instead of collecting all information in a fusion center, the proposed algorithm involves only local interactions among adjacent agents. It benefits from the representation of the matrix as a sum of rank-one components which makes the algorithm attractive for online eigenvalue and eigenvector tracking applications. We examine the performance of the proposed algorithm in two types of important application examples: First, we consider the online eigendecomposition of a sample covariance matrix over the network, with application in decentralized Direction-of-Arrival (DoA) estimation and DoA tracking applications. Then, we investigate the online computation of the spectra of the graph Laplacian that is important in, e.g., Graph Fourier Analysis and graph dependent filter design. We apply our proposed algorithm to track the spectra of the graph Laplacian in static and dynamic networks. Simulation results reveal that the proposed algorithm outperforms existing decentralized algorithms both in terms of estimation accuracy as well as communication cost.

Submitted 11 August, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2207.02399 [pdf]

doi 10.1002/mrm.29833

Learning Apparent Diffusion Coefficient Maps from Accelerated Radial k-Space Diffusion-Weighted MRI in Mice using a Deep CNN-Transformer Model

Authors: Yuemeng Li, Miguel Romanello Joaquim, Stephen Pickup, Hee Kwon Song, Rong Zhou, Yong Fan

Abstract: Purpose: To accelerate radially sampled diffusion weighted spin-echo (Rad-DW-SE) acquisition method for generating high quality apparent diffusion coefficient (ADC) maps. Methods: A deep learning method was developed to generate accurate ADC maps from accelerated DWI data acquired with the Rad-DW-SE method. The deep learning method integrates convolutional neural networks (CNNs) with vision transformers to generate high quality ADC maps from accelerated DWI data, regularized by a monoexponential ADC model fitting term. A model was trained on DWI data of 147 mice and evaluated on DWI data of 36 mice, with acceleration factors of 4x and 8x compared to the original acquisition parameters. We have made our code publicly available at GitHub: https://github.com/ymli39/DeepADC-Net-Learning-Apparent-Diffusion-Coefficient-Maps, and our dataset can be downloaded at https://pennpancreaticcancerimagingresource.github.io/data.html. Results: Ablation studies and experimental results have demonstrated that the proposed deep learning model generates higher quality ADC maps from accelerated DWI data than alternative deep learning methods under comparison when their performance is quantified in whole images as well as in regions of interest, including tumors, kidneys, and muscles. Conclusions: The deep learning method with integrated CNNs and transformers provides an effective means to accurately compute ADC maps from accelerated DWI data acquired with the Rad-DW-SE method.

Submitted 1 August, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted by Magnetic Resonance in Medicine

Journal ref: Magn Reson Med 2023

arXiv:2206.10385 [pdf, other]

Approximate Equivariance SO(3) Needlet Convolution

Authors: Kai Yi, Jialin Chen, Yu Guang Wang, Bingxin Zhou, Pietro Liò, Yanan Fan, Jan Hamann

Abstract: This paper develops a rotation-invariant needlet convolution for rotation group SO(3) to distill multiscale information of spherical signals. The spherical needlet transform is generalized from $\mathbb{S}^2$ onto the SO(3) group, which decomposes a spherical signal to approximate and detailed spectral coefficients by a set of tight framelet operators. The spherical signal during the decomposition and reconstruction achieves rotation invariance. Based on needlet transforms, we form a Needlet approximate Equivariance Spherical CNN (NES) with multiple SO(3) needlet convolutional layers. The network establishes a powerful tool to extract geometric-invariant features of spherical signals. The model allows sufficient network scalability with multi-resolution representation. A robust signal embedding is learned with wavelet shrinkage activation function, which filters out redundant high-pass representation while maintaining approximate rotation invariance. The NES achieves state-of-the-art performance for quantum chemistry regression and Cosmic Microwave Background (CMB) delensing reconstruction, which shows great potential for solving scientific challenges with high-resolution and multi-scale spherical signal representation.

Submitted 17 June, 2022; originally announced June 2022.

arXiv:2206.05054 [pdf, other]

A No-reference Quality Assessment Metric for Point Cloud Based on Captured Video Sequences

Authors: Yu Fan, Zicheng Zhang, Wei Sun, Xiongkuo Min, Wei Lu, Tao Wang, Ning Liu, Guangtao Zhai

Abstract: Point cloud is one of the most widely used digital formats of 3D models, the visual quality of which is quite sensitive to distortions such as downsampling, noise, and compression. To tackle the challenge of point cloud quality assessment (PCQA) in scenarios where reference is not available, we propose a no-reference quality assessment metric for colored point cloud based on captured video sequences. Specifically, three video sequences are obtained by rotating the camera around the point cloud through three specific orbits. The video sequences not only contain the static views but also include the multi-frame temporal information, which greatly helps understand the human perception of the point clouds. Then we modify the ResNet3D as the feature extraction model to learn the correlation between the captured videos and corresponding subjective quality scores. The experimental results show that our method outperforms most of the state-of-the-art full-reference and no-reference PCQA metrics, which validates the effectiveness of the proposed method.

Submitted 20 September, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: Accepted to IEEE 24th International Workshop on Multimedia Signal Processing, 2022

arXiv:2206.02146 [pdf, other]

Recurrent Video Restoration Transformer with Guided Deformable Attention

Authors: Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool

Abstract: Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusion. However, it suffers from large model size and intensive memory consumption; the latter has a relatively small model size as it shares parameters across frames; however, it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework which can achieve a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, the guided deformable attention is designed for clip-to-clip alignment, which predicts multiple relevant locations from the whole inferred clip and aggregates their features by the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.

Submitted 12 November, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

Comments: Accepted by NeurIPS 2022. Code: https://github.com/JingyunLiang/RVRT

arXiv:2205.08887 [pdf, other]

doi 10.1109/TMI.2022.3156614

3D Segmentation Guided Style-based Generative Adversarial Networks for PET Synthesis

Authors: Yang Zhou, Zhiwen Yang, Hui Zhang, Eric I-Chao Chang, Yubo Fan, Yan Xu

Abstract: Potential radioactive hazards in full-dose positron emission tomography (PET) imaging remain a concern, whereas the quality of low-dose images is never desirable for clinical use. So it is of great interest to translate low-dose PET images into full-dose. Previous studies based on deep learning methods usually directly extract hierarchical features for reconstruction. We notice that the importance of each feature is different and they should be weighted dissimilarly so that tiny information can be captured by the neural network. Furthermore, the synthesis on some regions of interest is important in some applications. Here we propose a novel segmentation guided style-based generative adversarial network (SGSGAN) for PET synthesis. (1) We put forward a style-based generator employing style modulation, which specifically controls the hierarchical features in the translation process, to generate images with more realistic textures. (2) We adopt a task-driven strategy that couples a segmentation task with a generative adversarial network (GAN) framework to improve the translation performance. Extensive experiments show the superiority of our overall framework in PET synthesis, especially on those regions of interest.

Submitted 18 May, 2022; originally announced May 2022.

Comments: This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMI.2022.3156614, IEEE Transactions on Medical Imaging

Journal ref: IEEE Transactions on Medical Imaging, 2022, 41(8): 2092-2104

arXiv:2203.04959 [pdf, other]

ModDrop++: A Dynamic Filter Network with Intra-subject Co-training for Multiple Sclerosis Lesion Segmentation with Missing Modalities

Authors: Han Liu, Yubo Fan, Hao Li, Jiacheng Wang, Dewei Hu, Can Cui, Ho Hin Lee, Huahong Zhang, Ipek Oguz

Abstract: Multiple Sclerosis (MS) is a chronic neuroinflammatory disease and multi-modality MRIs are routinely used to monitor MS lesions. Many automatic MS lesion segmentation models have been developed and have reached human-level performance. However, most established methods assume the MRI modalities used during training are also available during testing, which is not guaranteed in clinical practice. Previously, a training strategy termed Modality Dropout (ModDrop) has been applied to MS lesion segmentation to achieve the state-of-the-art performance with missing modality. In this paper, we present a novel method dubbed ModDrop++ to train a unified network adaptive to an arbitrary number of input MRI sequences. ModDrop++ upgrades the main idea of ModDrop in two key ways. First, we devise a plug-and-play dynamic head and adopt a filter scaling strategy to improve the expressiveness of the network. Second, we design a co-training strategy to leverage the intra-subject relation between full modality and missing modality. Specifically, the intra-subject co-training strategy aims to guide the dynamic head to generate similar feature representations between the full- and missing-modality data from the same subject. We use two public MS datasets to show the superiority of ModDrop++. Source code and trained models are available at https://github.com/han-liu/ModDropPlusPlus.

Submitted 1 July, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: MICCAI 2022

arXiv:2201.12288 [pdf, other]

VRT: A Video Restoration Transformer

Authors: Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool

Abstract: Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Different from single image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this by exploiting a sliding window strategy or a recurrent architecture, which either is restricted by frame-by-frame restoration or lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. Besides, parallel warping is used to further fuse information from neighboring frames by parallel feature warping. Experimental results on five tasks, including video super-resolution, video deblurring, video denoising, video frame interpolation and space-time video super-resolution, demonstrate that VRT outperforms the state-of-the-art methods by large margins (up to 2.16dB) on fourteen benchmark datasets.

Submitted 15 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: add results on VFI and STVSR; SOTA results (+up to 2.16dB) on video SR, video deblurring, video denoising, video frame interpolation and space-time video super-resolution. Code: https://github.com/JingyunLiang/VRT

arXiv:2201.08221 [pdf]

doi 10.1109/CICC48029.2020.9075942

A 1.5GS/s 8b Pipelined-SAR ADC with Output Level Shifting Settling Technique in 14nm CMOS

Authors: Yuanming Zhu, Shengchang Cai, Shiva Kiran, Yang-Hang Fan, Po-Hsuan Chang, Sebastian Hoyos, Samuel Palermo

Abstract: A single channel 1.5GS/s 8-bit pipelined-SAR ADC utilizes a novel output level shifting (OLS) settling technique to reduce the power and enable low-voltage operation of the dynamic residue amplifier. The ADC consists of a 4-bit first stage and a 5-bit second stage, with 1-bit redundancy to relax the offset, gain, and settling requirements of the first stage. Employing the OLS technique allows for an inter-stage gain of ~4 from the dynamic residue amplifier with a settling time that is only 28% of a conventional CML amplifier. The ADC's conversion speed is further improved with the use of parallel comparators in the two asynchronous stages. Fabricated in a 14nm FinFET technology, the ADC occupies 0.0013 mm² core area and operates with a 0.8V supply. 6.6-bit ENOB is achieved at Nyquist while consuming 2.4mW, resulting in an FOM of 16.7fJ/conv.-step.

Submitted 20 August, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: A 4-page, 9-figure IEEE Custom Integrated Circuit Conference paper

Journal ref: IEEE Custom Integrated Circuit Conference 2020
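
The figure of merit quoted in this abstract can be sanity-checked from the other numbers it reports. The sketch below assumes the FOM is the Walden figure of merit, FOM = P / (2^ENOB × f_s), which the fJ/conv.-step unit suggests but the abstract does not state; the function name is illustrative and not taken from the paper.

```python
def walden_fom_fj(power_w: float, enob_bits: float, sample_rate_hz: float) -> float:
    """Return the Walden figure of merit in femtojoules per conversion step.

    Assumption: FOM = P / (2**ENOB * fs). The abstract does not name the formula.
    """
    joules_per_step = power_w / ((2.0 ** enob_bits) * sample_rate_hz)
    return joules_per_step * 1e15


# Figures quoted in the abstract: 2.4 mW, 6.6-bit ENOB at Nyquist, 1.5 GS/s.
fom_fj = walden_fom_fj(power_w=2.4e-3, enob_bits=6.6, sample_rate_hz=1.5e9)
print(f"Walden FOM: {fom_fj:.1f} fJ/conv.-step")
```

With the rounded inputs this gives about 16.5 fJ/conv.-step, close to the reported 16.7; the small gap would be consistent with the abstract rounding the ENOB or the power.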

arXiv:2201.02831 [pdf, other]

doi 10.1016/j.media.2022.102628

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

Authors: Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, et al. (15 additional authors not shown)

Abstract: Domain Adaptation (DA) has recently attracted strong interest in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase.
The level of performance reached by the top-performing teams is strikingly high (best median Dice - VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice - VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.

Submitted 14 December, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: In Medical Image Analysis

arXiv:2201.01492 [pdf, other]

FAVER: Blind Quality Prediction of Variable Frame Rate Videos

Authors: Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan

Abstract: Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales. Recent advances in mobile devices and cloud computing techniques have made it possible to capture, process, and share high resolution, high frame rate (HFR) videos across the Internet nearly instantaneously. Being able to monitor and control the quality of these streamed videos can enable the delivery of more enjoyable content and perceptually optimized rate control. Accordingly, there is a pressing need to develop VQA models that can be deployed at enormous scales. While some recent efforts have been applied to full-reference (FR) analysis of variable frame rate and HFR video quality, the development of no-reference (NR) VQA algorithms targeting frame rate variations has been little studied. Here, we propose a first-of-a-kind blind VQA model for evaluating HFR videos, which we dub the Framerate-Aware Video Evaluator w/o Reference (FAVER). FAVER uses extended models of spatial natural scene statistics that encompass space-time wavelet-decomposed video signals, to conduct efficient frame rate sensitive quality prediction. Our extensive experiments on several HFR video quality datasets show that FAVER outperforms other blind VQA algorithms at a reasonable computational cost. To facilitate reproducible research and public evaluation, an implementation of FAVER is being made freely available online: https://github.com/uniqzheng/HFR-BVQA.

Submitted 5 January, 2022; originally announced January 2022.

Comments: 12 pages, 8 figures

arXiv:2112.04914 [pdf, other]

End-to-end Alexa Device Arbitration

Authors: Jarred Barber, Yifeng Fan, Tao Zhang

Abstract: We introduce a variant of the speaker localization problem, which we call device arbitration. In the device arbitration problem, a user utters a keyword that is detected by multiple distributed microphone arrays (smart home devices), and we want to determine which device was closest to the user. Rather than solving the full localization problem, we propose an end-to-end machine learning system. This system learns a feature embedding that is computed independently on each device. The embeddings from each device are then aggregated together to produce the final arbitration decision. We use a large-scale room simulation to generate training and evaluation data, and compare our system against a signal processing baseline.

Submitted 16 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Accepted for ICASSP 2022

arXiv:2111.02283 [pdf, other]

A Self-adaptive LSAC-PID Approach based on Lyapunov Reward Shaping for Mobile Robots

Authors: Xinyi Yu, Siyu Xu, Yuehai Fan, Linlin Ou

Abstract: To solve the coupling problem of control loops and the adaptive parameter tuning problem in the multi-input multi-output (MIMO) PID control system, a self-adaptive LSAC-PID algorithm is proposed based on deep reinforcement learning (RL) and Lyapunov-based reward shaping in this paper. For complex and unknown mobile robot control environment, an RL-based MIMO PID hybrid control strategy is firstly presented. According to the dynamic information and environmental feedback of the mobile robot, the RL agent can output the optimal MIMO PID parameters in real time, without knowing mathematical model and decoupling multiple control loops. Then, to improve the convergence speed of RL and the stability of mobile robots, a Lyapunov-based reward shaping soft actor-critic (LSAC) algorithm is proposed based on Lyapunov theory and potential-based reward shaping method. The convergence and optimality of the algorithm are proved in terms of the policy evaluation and improvement step of soft policy iteration. In addition, for line-following robots, the region growing method is improved to adapt to the influence of forks and environmental interference. Through comparison, test and cross-validation, the simulation and real-environment experimental results all show good performance of the proposed LSAC-PID tuning algorithm.

Submitted 3 November, 2021; originally announced November 2021.

Comments: 11 pages, 13 figures
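
As a rough illustration of the potential-based reward shaping this abstract mentions, the sketch below adds the standard shaping term gamma * Phi(s') - Phi(s) to a reward. Everything specific here is an assumption made for illustration: the potential is taken to be the negative of a quadratic Lyapunov candidate V(e) = e^2 on a scalar tracking error, and the function names are hypothetical. The paper's actual potential and reward design are not given in the abstract.

```python
def lyapunov_potential(tracking_error: float) -> float:
    """Hypothetical potential Phi = -V(e), with Lyapunov candidate V(e) = e**2."""
    return -(tracking_error ** 2)


def shaped_reward(reward: float, error: float, next_error: float, gamma: float = 0.99) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * lyapunov_potential(next_error) - lyapunov_potential(error)


# A step that shrinks the error earns a bonus; a step that grows it is penalised.
bonus_toward = shaped_reward(0.0, error=2.0, next_error=1.0)
bonus_away = shaped_reward(0.0, error=1.0, next_error=2.0)
print(bonus_toward, bonus_away)
```

Under these assumptions the shaping term rewards transitions along which the Lyapunov candidate decreases, which is one plausible reading of how such shaping could speed up convergence.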

arXiv:2111.00485 [pdf, other]

Learned Image Compression with Separate Hyperprior Decoders

Authors: Zhao Zan, Chao Liu, Heming Sun, Xiaoyang Zeng, Yibo Fan

Abstract: Learned image compression techniques have achieved considerable development in recent years. In this paper, we find that the performance bottleneck lies in the use of a single hyperprior decoder, in which case the ternary Gaussian model collapses to a binary one. To solve this, we propose to use three hyperprior decoders to separate the decoding process of the mixed parameters in discrete Gaussian mixture likelihoods, achieving more accurate parameter estimation. Experimental results demonstrate that the proposed method optimized by MS-SSIM achieves on average 3.36% BD-rate reduction compared with the state-of-the-art approach. The contribution of the proposed method to the coding time and FLOPs is negligible.

Submitted 31 October, 2021; originally announced November 2021.

Comments: This paper has been accepted by IEEE Open Journal of Circuits and Systems

arXiv:2109.06274 [pdf, other]

Cross-Modality Domain Adaptation for Vestibular Schwannoma and Cochlea Segmentation

Authors: Han Liu, Yubo Fan, Can Cui, Dingjie Su, Andrew McNeil, Benoit M. Dawant

Abstract: Automatic methods to segment the vestibular schwannoma (VS) tumors and the cochlea from magnetic resonance imaging (MRI) are critical to VS treatment planning. Although supervised methods have achieved satisfactory performance in VS segmentation, they require full annotations by experts, which is laborious and time-consuming. In this work, we aim to tackle the VS and cochlea segmentation problem in an unsupervised domain adaptation setting. Our proposed method leverages both the image-level domain alignment to minimize the domain divergence and semi-supervised training to further boost the performance. Furthermore, we propose to fuse the labels predicted from multiple models via noisy label correction. Our results on the challenge validation leaderboard showed that our unsupervised method has achieved promising VS and cochlea segmentation performance with mean Dice score of 0.8261 $\pm$ 0.0416; the mean Dice value for the tumor is 0.8302 $\pm$ 0.0772. This is comparable to the weakly-supervised based method.

Submitted 8 November, 2021; v1 submitted 13 September, 2021; originally announced September 2021.
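
The Dice score used to report segmentation quality in this entry is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch on flat binary masks follows; the toy masks and the function name are illustrative only, and treating two empty masks as a perfect match is a convention assumed here, not something the abstract specifies.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: 3 foreground voxels each, 2 of them shared.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
score = dice_score(pred, target)
print(round(score, 4))
```

Here the two masks share 2 of their 3 foreground voxels each, so the score is 4/6, roughly 0.667.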

arXiv:2108.08551 [pdf, other]

Learned Video Compression with Residual Prediction and Loop Filter

Authors: Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

Abstract: In this paper, we propose a learned video codec with a residual prediction network (RP-Net) and a feature-aided loop filter (LF-Net). For the RP-Net, we exploit the residual of previous multiple frames to further eliminate the redundancy of the current frame residual. For the LF-Net, the features from residual decoding network and the motion compensation network are used to aid the reconstruction quality. To reduce the complexity, a light ResNet structure is used as the backbone for both RP-Net and LF-Net. Experimental results illustrate that we can save about 10% BD-rate compared with previous learned video compression frameworks. Moreover, we can achieve faster coding speed due to the ResNet backbone. This project is available at https://github.com/chaoliu18/RPLVC.

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2108.01522 [pdf, other]

CSMCNet: Scalable Video Compressive Sensing Reconstruction with Interpretable Motion Estimation

Authors: Bowen Huang, Xiao Yan, Jinjia Zhou, Yibo Fan

Abstract: Most deep network methods for compressive sensing reconstruction suffer from the black-box characteristic of DNN. In this paper, a deep neural network with interpretable motion estimation named CSMCNet is proposed. The network is able to realize high-quality reconstruction of video compressive sensing by unfolding the iterative steps of optimization based algorithms. A DNN based, multi-hypothesis motion estimation module is designed to improve the reconstruction quality, and a residual module is employed to further narrow down the gap between reconstruction results and original signal in our proposed method. Besides, we propose an interpolation module with corresponding training strategy to realize scalable CS reconstruction, which is capable of using the same model to decode various compression ratios. Experiments show that a PSNR of 29.34dB can be achieved at 2% CS ratio (compressed by 98%), which is superior to other state-of-the-art methods. Moreover, the interpolation module is proved to be effective, with significant cost saving and acceptable performance losses.

Submitted 3 August, 2021; originally announced August 2021.

Comments: 12 pages, 10 pages, 5 tables

arXiv:2107.03987 [pdf]

Atlas-Based Segmentation of Intracochlear Anatomy in Metal Artifact Affected CT Images of the Ear with Co-trained Deep Neural Networks

Authors: Jianing Wang, Dingjie Su, Yubo Fan, Srijata Chakravorti, Jack H. Noble, Benoit M. Dawant

Abstract: We propose an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes. To solve this problem, which is challenging because of the strong artifacts produced by the implant, we use a pair of co-trained deep networks that generate dense deformation fields (DDFs) in opposite directions. One network is tasked with registering an atlas image to the Post-CT images and the other network is tasked with registering the Post-CT images to the atlas image. The networks are trained using loss functions based on voxel-wise labels, image content, fiducial registration error, and cycle-consistency constraint. The segmentation of the ICA in the Post-CT images is subsequently obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT images using the corresponding DDFs generated by the trained registration networks. Our model can learn the underlying geometric features of the ICA even though they are obscured by the metal artifacts. We show that our end-to-end network produces results that are comparable to the current state of the art (SOTA) that relies on a two-step approach that first uses conditional generative adversarial networks to synthesize artifact-free images from the Post-CT images and then uses an active shape model-based method to segment the ICA in the synthetic images.
Our method requires a fraction of the time needed by the SOTA, which is important for end-user acceptance.

Submitted 9 July, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 10 pages, 5 figures

arXiv:2106.06011 [pdf, other]

A self-adapting super-resolution structures framework for automatic design of GAN

Authors: Yibo Guo, Haidi Wang, Yiming Fan, Shunyao Li, Mingliang Xu

Abstract: With the development of deep learning, the single super-resolution image reconstruction network models are becoming more and more complex. Small changes in hyperparameters of the models have a greater impact on model performance. In the existing works, experts have gradually explored a set of optimal model parameters based on empirical values or performing brute-force search. In this paper, we introduce a new super-resolution image reconstruction generative adversarial network framework, and a Bayesian optimization method used to optimize the hyperparameters of the generator and discriminator. The generator is built from self-calibrated convolutions, and the discriminator is built from convolution layers. We have defined the hyperparameters such as the number of network layers and the number of neurons. Our method adopts Bayesian optimization as an optimization policy of GAN in our model. It can not only find the optimal hyperparameter solution automatically, but also construct a super-resolution image reconstruction network, reducing the manual workload. Experiments show that Bayesian optimization can search the optimal solution earlier than the other two optimization algorithms.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 9 pages, 6 figures

arXiv:2106.05545 [pdf, other]

Super-Resolution Image Reconstruction Based on Self-Calibrated Convolutional GAN

Authors: Yibo Guo, Haidi Wang, Yiming Fan, Shunyao Li, Mingliang Xu

Abstract: With the effective application of deep learning in computer vision, breakthroughs have been made in the research of super-resolution image reconstruction. However, many studies have pointed out that insufficient extraction of image features by the neural network may degrade the newly reconstructed image. On the other hand, the generated pictures are sometimes too artificial because of over-smoothing. In order to solve the above problems, we propose a novel self-calibrated convolutional generative adversarial network. The generator consists of feature extraction and image reconstruction. Feature extraction uses self-calibrated convolutions, which contain four portions, and each portion has specific functions. It can not only expand the range of receptive fields, but also obtain long-range spatial and inter-channel dependencies. Then image reconstruction is performed, and finally a super-resolution image is reconstructed. We have conducted thorough experiments on different datasets including set5, set14 and BSD100 under the SSIM evaluation method. The experimental results prove the effectiveness of the proposed network.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 8 pages, 3 figures

arXiv:2105.15077 [pdf, other]

SDNet: mutil-branch for single image deraining using swin

Authors: Fuxiang Tan, YuTing Kong, Yingying Fan, Feng Liu, Daxin Zhou, Hao zhang, Long Chen, Liang Gao, Yurong Qian

Abstract: Rain streaks degrade the image quality and seriously affect the performance of subsequent computer vision tasks, such as autonomous driving, social security, etc. Therefore, removing rain streaks from a given rainy image is of great significance. Convolutional neural networks (CNNs) have been widely used in image deraining tasks; however, the local computational characteristics of convolutional operations limit the development of image deraining tasks. Recently, the popular transformer has global computational features that can further facilitate the development of image deraining tasks. In this paper, we introduce Swin-transformer into the field of image deraining for the first time to study the performance and potential of Swin-transformer in the field of image deraining. Specifically, we improve the basic module of Swin-transformer and design a three-branch model to implement single-image rain removal. The former implements the basic rain pattern feature extraction, while the latter fuses different features to further extract and process the image features. In addition, we employ a jump connection to fuse deep features and shallow features. In terms of experiments, the existing public dataset suffers from image duplication and relatively homogeneous background. Therefore, we propose a new dataset Rain3000 for validating our model.
Experimental results on the publicly available datasets Rain100L, Rain100H and our dataset Rain3000 show that our proposed method has performance and inference speed advantages over the current mainstream single-image rain streaks removal models. The source code will be available at https://github.com/H-tfx/SDNet.

Submitted 31 May, 2021; originally announced May 2021.

arXiv:2105.10087 [pdf, other]

doi 10.1007/978-3-031-16446-0_10

DSR: Direct Simultaneous Registration for Multiple 3D Images

Authors: Zhehua Mao, Liang Zhao, Shoudong Huang, Yiting Fan, Alex Pui-Wai Lee

Abstract: This paper presents a novel algorithm named Direct Simultaneous Registration (DSR) that registers a collection of 3D images in a simultaneous fashion without specifying any reference image, feature extraction and matching, or information loss or reuse. The algorithm optimizes the global poses of local image frames by maximizing the similarity between a predefined panoramic image and local images. Although we formulate the problem as a Direct Bundle Adjustment (DBA) that jointly optimizes the poses of local frames and the intensities of the panoramic image, by investigating the independence of pose estimation from the panoramic image in the solving process, DSR is proposed to solve the poses only and proved to be able to obtain the same optimal poses as DBA. The proposed method is particularly suitable for the scenarios where distinct features are not available, such as Transesophageal Echocardiography (TEE) images. DSR is evaluated by comparing it with four widely used methods via simulated and in-vivo 3D TEE images. It is shown that the proposed method outperforms these four methods in terms of accuracy and requires much fewer computational resources than the state-of-the-art accumulated pairwise estimates (APE).

Submitted 15 August, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: 10 pages, 3 figures, The 25th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2022

Journal ref: Medical Image Computing and Computer Assisted Intervention (2022)

arXiv:2101.09642 [pdf]

doi 10.1109/CVPRW50498.2020.00088

Image Compression with Encoder-Decoder Matched Semantic Segmentation

Authors: Trinh Man Hoang, Jinjia Zhou, Yibo Fan

Abstract: In recent years, layered image compression has been demonstrated to be a promising direction, which encodes a compact representation of the input image and applies an up-sampling network to reconstruct the image. To further improve the quality of the reconstructed image, some works transmit the semantic segment together with the compressed image data. Consequently, the compression ratio is also decreased because extra bits are required for transmitting the semantic segment. To solve this problem, we propose a new layered image compression framework with encoder-decoder matched semantic segmentation (EDMS). Following the semantic segmentation, a special convolutional neural network is used to enhance the inaccurate semantic segment. As a result, the accurate semantic segment can be obtained in the decoder without requiring extra bits. The experimental results show that the proposed EDMS framework can get up to 35.31% BD-rate reduction over the HEVC-based (BPG) codec, and 5% bitrate and 24% encoding time savings compared to the state-of-the-art semantic-based image codec.

Submitted 30 January, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Journal ref: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 2020, pp. 619-623

Showing 1–50 of 78 results for author: Fan, Y