Joint MIMO Transceiver and Reflector Design for Reconfigurable Intelligent Surface-Assisted Communication

Authors: Yaqiong Zhao, Jindan Xu, Wei Xu, Kezhi Wang, Xinquan Ye, Chau Yuen, Xiaohu You

Abstract: In this paper, we consider a reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output communication system with multiple antennas at both the base station (BS) and the user. We aim to maximize the achievable rate by jointly optimizing the transmit precoding matrix, the receive combining matrix, and the RIS reflection matrix under the constraints of the transmit power at the BS and the unit-modulus reflection at the RIS. Because the problem is non-trivial, we first reformulate it into an equivalent, tractable problem by utilizing the relationship between the achievable rate and the weighted minimum mean squared error. Next, the transmit precoding matrix, the receive combining matrix, and the RIS reflection matrix are alternately optimized. In particular, the optimal transmit precoding matrix and receive combining matrix are obtained in closed form. Furthermore, two computationally efficient methods are proposed for the RIS reflection matrix, namely the semi-definite relaxation (SDR) method and the successive closed form (SCF) method. We theoretically prove that both methods are guaranteed to converge, and that the SCF-based algorithm converges to a Karush-Kuhn-Tucker point of the problem.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 14 pages, 12 figures

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images in under 10 ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop
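
Note on the metric: the challenge above compares methods by PSNR fidelity against Lanczos interpolation. The following is a minimal sketch of how PSNR is usually computed, assuming 8-bit pixel values given as flat sequences. The function name `psnr` and the default peak of 255 are illustrative assumptions; this is not the challenge's evaluation code.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    `peak` is the maximum possible pixel value (255 for 8-bit data, an assumption here).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the upscaled image is closer to the reference, so "improving PSNR over Lanczos" means the method's output scores higher than the Lanczos-upscaled image against the same 4K ground truth.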

arXiv:2404.12887 [pdf, other]

3D Multi-frame Fusion for Video Stabilization

Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao

Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our RStab framework lies in Stabilized Rendering (SR), a volume rendering module that fuses multi-frame information in 3D space and extends beyond image fusion by incorporating feature fusion. Specifically, SR involves warping features and colors from multiple frames by projection, fusing them into descriptors to render the stabilized image. However, the precision of warped information depends on the projection accuracy, a factor significantly influenced by dynamic regions. In response, we introduce the Adaptive Ray Range (ARR) module to integrate depth priors, adaptively defining the sampling range for the projection process. Additionally, we propose Color Correction (CC), which assists geometric constraints with optical flow for accurate color aggregation. Thanks to the three modules, our RStab demonstrates superior performance compared with previous stabilizers in the field of view (FOV), image quality, and video stability across various datasets.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.01717 [pdf, other]

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

Authors: Rui Xie, Ying Tai, Chen Zhao, Kai Zhang, Zhenyu Zhang, Jun Zhou, Xiaoqian Ye, Qian Wang, Jian Yang

Abstract: Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs. However, their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps. Inspired by the efficient adversarial diffusion distillation (ADD), we design AddSR to address this issue by incorporating the ideas of both distillation and ControlNet. Specifically, we first propose a prediction-based self-refinement strategy to provide high-frequency information in the student model output with marginal additional time cost. Furthermore, we refine the training process by employing HR images, rather than LR images, to regulate the teacher model, providing a more robust constraint for distillation. Second, we introduce a timestep-adaptive ADD to address the perception-distortion imbalance problem introduced by the original ADD. Extensive experiments demonstrate that AddSR generates better restoration results while achieving faster speed than previous SD-based state-of-the-art models (e.g., $7\times$ faster than SeeSR).

Submitted 23 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.16408 [pdf, other]

Accuracy-Aware Cooperative Sensing and Computing for Connected Autonomous Vehicles

Authors: Xuehan Ye, Kaige Qu, Weihua Zhuang, Xuemin Shen

Abstract: To maintain high perception performance among connected and autonomous vehicles (CAVs), in this paper, we propose an accuracy-aware and resource-efficient raw-level cooperative sensing and computing scheme among CAVs and road-side infrastructure. The scheme enables fine-grained partial raw sensing data selection, transmission, fusion, and processing at per-object granularity, by exploiting the parallelism among object classification subtasks associated with each object. A supervised learning model is trained to capture the relationship between the object classification accuracy and the data quality of selected object sensing data, facilitating accuracy-aware sensing data selection. We formulate an optimization problem for joint sensing data selection, subtask placement and resource allocation among multiple object classification subtasks, to minimize the total resource cost while satisfying the delay and accuracy requirements. A genetic algorithm based iterative solution is proposed for the optimization problem. Simulation results demonstrate the accuracy awareness and resource efficiency achieved by the proposed cooperative sensing and computing scheme, in comparison with benchmark solutions.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15468 [pdf, other]

Human Detection in Realistic Through-the-Wall Environments using Raw Radar ADC Data and Parametric Neural Networks

Authors: Wei Wang, Naike Du, Yuchao Guo, Chao Sun, Jingyang Liu, Rencheng Song, Xiuzhu Ye

Abstract: The radar signal processing algorithm is one of the core components in through-wall radar human detection technology. Traditional algorithms (e.g., DFT and matched filtering) struggle to adaptively handle low signal-to-noise ratio echo signals in challenging and dynamic real-world through-wall application environments, which becomes a major bottleneck in the system. In this paper, we introduce an end-to-end through-wall radar human detection network (TWP-CNN), which takes raw radar Analog-to-Digital Converter (ADC) signals without any preprocessing as input. We replace the conventional radar signal processing flow with the proposed DFT-based adaptive feature extraction (DAFE) module. This module employs learnable parameterized 3D complex convolution layers to extract superior feature representations from ADC signals, going beyond the limitations of traditional preprocessing methods. Additionally, by embedding phase information from radar data within the network and employing multi-task learning, more accurate detection is achieved. Finally, due to the absence of through-wall radar datasets containing raw ADC data, we gathered a realistic through-wall (RTW) dataset using our in-house developed through-wall radar system. We trained and validated our proposed method on this dataset to confirm its effectiveness and superiority in real through-wall detection scenarios.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 11 pages, 13 figures

arXiv:2403.15424 [pdf, other]

Cross-user activity recognition using deep domain adaptation with temporal relation information

Authors: Xiaozhou Ye, Waleed H. Abdulla, Nirmal Nair, Kevin I-Kai Wang

Abstract: Human Activity Recognition (HAR) is a cornerstone of ubiquitous computing, with promising applications in diverse fields such as health monitoring and ambient assisted living. Despite significant advancements, sensor-based HAR methods often operate under the assumption that training and testing data have identical distributions. However, in many real-world scenarios, particularly in sensor-based HAR, this assumption is invalidated by out-of-distribution (o.o.d.) challenges, including differences from heterogeneous sensors, change over time, and individual behavioural variability. This paper centres on the latter, exploring the cross-user HAR problem where behavioural variability across individuals results in differing data distributions. To address this challenge, we introduce the Deep Temporal State Domain Adaptation (DTSDA) model, an innovative approach tailored for time series domain adaptation in cross-user HAR. Contrary to the common assumption of sample independence in existing domain adaptation approaches, DTSDA recognizes and harnesses the inherent temporal relations in the data. To this end, we introduce 'Temporal State', a concept that defines the different sub-activities within an activity, consistent across different users. We ensure these sub-activities follow a logical time sequence through the 'Temporal Consistency' property and propose the 'Pseudo Temporal State Labeling' method to identify the user-invariant temporal relations. Moreover, the design principle of DTSDA integrates adversarial learning for better domain adaptation. Comprehensive evaluations on three HAR datasets demonstrate DTSDA's superior performance in cross-user HAR applications by bridging individual behavioural variability using temporal relations across sub-activities.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.15423 [pdf, other]

Cross-user activity recognition via temporal relation optimal transport

Authors: Xiaozhou Ye, Kevin I-Kai Wang

Abstract: Current research on human activity recognition (HAR) mainly assumes that training and testing data are drawn from the same distribution to achieve a generalised model, which means all the data are considered to be independent and identically distributed (i.i.d.). In many real-world applications, this assumption does not hold, and collected training and target testing datasets have non-uniform distribution, such as in the case of cross-user HAR. Domain adaptation is a promising approach for cross-user HAR tasks. Existing domain adaptation works are based on the assumption that samples in each domain are i.i.d. and do not consider the knowledge of temporal relation hidden in time series data for aligning data distribution. This strong i.i.d. assumption may not be suitable for time series-related domain adaptation methods because the samples formed by time series segmentation and feature extraction techniques are only coarse approximations to the i.i.d. assumption in each domain. In this paper, we propose the temporal relation optimal transport (TROT) method to utilise temporal relation and relax the i.i.d. assumption for the samples in each domain for accurate and efficient knowledge transfer. We obtain the temporal relation representation and implement temporal relation alignment of activities via the Hidden Markov model (HMM) and optimal transport (OT) techniques. In addition, a new regularisation term that preserves temporal relation order information for an improved optimal transport mapping is proposed to enhance the domain adaptation performance. Comprehensive experiments are conducted on three public activity recognition datasets (i.e. OPPT, PAMAP2 and DSADS), demonstrating that TROT outperforms other state-of-the-art methods.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.15422 [pdf, other]

Machine Learning Techniques for Sensor-based Human Activity Recognition with Data Heterogeneity -- A Review

Authors: Xiaozhou Ye, Kouichi Sakurai, Nirmal Nair, Kevin I-Kai Wang

Abstract: Sensor-based Human Activity Recognition (HAR) is crucial in ubiquitous computing, analysing behaviours through multi-dimensional observations. Despite research progress, HAR confronts challenges, particularly in data distribution assumptions. Most studies assume uniform data distributions across datasets, contrasting with the varied nature of practical sensor data in human activities. Addressing data heterogeneity issues can improve performance, reduce computational costs, and aid in developing personalized, adaptive models with less annotated data. This review investigates how machine learning addresses data heterogeneity in HAR, by categorizing data heterogeneity types, applying corresponding suitable machine learning methods, summarizing available datasets, and discussing future challenges.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.12028 [pdf, other]

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

Authors: Mingjin Chen, Junhao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao

Abstract: 3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves the reconstruction speed and accuracy while preserving high-quality texture details. We present a set of new frameworks for human reconstruction consisting of three parts: geometric reconstruction, texture generation and texture mapping. Firstly, a mesh reconstruction framework is used, which accurately extracts 3D human shapes from a single image. At the same time, we propose a method to generate a multi-view consistent image of the human body based on a single image. This is finally combined with a novel texture mapping method to optimize texture details and ensure color consistency during reconstruction. Through extensive experiments and evaluations, we demonstrate the superior performance of Ultraman on various standard datasets. In addition, Ultraman outperforms state-of-the-art methods in terms of human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Project Page: https://air-discover.github.io/Ultraman/

arXiv:2402.04446 [pdf, other]

Pushing the limits of cell segmentation models for imaging mass cytometry

Authors: Kimberley M. Bird, Xujiong Ye, Alan M. Race, James M. Brown

Abstract: Imaging mass cytometry (IMC) is a relatively new technique for imaging biological tissue at subcellular resolution. In recent years, learning-based segmentation methods have enabled precise quantification of cell type and morphology, but typically rely on large datasets with fully annotated ground truth (GT) labels. This paper explores the effects of imperfect labels on learning-based segmentation models and evaluates the generalisability of these models to different tissue types. Our results show that removing 50% of cell annotations from GT masks only reduces the dice similarity coefficient (DSC) score to 0.874 (from 0.889 achieved by a model trained on fully annotated GT masks). This implies that annotation time can in fact be reduced by at least half without detrimentally affecting performance. Furthermore, training our single-tissue model on imperfect labels only decreases DSC by 0.031 on an unseen tissue type compared to its multi-tissue counterpart, with negligible qualitative differences in segmentation. Additionally, bootstrapping the worst-performing model (with 5% of cell annotations) a total of ten times improves its original DSC score of 0.720 to 0.829. These findings imply that less time and work can be put into the process of producing comparable segmentation models; this includes eliminating the need for multiple IMC tissue types during training, whilst also providing the potential for models with very few labels to improve on themselves. Source code is available on GitHub: https://github.com/kimberley/ISBI2024.

Submitted 6 February, 2024; originally announced February 2024.

Comments: International Symposium on Biomedical Imaging (ISBI) 2024 Submission

ACM Class: I.2; I.4; I.4.6
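
Note on the metric: the abstract above reports results as dice similarity coefficient (DSC) scores. A minimal sketch of the standard DSC on two flattened binary masks follows; the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not taken from the authors' repository.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    a = [bool(x) for x in mask_a]
    b = [bool(x) for x in mask_b]
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

On this scale, 1.0 is perfect overlap and 0.0 is none, which is why a drop from 0.889 to 0.874 is described as small.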

arXiv:2312.07019 [pdf, other]

Beyond 1D and oversimplified kinematics: A generic analytical framework for surrogate safety measures

Authors: Sixu Li, Mohammad Anis, Dominique Lord, Hao Zhang, Yang Zhou, Xinyue Ye

Abstract: This paper presents a generic analytical framework tailored for surrogate safety measures (SSMs) that is versatile across various highway geometries, capable of encompassing vehicle dynamics of differing dimensionality and fidelity, and suitable for dynamic, real-world environments. The framework incorporates a generic vehicle movement model, accommodating a spectrum of scenarios with varying degrees of complexity and dimensionality, facilitating the prediction of future vehicle trajectories. It establishes a generic mathematical criterion to denote potential collisions, characterized by the spatial overlap between a vehicle and any other entity. A collision risk is present if the collision criterion is met at any non-negative time point, with the minimum threshold representing the remaining time to collision. The framework's proficiency spans from conventional one-dimensional (1D) SSMs to extended multi-dimensional, high-fidelity SSMs. Its validity is corroborated through simulation experiments that assess the precision of the framework when linearization is performed on the vehicle movement model. The outcomes showcase remarkable accuracy in predicting vehicle trajectories and the time remaining before potential collisions occur. The necessity of higher-dimensional and higher-fidelity SSMs is highlighted through a comparison of conventional 1D SSMs and extended three-dimensional (3D) SSMs. The results show that using 1D SSMs instead of 3D SSMs could be off by 300% for non-critical Time-to-Collision (TTC) values and about 20% for critical TTC values (below 1.5 seconds). Furthermore, the framework's practical application is demonstrated through a case study that actively evaluates all potential conflicts, underscoring its effectiveness in dynamic, real-world traffic situations.

Submitted 25 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.
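
Note on the baseline: the abstract above contrasts its framework with conventional 1D SSMs such as Time-to-Collision. A minimal sketch of the conventional 1D, constant-speed TTC follows, for orientation only; it is not the paper's generic framework, and the function name `ttc_1d` and its parameters are hypothetical.

```python
import math


def ttc_1d(gap_m, follower_speed_mps, leader_speed_mps):
    """Conventional 1D time-to-collision in seconds, assuming both speeds stay constant.

    gap_m is the bumper-to-bumper distance between follower and leader in metres.
    """
    if gap_m <= 0:
        return 0.0  # vehicles already overlap
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return math.inf  # follower is not closing in, so no collision is predicted
    return gap_m / closing_speed
```

For example, a 15 m gap with the follower at 20 m/s and the leader at 10 m/s gives a TTC of 1.5 s, the boundary the abstract uses for critical values.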

arXiv:2312.04377 [pdf, other]

HARQ-IR Aided Short Packet Communications: BLER Analysis and Throughput Maximization

Authors: Fuchao He, Zheng Shi, Guanghua Yang, Xiaofan Li, Xinrong Ye, Shaodan Ma

Abstract: This paper introduces hybrid automatic repeat request with incremental redundancy (HARQ-IR) to boost the reliability of short packet communications. The finite blocklength information theory and correlated decoding events severely hinder the analysis of average block error rate (BLER). Fortunately, the recursive form of average BLER motivates us to calculate its value through the trapezoidal approximation and Gauss-Laguerre quadrature. Moreover, the asymptotic analysis is performed to derive a simple expression for the average BLER at high signal-to-noise ratio (SNR). Then, we study the maximization of long term average throughput (LTAT) via power allocation while satisfying the power and the BLER constraints. For tractability, the asymptotic BLER is employed to solve the problem through geometric programming (GP). However, the GP-based solution underestimates the LTAT at low SNR due to a large approximation error in this case. Alternatively, we also develop a deep reinforcement learning (DRL)-based framework to learn the power allocation policy. In particular, the optimization problem is transformed into a constrained Markov decision process, which is solved by integrating deep deterministic policy gradient (DDPG) with the subgradient method. The numerical results finally demonstrate that the DRL-based method outperforms the GP-based one at low SNR, albeit at the cost of increased computational burden.

Submitted 9 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 13 pages, 10 figures

arXiv:2311.14924 [pdf, other]

Sequencing-enabled Hierarchical Cooperative CAV On-ramp Merging Control with Enhanced Stability and Feasibility

Authors: Sixu Li, Yang Zhou, Xinyue Ye, Jiwan Jiang, Meng Wang

Abstract: This paper develops a sequencing-enabled hierarchical connected automated vehicle (CAV) cooperative on-ramp merging control framework. The proposed framework consists of a two-layer design: the upper-level control sequences the vehicles to harmonize the traffic density across mainline and on-ramp segments while enhancing lower-level control efficiency through a mixed-integer linear programming formulation. Subsequently, the lower-level control employs a longitudinal distributed model predictive control (MPC) supplemented by a virtual car-following (CF) concept to ensure asymptotic local stability, l_2 norm string stability, and safety. Proofs of asymptotic local stability and l_2 norm string stability are mathematically derived. Compared to other prevalent asymptotic local-stable MPC controllers, the proposed distributed MPC controller greatly expands the initial feasible set. Additionally, an auxiliary lateral control is developed to maintain lane-keeping and merging smoothness while accommodating ramp geometric curvature. To validate the proposed framework, multiple numerical experiments are conducted. Results indicate that our upper-level controller notably outperforms a distance-based sequencing method. Furthermore, the lower-level control effectively ensures smooth acceleration, safe merging with adequate spacing, adherence to proven longitudinal local and string stability, and rapid regulation of lateral deviations.

Submitted 25 May, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.11637 [pdf, other]

FixPix: Fixing Bad Pixels using Deep Learning

Authors: Sreetama Sarkar, Xinan Ye, Gourav Datta, Peter A. Beerel

Abstract: Efficient and effective on-line detection and correction of bad pixels can improve yield and increase the expected lifetime of image sensors. This paper presents a comprehensive Deep Learning (DL) based on-line detection-correction approach, suitable for a wide range of pixel corruption rates. A confidence calibrated segmentation approach is introduced, which achieves nearly perfect bad pixel detection, even with few training samples. A computationally light-weight correction algorithm is proposed for low rates of pixel corruption, which surpasses the accuracy of traditional interpolation-based techniques. We also propose an autoencoder based image reconstruction approach which alleviates the need for prior bad pixel detection and yields promising results for high rates of pixel corruption. Unlike previous methods, which use proprietary images, we demonstrate the efficacy of the proposed methods on the open-source Samsung S7 ISP and MIT-Adobe FiveK datasets. Our approaches yield up to 99.6% detection accuracy with <0.6% false positives and corrected images within 1.5% average pixel error from 70% corrupted images.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2307.10824 [pdf, other]

Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists

Authors: Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, Jingren Zhou, Le Lu, Ling Zhang

Abstract: Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as location, shape, and peripheral vessels, and experienced radiologists can search for clues from previous cases as a reference to enrich the basis of decision-making. In this paper, we propose a radiologist-inspired method to simulate the diagnostic process of radiologists, which is composed of context parsing and prototype recalling modules. The context parsing module first segments the context structure of nodules and then aggregates contextual information for a more comprehensive understanding of the nodule. The prototype recalling module utilizes prototype-based learning to condense previously learned cases as prototypes for comparative analysis, which is updated online in a momentum way during training. Building on the two modules, our method leverages both the intrinsic characteristics of the nodules and the external knowledge accumulated from other nodules to achieve a sound diagnosis. To meet the needs of both low-dose and noncontrast screening, we collect a large-scale dataset of 12,852 and 4,029 nodules from low-dose and noncontrast CTs respectively, each with pathology- or follow-up-confirmed labels. Experiments on several datasets demonstrate that our method achieves advanced screening performance on both low-dose and noncontrast scenarios.

Submitted 20 July, 2023; originally announced July 2023.

Comments: MICCAI 2023
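
The abstract above says the prototypes are "updated online in a momentum way during training" but gives no formula. As a rough illustration only, one common reading is an exponential moving average of feature vectors. The function name `momentum_update`, the list-of-floats representation, and the momentum value 0.9 are all assumptions made for this sketch; the paper's actual update rule may differ.

```python
def momentum_update(prototype, feature, momentum=0.9):
    """Blend a stored prototype with a new feature vector.

    Hypothetical exponential-moving-average rule:
        prototype <- momentum * prototype + (1 - momentum) * feature
    """
    if len(prototype) != len(feature):
        raise ValueError("prototype and feature must have the same length")
    return [momentum * p + (1.0 - momentum) * f
            for p, f in zip(prototype, feature)]


# Toy usage: the prototype drifts slowly toward each new feature vector.
prototype = [0.0, 1.0]
for feature in ([1.0, 1.0], [1.0, 1.0]):
    prototype = momentum_update(prototype, feature, momentum=0.9)
print(prototype)
```

A momentum close to 1 keeps the prototype stable across noisy mini-batches, while a smaller value lets it track recent cases more quickly.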

arXiv:2306.09116 [pdf, other]

Accurate Airway Tree Segmentation in CT Scans via Anatomy-aware Multi-class Segmentation and Topology-guided Iterative Learning

Authors: Puyang Wang, Dazhou Guo, Dandan Zheng, Minghui Zhang, Haogang Yu, Xin Sun, Jia Ge, Yun Gu, Le Lu, Xianghua Ye, Dakai Jin

Abstract: Intrathoracic airway segmentation in computed tomography (CT) is a prerequisite for various respiratory disease analyses such as chronic obstructive pulmonary disease (COPD), asthma and lung cancer. Unlike other organs with simpler shapes or topology, the airway's complex tree structure imposes an unbearable burden to generate the "ground truth" label (up to 7 or 3 hours of manual or semi-automatic annotation, respectively, on each case). Most of the existing airway datasets are incompletely labeled/annotated, thus limiting the completeness of the computer-segmented airway. In this paper, we propose a new anatomy-aware multi-class airway segmentation method enhanced by topology-guided iterative self-learning. Based on the natural airway anatomy, we formulate a simple yet highly effective anatomy-aware multi-class segmentation task to intuitively handle the severe intra-class imbalance of the airway. To solve the incomplete labeling issue, we propose a tailored self-iterative learning scheme to segment toward the complete airway tree. To generate pseudo-labels with higher sensitivity, we introduce a novel breakage attention map and design a topology-guided pseudo-label refinement method that iteratively connects the broken branches commonly present in the initial pseudo-labels. Extensive experiments have been conducted on four datasets including two public challenges. The proposed method ranked 1st in both the EXACT'09 challenge using average score and the ATM'22 challenge on weighted average score. On a public BAS dataset and a private lung cancer dataset, our method significantly improves on previous leading approaches by extracting at least (absolute) 7.5% more detected tree length and 4.0% more tree branches, while maintaining similar precision.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.02245 [pdf, other]

doi 10.1007/s11432-023-3943-6

SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model

Authors: Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai

Abstract: With the development of large language models, many remarkable linguistic systems like ChatGPT have thrived and achieved astonishing success on many tasks, showing the incredible power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be explored, especially 3D object detection. With this inspiration, we explore adapting the zero-shot ability of SAM to 3D object detection in this paper. We propose a SAM-powered BEV processing pipeline to detect objects and get promising results on the large-scale Waymo open dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and presents the opportunity to unleash their power on 3D vision tasks. The code is released at https://github.com/DYZhang09/SAM3D.

Submitted 29 January, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

Comments: Accepted by Science China Information Sciences (SCIS)

arXiv:2301.12291 [pdf, other]

CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans

Authors: Jieneng Chen, Yingda Xia, Jiawen Yao, Ke Yan, Jianpeng Zhang, Le Lu, Fakai Wang, Bo Zhou, Mingyan Qiu, Qihang Yu, Mingze Yuan, Wei Fang, Yuxing Tang, Minfeng Xu, Jian Zhou, Yuqian Zhao, Qifeng Wang, Xianghua Ye, Xiaoli Yin, Yu Shi, Xin Chen, Jingren Zhou, Alan Yuille, Zaiyi Liu, Ling Zhang

Abstract: Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption. A certain number of AI models need to be assembled non-trivially to match the diagnostic process of a human reading a CT scan. In this paper, we construct a Unified Tumor Transformer (CancerUniT) model to jointly detect tumor existence & location and diagnose tumor characteristics for eight major cancers in CT scans. CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction. We decouple the object queries into organ queries, tumor detection queries and tumor diagnosis queries, and further establish hierarchical relationships among the three groups. This clinically-inspired architecture effectively assists inter- and intra-organ representation learning of tumors and facilitates the resolution of these complex, anatomically related multi-organ cancer image reading tasks. CancerUniT is trained end-to-end using a curated large-scale collection of CT images from 10,042 patients including eight major types of cancers and occurring non-cancer tumors (all are pathology-confirmed with 3D tumor masks annotated by radiologists). On the test set of 631 patients, CancerUniT has demonstrated strong performance under a set of clinically relevant evaluation metrics, substantially outperforming both multi-disease methods and an assembly of eight single-organ expert models in tumor detection, segmentation, and diagnosis. This moves one step closer towards a universal high performance cancer screening tool.

Submitted 6 October, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: ICCV 2023 Camera Ready Version

arXiv:2211.02538 [pdf, other]

An information theoretic vulnerability metric for data integrity attacks on smart grids

Authors: Xiuzhen Ye, Iñaki Esnaola, Samir M. Perlaza, Robert F. Harrison

Abstract: A novel metric that describes the vulnerability of the measurements in power systems to data integrity attacks is proposed. The new metric, coined vulnerability index (VuIx), leverages information theoretic measures to assess the attack effect on the fundamental limits of the disruption and detection tradeoff. The result of computing the VuIx of the measurements in the system yields an ordering of their vulnerability based on the level of exposure to data integrity attacks. This new framework is used to assess the measurement vulnerability of IEEE 9-bus and 30-bus test systems and it is observed that power injection measurements are overwhelmingly more vulnerable to data integrity attacks than power flow measurements. A detailed numerical evaluation of the VuIx values for IEEE test systems is provided.

Submitted 4 November, 2022; originally announced November 2022.

Comments: 7 pages, 10 figures, submitted to IET Smart Grid. arXiv admin note: substantial text overlap with arXiv:2207.06973

arXiv:2211.02301 [pdf, other]

Binaural Rendering of Ambisonic Signals by Neural Networks

Authors: Yin Zhu, Qiuqiang Kong, Junjie Shi, Shilei Liu, Xuzhou Ye, Ju-chiang Wang, Junping Zhang

Abstract: Binaural rendering of ambisonic signals is of broad interest to virtual reality and immersive media. Conventional methods often require manually measured Head-Related Transfer Functions (HRTFs). To address this issue, we collect a paired ambisonic-binaural dataset and propose an end-to-end deep learning framework. Experimental results show that neural networks outperform the conventional method in objective metrics and achieve comparable subjective metrics. To validate the proposed framework, we experimentally explore different settings of the input features, model structures, output features, and loss functions. Our proposed system achieves an SDR of 7.32 and MOSs of 3.83, 3.58, 3.87, 3.58 in quality, timbre, localization, and immersion dimensions.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2210.12345 [pdf, other]

Neural Sound Field Decomposition with Super-resolution of Sound Direction

Authors: Qiuqiang Kong, Shilei Liu, Junjie Shi, Xuzhou Ye, Yin Cao, Qiaoxi Zhu, Yong Xu, Yuxuan Wang

Abstract: Sound field decomposition predicts waveforms in arbitrary directions using signals from a limited number of microphones as inputs. Sound field decomposition is fundamental to downstream tasks, including source localization, source separation, and spatial audio reproduction. Conventional sound field decomposition methods such as Ambisonics have limited spatial decomposition resolution. This paper proposes a learning-based Neural Sound field Decomposition (NeSD) framework to allow sound field decomposition with fine spatial direction resolution, using recordings from microphone capsules of a few microphones at arbitrary positions. The inputs of a NeSD system include microphone signals, microphone positions, and queried directions. The outputs of a NeSD include the waveform and the presence probability of a queried position. We model the NeSD systems respectively with different neural networks, including fully connected, time delay, and recurrent neural networks. We show that the NeSD systems outperform conventional Ambisonics and DOANet methods in sound field decomposition and source localization on speech, music, and sound events datasets. Demos are available at https://www.youtube.com/watch?v=0GIr6doj3BQ.

Submitted 22 October, 2022; originally announced October 2022.

Comments: 12 pages

arXiv:2210.05104 [pdf, other]

doi 10.1016/j.compbiomed.2022.106153

3D Matting: A Benchmark Study on Soft Segmentation Method for Pulmonary Nodules Applied in Computed Tomography

Authors: Lin Wang, Xiufen Ye, Donghao Zhang, Wanji He, Lie Ju, Yi Luo, Huan Luo, Xin Wang, Wei Feng, Kaimin Song, Xin Zhao, Zongyuan Ge

Abstract: Usually, lesions are not isolated but are associated with the surrounding tissues. For example, the growth of a tumour can depend on or infiltrate into the surrounding tissues. Due to the pathological nature of the lesions, it is challenging to distinguish their boundaries in medical imaging. However, these uncertain regions may contain diagnostic information. Therefore, the simple binarization of lesions by traditional binary segmentation can result in the loss of diagnostic information. In this work, we introduce image matting into 3D scenes and use the alpha matte, i.e., a soft mask, to describe lesions in a 3D medical image. The traditional soft mask acted as a training trick to compensate for the easily mislabelled or under-labelled ambiguous regions. In contrast, 3D matting uses soft segmentation to characterize the uncertain regions more finely, which means that it retains more structural information for subsequent diagnosis and treatment. The current study of image matting methods in 3D is limited. To address this issue, we conduct a comprehensive study of 3D matting, including both traditional and deep-learning-based methods. We adapt four state-of-the-art 2D image matting algorithms to 3D scenes and further customize the methods for CT images to calibrate the alpha matte with the radiodensity. Moreover, we propose the first end-to-end deep 3D matting network and implement a solid 3D medical image matting benchmark. Its efficient counterparts are also proposed to achieve a good performance-computation balance. Furthermore, there is no high-quality annotated dataset related to 3D matting, which slows down the development of data-driven deep-learning-based methods. To address this issue, we construct the first 3D medical matting dataset. The validity of the dataset was verified through clinicians' assessments and downstream experiments.

Submitted 10 October, 2022; originally announced October 2022.

Comments: Accepted by Computers in Biology and Medicine. arXiv admin note: substantial text overlap with arXiv:2209.07843

arXiv:2209.07843 [pdf, other]

3D Matting: A Soft Segmentation Method Applied in Computed Tomography

Authors: Lin Wang, Xiufen Ye, Donghao Zhang, Wanji He, Lie Ju, Xin Wang, Wei Feng, Kaimin Song, Xin Zhao, Zongyuan Ge

Abstract: Three-dimensional (3D) images, such as CT, MRI, and PET, are common in medical imaging applications and important in clinical diagnosis. Semantic ambiguity is a typical feature of many medical image labels. It can be caused by many factors, such as the imaging properties, pathological anatomy, and the weak representation of the binary masks, which brings challenges to accurate 3D segmentation. In 2D medical images, using soft masks generated by image matting instead of binary masks to characterize lesions can provide rich semantic information, describe the structural characteristics of lesions more comprehensively, and thus benefit the subsequent diagnoses and analyses. In this work, we introduce image matting into 3D scenes to describe the lesions in 3D medical images. The study of image matting in 3D modality is limited, and there is no high-quality annotated dataset related to 3D matting, which slows down the development of data-driven deep-learning-based methods. To address this issue, we constructed the first 3D medical matting dataset and convincingly verified the validity of the dataset through quality control and downstream experiments in lung nodule classification. We then adapt the four selected state-of-the-art 2D image matting algorithms to 3D scenes and further customize the methods for CT images. Also, we propose the first end-to-end deep 3D matting network and implement a solid 3D medical image matting benchmark, which will be released to encourage further research.

Submitted 16 September, 2022; originally announced September 2022.

Comments: 12 pages, 7 figures

arXiv:2208.10944 [pdf, ps, other]

doi 10.1109/TMTT.2022.3220252

Multi-Resolution Subspace-Based Optimization Method for the Retrieval of 2D Perfect Electric Conductors

Authors: Xiuzhu Ye, Francesco Zardi, Marco Salucci, Andrea Massa

Abstract: Perfect Electric Conductors (PECs) are imaged by integrating the subspace-based optimization method (SOM) within the iterative multi-scaling scheme (IMSA). Without a-priori information on the number and/or the locations of the scatterers, and modelling their EM scattering interactions with a (known) probing source in terms of surface electric field integral equations, a segment-based representation of PECs is retrieved from the scattered field samples. The proposed IMSA-SOM inversion method is validated against both synthetic and experimental data by assessing the reconstruction accuracy, the robustness to noise, and the computational efficiency, along with some comparisons.

Submitted 23 August, 2022; originally announced August 2022.

arXiv:2207.06973 [pdf, ps, other]

Power Injection Measurements are more Vulnerable to Data Integrity Attacks than Power Flow Measurements

Authors: Xiuzhen Ye, Iñaki Esnaola, Samir M. Perlaza, Robert F. Harrison

Abstract: A novel metric that describes the vulnerability of the measurements in power systems to data integrity attacks is proposed. The new metric, coined vulnerability index (VuIx), leverages information theoretic measures to assess the attack effect on the fundamental limits of the disruption and detection tradeoff. The result of computing the VuIx of the measurements in the system yields an ordering of the measurements' vulnerability based on the level of exposure to data integrity attacks. This new framework is used to assess the measurement vulnerability of IEEE test systems and it is observed that power injection measurements are overwhelmingly more vulnerable to data integrity attacks than power flow measurements. A detailed numerical evaluation of the VuIx values for IEEE test systems is provided.

Submitted 14 July, 2022; originally announced July 2022.

Comments: 6 pages, 9 figures, Submitted to IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids

arXiv:2206.13348 [pdf]

A Unified Initial Alignment Method of SINS Based on FGO

Authors: Hanwen Zhou, Xiufen Ye

Abstract: The initial alignment provides an accurate attitude for SINS (strapdown inertial navigation system). By further estimating the IMU's bias and misalignment angle, the recursive Bayesian filter is accurate. However, the prior heading error has a significant influence on the convergence speed and accuracy. In addition, the accuracy will be limited by its iteration at a single time-step. The coarse alignment method OBA (optimization-based alignment) uses MLE (maximum likelihood estimation) to find the optimal attitude quickly. However, few methods consider the IMU bias and misalignment angle, which will reduce the attitude accuracy. In this paper, a unified method based on FGO (factor graph optimization) and IBF (inertial base frame) is proposed. The attitude is estimated by MLE, while the IMU bias and misalignment angle are estimated by MAP estimation. The states of all time steps are optimized together to further improve the accuracy. Physical experiments on the rotation MEMS SINS show that the heading accuracy of this method is improved in limited alignment time.

Submitted 6 June, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: 8 pages, This article has been accepted for publication in IEEE Transactions on Industrial Electronics

arXiv:2205.04239 [pdf, ps, other]

Distributed and Joint Optimization of Precoding and Power for User-Centric Cell-Free Massive MIMO

Authors: Hongkang Yu, Xinquan Ye, Yijian Chen

Abstract: In the cell-free massive multiple-input multiple-output (CF mMIMO) system, the centralized transmission scheme is widely adopted to manage the inter-user interference. Unfortunately, its implementation is limited by the extensive signaling overhead between the central processing unit (CPU) and the access points (APs). To solve this problem, we propose a distributed downlink transmission scheme in this letter. First, the null space-based precoding is used to cancel the interference to partial users, where only a portion of channel state information (CSI) needs to be shared among the AP cluster. Based on this, the dual decomposition method is adopted to jointly optimize the precoder and power control, where the calculation can be performed independently by each AP cluster with closed-form expression. With very few iterations, our distributed scheme achieves the same performance as the centralized one. Moreover, it significantly reduces the information exchange to the CPU.

Submitted 30 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

arXiv:2204.03804 [pdf, other]

A Learnable Variational Model for Joint Multimodal MRI Reconstruction and Synthesis

Authors: Wanyu Bian, Qingchao Zhang, Xiaojing Ye, Yunmei Chen

Abstract: Generating multi-contrast/modal MRI of the same anatomy enriches diagnostic information but is limited in practice due to excessive data acquisition time. In this paper, we propose a novel deep-learning model for joint reconstruction and synthesis of multi-modal MRI using incomplete k-space data of several source modalities as inputs. The output of our model includes reconstructed images of the source modalities and a high-quality image synthesized in the target modality. Our proposed model is formulated as a variational problem that leverages several learnable modality-specific feature extractors and a multimodal synthesis module. We propose a learnable optimization algorithm to solve this model, which induces a multi-phase network whose parameters can be trained using multi-modal MRI data. Moreover, a bilevel-optimization framework is employed for robust parameter training. We demonstrate the effectiveness of our approach using extensive numerical experiments.

Submitted 28 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Provisionally accepted by MICCAI 2022

arXiv:2201.00065 [pdf, ps, other]

Stealth Data Injection Attacks with Sparsity Constraints

Authors: Xiuzhen Ye, Iñaki Esnaola, Samir M. Perlaza, Robert F. Harrison

Abstract: Sparse stealth attack constructions that minimize the mutual information between the state variables and the observations are proposed. The attack construction is formulated as the design of a multivariate Gaussian distribution that aims to minimize the mutual information while limiting the Kullback-Leibler divergence between the distribution of the observations under attack and the distribution of the observations without attack. The sparsity constraint is incorporated as a support constraint of the attack distribution. Two heuristic greedy algorithms for the attack construction are proposed. The first algorithm assumes that the attack vector consists of independent entries, and therefore, requires no communication between different attacked locations. The second algorithm considers correlation between the attack vector entries which results in better attack performance at the expense of coordination between different locations. We numerically evaluate the performance of the proposed attack constructions on IEEE test systems and show that it is feasible to construct stealth attacks that generate significant disruption with a low number of compromised sensors.

Submitted 23 July, 2022; v1 submitted 31 December, 2021; originally announced January 2022.

Comments: 10 pages, 6 figures, submitted to IEEE Trans. Smart Grid

arXiv:2111.01544 [pdf]

Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study

Authors: Dazhou Guo, Jia Ge, Xianghua Ye, Senxiang Yan, Yi Xin, Yuchen Song, Bing-shen Huang, Tsung-Min Hung, Zhuotun Zhu, Ling Peng, Yanping Ren, Rui Liu, Gong Zhang, Mengyuan Mao, Xiaohua Chen, Zhongjie Lu, Wenxiang Li, Yuzhen Chen, Lingyun Huang, Jing Xiao, Adam P. Harrison, Le Lu, Chien-Yu Lin, Dakai Jin, Tsung-Ying Ho

Abstract: Accurate organ at risk (OAR) segmentation is critical to reduce radiotherapy post-treatment complications. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region; however, due to the predictably prohibitive labor cost of this task, most institutions choose a substantially simplified protocol by delineating a smaller subset of OARs and neglecting the dose distributions associated with other OARs. In this work we propose a novel, automated and highly effective stratified OAR segmentation (SOARS) system using deep learning to precisely delineate a comprehensive set of 42 H&N OARs. SOARS stratifies 42 OARs into anchor, mid-level, and small & hard subcategories, with specifically derived neural network architectures for each category by neural architecture search (NAS) principles. We built SOARS models using 176 training patients in an internal institution and independently evaluated on 1327 external patients across six different institutions. It consistently outperformed other state-of-the-art methods by at least 3-5% in Dice score for each institutional evaluation (up to 36% relative error reduction in other metrics). More importantly, extensive multi-user studies demonstrated that 98% of the SOARS predictions need only very minor or no revisions for direct clinical acceptance (saving 90% of radiation oncologists' workload), and their segmentation and dosimetric accuracy are within or smaller than the inter-user variation. These findings confirmed the strong clinical applicability of SOARS for the OAR delineation process in H&N cancer radiotherapy workflows, with improved efficiency, comprehensiveness, and quality.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2109.11572 [pdf, other]

SAME: Deformable Image Registration based on Self-supervised Anatomical Embeddings

Authors: Fengze Liu, Ke Yan, Adam Harrison, Dazhou Guo, Le Lu, Alan Yuille, Lingyun Huang, Guotong Xie, Jing Xiao, Xianghua Ye, Dakai Jin

Abstract: In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. This work is built on top of a recent algorithm SAM, which is capable of computing dense anatomical/semantic correspondences between two images at the pixel level. Our method is named SAME, which breaks down image registration into three steps: affine transformation, coarse deformation, and deep deformable registration. Using SAM embeddings, we enhance these steps by finding more coherent correspondences, and providing features and a loss function with better semantic guidance. We collect a multi-phase chest computed tomography dataset with 35 annotated organs for each patient and conduct inter-subject registration for quantitative evaluation. Results show that SAME outperforms widely-used traditional registration techniques (Elastix FFD, ANTs SyN) and the learning-based VoxelMorph method by at least 4.7% and 2.7% in Dice scores for two separate tasks of within-contrast-phase and across-contrast-phase registration, respectively. SAME achieves performance comparable to the best traditional registration method, DEEDS (from our evaluation), while being orders of magnitude faster (from 45 seconds to 1.2 seconds).

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2109.09738 [pdf, other]

An Optimal Control Framework for Joint-channel Parallel MRI Reconstruction without Coil Sensitivities

Authors: Wanyu Bian, Yunmei Chen, Xiaojing Ye

Abstract: Goal: This work aims at developing a novel calibration-free fast parallel MRI (pMRI) reconstruction method incorporated with a discrete-time optimal control framework. The reconstruction model is designed to learn a regularization that combines channels and extracts features by leveraging the information sharing among channels of multi-coil images. We propose to recover both magnitude and phase information by taking advantage of structured convolutional networks in image and Fourier spaces. Methods: We develop a novel variational model with a learnable objective function that integrates an adaptive multi-coil image combination operator and effective image regularization in the image and Fourier spaces. We cast the reconstruction network as a structured discrete-time optimal control system, resulting in an optimal control formulation of parameter training where the parameters of the objective function play the role of control variables. We demonstrate that the Lagrangian method for solving the control problem is equivalent to back-propagation, ensuring the local convergence of the training algorithm. Results: We conduct a large number of numerical experiments of the proposed method with comparisons to several state-of-the-art pMRI reconstruction networks on real pMRI datasets. The numerical results demonstrate the promising performance of the proposed method evidently. Conclusion: We conduct a large number of numerical experiments of the proposed method with comparisons to several state-of-the-art pMRI reconstruction networks on real pMRI datasets. The numerical results demonstrate the promising performance of the proposed method evidently. Significance: By learning a multi-coil image combination operator and performing regularizations in both the image domain and the k-space domain, the proposed method achieves a highly efficient image reconstruction network for pMRI.

Submitted 23 January, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: 13 pages

arXiv:2109.09271 [pdf, ps, other]

DeepStationing: Thoracic Lymph Node Station Parsing in CT Scans using Anatomical Context Encoding and Key Organ Auto-Search

Authors: Dazhou Guo, Xianghua Ye, Jia Ge, Xing Di, Le Lu, Lingyun Huang, Guotong Xie, Jing Xiao, Zhongjie Liu, Ling Peng, Senxiang Yan, Dakai Jin

Abstract: Lymph node station (LNS) delineation from computed tomography (CT) scans is an indispensable step in radiation oncology workflow. High inter-user variabilities across oncologists and prohibitive laboring costs motivated the automated approach. Previous works exploit anatomical priors to infer LNS based on predefined ad-hoc margins. However, without voxel-level supervision, the performance is severely limited. LNS is highly context-dependent - LNS boundaries are constrained by anatomical organs - we formulate it as a deep spatial and contextual parsing problem via encoded anatomical organs. This permits the deep network to better learn from both CT appearance and organ context. We develop a stratified referencing organ segmentation protocol that divides the organs into anchor and non-anchor categories and uses the former's predictions to guide the latter's segmentation. We further develop an auto-search module to identify the key organs that opt for the optimal LNS parsing performance. Extensive four-fold cross-validation experiments on a dataset of 98 esophageal cancer patients (with the most comprehensive set of 12 LNSs + 22 organs in thoracic region to date) are conducted. Our LNS parsing model produces significant performance improvements, with an average Dice score of 81.1% +/- 6.1%, which is 5.0% and 19.2% higher over the pure CT-based deep model and the previous representative approach, respectively.

Submitted 19 September, 2021; originally announced September 2021.

arXiv:2105.04719 [pdf, other]

Speech2Slot: An End-to-End Knowledge-based Slot Filling from Speech

Authors: Pengwei Wang, Xin Ye, Xiaohuan Zhou, Jinghui Xie, Hao Wang

Abstract: In contrast to conventional pipeline Spoken Language Understanding (SLU), which consists of automatic speech recognition (ASR) and natural language understanding (NLU), end-to-end SLU infers the semantic meaning directly from speech and overcomes the error propagation caused by ASR. End-to-end slot filling (SF) from speech is an essential component of end-to-end SLU, and is usually regarded as a sequence-to-sequence generation problem that relies heavily on the performance of the ASR language model. However, it is hard to generate a correct slot when the slot is out-of-vocabulary (OOV) in training data, especially when a slot is an anti-linguistic entity without grammatical rule. Inspired by object detection in computer vision, which detects objects in an image, we consider SF as the task of slot detection from speech. In this paper, we formulate the SF task as a matching task and propose an end-to-end knowledge-based SF model, named Speech-to-Slot (Speech2Slot), to leverage knowledge to detect the boundary of a slot from the speech. We also release a large-scale dataset of Chinese speech for slot filling, containing more than 830,000 samples. The experiments show that our approach is markedly superior to the conventional pipeline SLU approach, and outperforms the state-of-the-art end-to-end SF approach with 12.51% accuracy improvement.

Submitted 10 May, 2021; originally announced May 2021.

arXiv:2104.12939 [pdf, other]

Provably Convergent Learned Inexact Descent Algorithm for Low-Dose CT Reconstruction

Authors: Qingchao Zhang, Mehrdad Alvandipour, Wenjun Xia, Yi Zhang, Xiaojing Ye, Yunmei Chen

Abstract: We propose a provably convergent method, called Efficient Learned Descent Algorithm (ELDA), for low-dose CT (LDCT) reconstruction. ELDA is a highly interpretable neural network architecture with learned parameters and meanwhile retains convergence guarantee as classical optimization algorithms. To improve reconstruction quality, the proposed ELDA also employs a new non-local feature mapping and an associated regularizer. We compare ELDA with several state-of-the-art deep image methods, such as RED-CNN and Learned Primal-Dual, on a set of LDCT reconstruction problems. Numerical experiments demonstrate improvement of reconstruction quality using ELDA with merely 19 layers, suggesting the promising performance of ELDA in solution accuracy and parameter efficiency.

Submitted 26 April, 2021; originally announced April 2021.

arXiv:2104.03540 [pdf, other]

doi 10.1109/TVT.2022.3174404

Map-based Channel Modeling and Generation for U2V mmWave Communication

Authors: Qiuming Zhu, Kai Mao, Maozhong Song, Xiaomin Chen, Boyu Hua, Weizhi Zhong, Xijuan Ye

Abstract: Unmanned aerial vehicle (UAV) aided millimeter wave (mmWave) technologies have a promising prospect in future communication networks. By considering the factors of three-dimensional (3D) scattering space, 3D trajectory, and 3D antenna array, a non-stationary channel model for UAV-to-vehicle (U2V) mmWave communications is proposed. The computation and generation methods of channel parameters, including inter-path and intra-path parameters, are analyzed in detail. The inter-path parameters are calculated in a deterministic way, while the parameters of intra-path rays are generated in a stochastic way. The statistical properties are obtained by using a Gaussian mixture model (GMM) on the massive ray tracing (RT) data. Then, a modified method of equal areas (MMEA) is developed to generate the random intra-path variables. Meanwhile, to reduce the complexity of the RT method, the 3D propagation space is reconstructed based on the user-defined digital map. The simulated and analyzed results show that the proposed model and generation method can reproduce non-stationary U2V channels in accord with U2V scenarios. The generated statistical properties are consistent with the theoretical and measured ones as well.

Submitted 8 April, 2021; originally announced April 2021.

Journal ref: in IEEE Transactions on Vehicular Technology, vol. 71, no. 8, pp. 8004-8015, Aug. 2022

arXiv:2011.15002 [pdf, other]

Image Quality Assessment for Perceptual Image Restoration: A New Dataset, Benchmark and Metric

Authors: Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

Abstract: Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent perceptual IR algorithms based on generative adversarial networks (GANs) have brought in significant improvement on visual performance, but also pose great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. We present two questions: Can existing IQA methods objectively evaluate recent IR algorithms? With the focus on beating current benchmarks, are we getting better IR algorithms? To answer the questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called Perceptual Image Processing ALgorithms (PIPAL) dataset. Especially, this dataset includes the results of GAN-based IR algorithms, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable Elo system. Based on PIPAL, we present new benchmarks for both IQA and SR methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. At last, we shed light on how to improve the IQA performance on GAN-based distortion. Inspired by the finding that the existing IQA methods have an unsatisfactory performance on the GAN-based distortion partially because of their low tolerance to spatial misalignment, we propose to improve the performance of an IQA network on GAN-based distortion by explicitly considering this misalignment. We propose the Space Warping Difference Network, which includes the novel l_2 pooling layers and Space Warping Difference layers. Experiments demonstrate the effectiveness of the proposed method.

Submitted 30 November, 2020; originally announced November 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.12142

arXiv:2011.02840 [pdf]

doi 10.1007/978-3-030-72087-2_36

DR-Unet104 for Multimodal MRI brain tumor segmentation

Authors: Jordan Colman, Lei Zhang, Wenting Duan, Xujiong Ye

Abstract: In this paper we propose a 2D deep residual Unet with 104 convolutional layers (DR-Unet104) for lesion segmentation in brain MRIs. We make multiple additions to the Unet architecture, including adding the 'bottleneck' residual block to the Unet encoder and adding dropout after each convolution block stack. We verified the effect of introducing the regularisation of dropout with small rate (e.g. 0.2) on the architecture, and found a dropout of 0.2 improved the overall performance compared to no dropout, or a dropout of 0.5. We evaluated the proposed architecture as part of the Multimodal Brain Tumor Segmentation (BraTS) 2020 Challenge and compared our method to DeepLabV3+ with a ResNet-V2-152 backbone. We found that the DR-Unet104 achieved a mean dice score coefficient of 0.8862, 0.6756 and 0.6721 for validation data, whole tumor, enhancing tumor and tumor core respectively, an overall improvement on 0.8770, 0.65242 and 0.68134 achieved by DeepLabV3+. Our method produced a final mean DSC of 0.8673, 0.7514 and 0.7983 on whole tumor, enhancing tumor and tumor core on the challenge's testing data. We produced a competitive lesion segmentation architecture, despite only 2D convolutions, having the added benefit that it can be used on lower power computers than a 3D architecture. The source code and trained model for this work is openly available at https://github.com/jordan-colman/DR-Unet104.

Submitted 4 May, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: Part of the Multimodal Brain Tumor Segmentation 2020 Challenge conference proceedings

Journal ref: BrainLes 2020. Lecture Notes in Computer Science, vol 12659, pp 410-419

arXiv:2008.11870 [pdf, other]

Lymph Node Gross Tumor Volume Detection and Segmentation via Distance-based Gating using 3D CT/PET Imaging in Radiotherapy

Authors: Zhuotun Zhu, Dakai Jin, Ke Yan, Tsung-Ying Ho, Xianghua Ye, Dazhou Guo, Chun-Hung Chao, Jing Xiao, Alan Yuille, Le Lu

Abstract: Finding, identifying and segmenting suspicious cancer metastasized lymph nodes from 3D multi-modality imaging is a clinical task of paramount importance. In radiotherapy, they are referred to as Lymph Node Gross Tumor Volume (GTVLN). Determining and delineating the spread of GTVLN is essential in defining the corresponding resection and irradiating regions for the downstream workflows of surgical resection and radiotherapy of various cancers. In this work, we propose an effective distance-based gating approach to simulate and simplify the high-level reasoning protocols conducted by radiation oncologists, in a divide-and-conquer manner. GTVLN is divided into two subgroups of tumor-proximal and tumor-distal, respectively, by means of binary or soft distance gating. This is motivated by the observation that each category can have distinct though overlapping distributions of appearance, size and other LN characteristics. A novel multi-branch detection-by-segmentation network is trained with each branch specializing on learning one GTVLN category features, and outputs from multi-branch are fused in inference. The proposed method is evaluated on an in-house dataset of $141$ esophageal cancer patients with both PET and CT imaging modalities. Our results validate significant improvements on the mean recall from $72.5\%$ to $78.2\%$, as compared to previous state-of-the-art work. The highest achieved GTVLN recall of $82.5\%$ at $20\%$ precision is clinically relevant and valuable since human observers tend to have low sensitivity (around $80\%$ for the most experienced radiation oncologists, as reported by literature).

Submitted 26 August, 2020; originally announced August 2020.

Comments: MICCAI2020

arXiv:2008.01410 [pdf, other]

Deep Parallel MRI Reconstruction Network Without Coil Sensitivities

Authors: Wanyu Bian, Yunmei Chen, Xiaojing Ye

Abstract: We propose a novel deep neural network architecture by mapping the robust proximal gradient scheme for fast image reconstruction in parallel MRI (pMRI) with regularization function trained from data. The proposed network learns to adaptively combine the multi-coil images from incomplete pMRI data into a single image with homogeneous contrast, which is then passed to a nonlinear encoder to efficiently extract sparse features of the image. Unlike most existing deep image reconstruction networks, our network does not require knowledge of sensitivity maps, which can be difficult to estimate accurately and have been a major bottleneck of image reconstruction in real-world pMRI applications. The experimental results demonstrate the promising performance of our method on a variety of pMRI imaging data sets.

Submitted 18 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: Accepted by MICCAI international workshop MLMIR 2020

arXiv:2008.00901 [pdf, other]

Automated Segmentation of Brain Gray Matter Nuclei on Quantitative Susceptibility Mapping Using Deep Convolutional Neural Network

Authors: Chao Chai, Pengchong Qiao, Bin Zhao, Huiying Wang, Guohua Liu, Hong Wu, E Mark Haacke, Wen Shen, Chen Cao, Xinchen Ye, Zhiyang Liu, Shuang Xia

Abstract: Abnormal iron accumulation in the brain subcortical nuclei has been reported to be correlated to various neurodegenerative diseases, which can be measured through the magnetic susceptibility from the quantitative susceptibility mapping (QSM). To quantitatively measure the magnetic susceptibility, the nuclei should be accurately segmented, which is a tedious task for clinicians. In this paper, we proposed a double-branch residual-structured U-Net (DB-ResUNet) based on 3D convolutional neural network (CNN) to automatically segment such brain gray matter nuclei. To achieve a better tradeoff between segmentation accuracy and memory efficiency, the proposed DB-ResUNet fed image patches with high resolution and patches with low resolution but larger field of view into the local and global branches, respectively. Experimental results revealed that by jointly using QSM and T$_\text{1}$ weighted imaging (T$_\text{1}$WI) as inputs, the proposed method was able to achieve better segmentation accuracy over its single-branch counterpart, as well as the conventional atlas-based method and the classical 3D-UNet structure. The susceptibility values and the volumes were also measured, which indicated that the measurements from the proposed DB-ResUNet are able to present high correlation with values from the manually annotated regions of interest.

Submitted 3 August, 2020; originally announced August 2020.

Comments: submitted to IEEE Transactions on Medical Imaging

arXiv:2007.12142 [pdf, other]

PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

Authors: Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

Abstract: Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent IR methods based on Generative Adversarial Networks (GANs) have achieved significant improvement in visual performance, but also presented great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. Then we raise two questions: (1) Can existing IQA methods objectively evaluate recent IR algorithms? (2) When focusing on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called Perceptual Image Processing Algorithms (PIPAL) dataset. Especially, this dataset includes the results of GAN-based methods, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable "Elo system". Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. At last, we improve the performance of IQA networks on GAN-based distortions by introducing anti-aliasing pooling. Experiments show the effectiveness of the proposed method.

Submitted 26 September, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: This paper has been accepted for publication at ECCV2020

arXiv:2007.02764 [pdf, ps, other]

Information Theoretic Data Injection Attacks with Sparsity Constraints

Authors: Xiuzhen Ye, Iñaki Esnaola, Samir M. Perlaza, Robert F. Harrison

Abstract: Information theoretic sparse attacks that minimize simultaneously the information obtained by the operator and the probability of detection are studied in a Bayesian state estimation setting. The attack construction is formulated as an optimization problem that aims to minimize the mutual information between the state variables and the observations while guaranteeing the stealth of the attack. Stealth is described in terms of the Kullback-Leibler (KL) divergence between the distributions of the observations under attack and without attack. To overcome the difficulty posed by the combinatorial nature of a sparse attack construction, the attack case in which only one sensor is compromised is analytically solved first. The insight generated in this case is then used to propose a greedy algorithm that constructs random sparse attacks. The performance of the proposed attack is evaluated in the IEEE 30 Bus Test Case.

Submitted 15 July, 2022; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: 6 pages, 3 figures, published in 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm)

arXiv:2006.13890 [pdf, other]

doi 10.1007/978-3-030-59725-2_49

Learning Tumor Growth via Follow-Up Volume Prediction for Lung Nodules

Authors: Yamin Li, Jiancheng Yang, Yi Xu, Jingwei Xu, Xiaodan Ye, Guangyu Tao, Xueqian Xie, Guixue Liu

Abstract: Follow-up serves an important role in the management of pulmonary nodules for lung cancer. Imaging diagnostic guidelines with expert consensus have been made to help radiologists make clinical decisions for each patient. However, tumor growth is such a complicated process that it is difficult to stratify high-risk nodules from low-risk ones based on morphologic characteristics. On the other hand, recent deep learning studies using convolutional neural networks (CNNs) to predict the malignancy score of nodules only provide clinicians with black-box predictions. To this end, we propose a unified framework, named Nodule Follow-Up Prediction Network (NoFoNet), which predicts the growth of pulmonary nodules with high-quality visual appearances and accurate quantitative results, given any time interval from baseline observations. It is achieved by predicting the future displacement field of each voxel with a WarpNet. A TextureNet is further developed to refine textural details of WarpNet outputs. We also introduce techniques including Temporal Encoding Module and Warp Segmentation Loss to encourage time-aware and shape-aware representation learning. We build an in-house follow-up dataset from two medical centers to validate the effectiveness of the proposed method. NoFoNet significantly outperforms direct prediction by a U-Net in terms of visual quality; more importantly, it demonstrates accurate differentiating performance between high- and low-risk nodules. Our promising results suggest the potentials in computer aided intervention for lung nodule management.

Submitted 9 October, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: MICCAI 2020

arXiv:2006.04356 [pdf, other]

Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection

Authors: Liang Du, Xiaoqing Ye, Xiao Tan, Jianfeng Feng, Zhenbo Xu, Errui Ding, Shilei Wen

Abstract: Object detection from 3D point clouds remains a challenging task, though recent studies pushed the envelope with the deep learning techniques. Owing to the severe spatial occlusion and inherent variance of point density with the distance to sensors, appearance of a same object varies a lot in point cloud data. Designing robust feature representation against such appearance changes is hence the key issue in a 3D object detection method. In this paper, we innovatively propose a domain adaptation like approach to enhance the robustness of the feature representation. More specifically, we bridge the gap between the perceptual domain where the feature comes from a real scene and the conceptual domain where the feature is extracted from an augmented scene consisting of non-occlusion point cloud rich of detailed information. This domain adaptation approach mimics the functionality of the human brain when proceeding object perception. Extensive experiments demonstrate that our simple yet effective approach fundamentally boosts the performance of 3D point cloud object detection and achieves the state-of-the-art results.

Submitted 8 June, 2020; originally announced June 2020.

Comments: 8 pages, 5 figures, CVPR 2020

arXiv:2004.12776 [pdf, other]

Boosting Connectivity in Retinal Vessel Segmentation via a Recursive Semantics-Guided Network

Authors: Rui Xu, Tiantian Liu, Xinchen Ye, Yen-Wei Chen

Abstract: Many deep learning based methods have been proposed for retinal vessel segmentation, however few of them focus on the connectivity of segmented vessels, which is quite important for a practical computer-aided diagnosis system on retinal images. In this paper, we propose an efficient network to address this problem. A U-shape network is enhanced by introducing a semantics-guided module, which integrates the enriched semantics information to shallow layers for guiding the network to explore more powerful features. Besides, a recursive refinement iteratively applies the same network over the previous segmentation results for progressively boosting the performance while increasing no extra network parameters. The carefully designed recursive semantics-guided network has been extensively evaluated on several public datasets. Experimental results have shown the efficiency of the proposed method.

Submitted 24 April, 2020; originally announced April 2020.

arXiv:2004.06047 [pdf]

doi 10.1109/JLT.2020.3000488

Microwave Photonic Imaging Radar with a Millimeter-level Resolution

Authors: Cong Ma, Yue Yang, Ce Liu, Beichen Fan, Xingwei Ye, Yamei Zhang, Xiangchuan Wang, Shilong Pan

Abstract: Microwave photonic radars enable fast or even real-time high-resolution imaging thanks to their broad bandwidth. Nevertheless, the frequency range of the radars usually overlaps with other existing radio-frequency (RF) applications, and only a centimeter-level imaging resolution has been reported, making them insufficient for civilian applications. Here, we propose a microwave photonic imaging radar with a millimeter-level resolution by introducing a frequency-stepped chirp signal based on an optical frequency shifting loop. As compared with the conventional linear-frequency modulated (LFM) signal, the frequency-stepped chirp signal can bring the system excellent capability of anti-interference. In an experiment, a frequency-stepped chirp signal with a total bandwidth of 18.2 GHz (16.9 to 35.1 GHz) is generated. By postprocessing the radar echo, radar imaging with a two-dimensional imaging resolution of ~8.5 mm $\times$ ~8.3 mm is achieved. An auto-regressive algorithm is used to reconstruct the disturbed signal when a frequency interference exists, and the high-resolution imaging is sustained.

Submitted 9 April, 2020; originally announced April 2020.
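
As a rough consistency check on the figures quoted in the abstract above, the textbook range-resolution relation $\Delta R = c / (2B)$ can be evaluated for the stated 18.2 GHz bandwidth. The sketch below is only an illustration, not the authors' processing chain: the function name `range_resolution_m` is hypothetical, and it assumes an ideal, unweighted chirp that uses the full band.

```python
# Speed of light in vacuum, in metres per second.
SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_resolution_m(bandwidth_hz: float) -> float:
    """Theoretical range resolution c / (2B) for a signal of bandwidth B (hypothetical helper)."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return SPEED_OF_LIGHT_M_S / (2.0 * bandwidth_hz)


# Band edges taken from the abstract: 16.9 GHz to 35.1 GHz.
bandwidth_hz = (35.1 - 16.9) * 1e9
resolution_mm = range_resolution_m(bandwidth_hz) * 1e3
print(f"Theoretical range resolution: {resolution_mm:.2f} mm")
```

Under these assumptions the result is about 8.2 mm, which is in line with the millimeter-level figures reported in the abstract. The abstract does not say which of the two reported dimensions is the range axis, so this is only an order-of-magnitude check.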

arXiv:2003.07628 [pdf, other]

Automated Segmentation of Left Ventricle in 2D echocardiography using deep learning

Authors: Neda Azarmehr, Xujiong Ye, Faraz Janan, James P Howard, Darrel P Francis, Massoud Zolgharni

Abstract: Following the successful application of the U-Net to medical images, different encoder-decoder models have been proposed as improvements to the original U-Net for segmenting echocardiographic images. This study aims to examine the performance of the proposed state-of-the-art models as well as the original U-Net model by applying them to automatically segment the endocardium of the left ventricle in 2D echocardiographic images. The prediction outputs of the models are used to evaluate their performance by comparing the automated results against the expert annotations (gold standard). Our results reveal that the U-Net model outperforms the other models by achieving an average Dice coefficient of $0.92 \pm 0.05$ and a Hausdorff distance of $3.97 \pm 0.82$.

Submitted 17 March, 2020; originally announced March 2020.

Comments: 4 pages, 1 figure, Extended Abstract MIDL conference

Report number: MIDL/2019/ExtendedAbstract/Sye8klvmcN; MyUni-UID

Journal ref: MIDL/2019/ExtendedAbstract/Sye8klvmcN; MyUni-UID
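
The Dice coefficient quoted in the abstract above measures the overlap between a predicted mask $A$ and the expert annotation $B$ as $2|A \cap B| / (|A| + |B|)$. A minimal sketch follows, assuming binary masks flattened to sequences of 0/1 values. The function name `dice_coefficient` and the toy masks are hypothetical and are not taken from the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two flattened binary masks (hypothetical helper)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 foreground pixels in each mask, 2 of them shared.
pred_mask = [0, 1, 1, 1, 0, 0]
truth_mask = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(pred_mask, truth_mask)
print(f"Dice = {score:.3f}")
```

In this toy case the score is 2·2 / (3 + 3) ≈ 0.667; a value of 1 means the two masks coincide exactly and 0 means they do not overlap at all.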

arXiv:2001.01057 [pdf, other]

Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder

Authors: Qian Li, Nan Guo, Xiaochun Ye, Dongrui Fan, Zhimin Tang

Abstract: Recently, many methods have been proposed for object detection. However, they cannot adaptively detect objects using semantic features. In this work, we mainly analyze how different methods detect objects adaptively, based on channel and spatial attention mechanisms. Some state-of-the-art detectors combine different feature pyramids with many mechanisms to enhance multi-level semantic information. However, this comes at a higher cost. This work addresses that with an anchor-free detector built on a shared encoder-decoder with an attention mechanism, which extracts shared features. We take features of different levels from the backbone (e.g., ResNet-50) as the basis features. Then, we feed the features into a simple module, followed by a detector header to detect objects. Meanwhile, we use the semantic features to revise geometric locations, so the detector performs a pixel-semantic revision of position. More importantly, this work analyzes the impact of different pooling strategies (e.g., mean, maximum or minimum) on multi-scale objects, and finds that minimum pooling better improves detection performance on small objects. Compared with the state-of-the-art MNC based on ResNet-101 for the standard MSCOCO 2014 baseline, our method improves detection AP by 3.8%.

Submitted 28 September, 2020; v1 submitted 4 January, 2020; originally announced January 2020.

Comments: Accepted by ICONIP 2020 (International Conference on Neural Information Processing)

Showing 1–50 of 52 results for author: Ye, X