Search | arXiv e-print repository

Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals

Authors: Hui Zheng, Hai-Teng Wang, Wei-Bang Jiang, Zhong-Tao Chen, Li He, Pei-Yang Lin, Peng-Hu Wei, Guo-Guang Zhao, Yun-Zhe Liu

Abstract: Invasive brain-computer interfaces have garnered significant attention due to their high performance. The current intracranial stereoElectroEncephaloGraphy (sEEG) foundation models typically build univariate representations based on a single channel. Some of them further use Transformer to model the relationship among channels. However, due to the locality and specificity of brain computation, their performance on more difficult tasks, e.g., speech decoding, which demands intricate processing in specific brain regions, is yet to be fully investigated. We hypothesize that building multi-variate representations within certain brain regions can better capture the specific neural processing. To explore this hypothesis, we collect a well-annotated Chinese word-reading sEEG dataset, targeting language-related brain networks, over 12 subjects. Leveraging this benchmark dataset, we developed the Du-IN model that can extract contextual embeddings from specific brain regions through discrete codebook-guided mask modeling. Our model achieves SOTA performance on the downstream 61-word classification task, surpassing all baseline models. Model comparison and ablation analysis reveal that our design choices, including (i) multi-variate representation by fusing channels in vSMC and STG regions and (ii) self-supervision by discrete codebook-guided mask modeling, significantly contribute to these performances. Collectively, our approach, inspired by neuroscience findings, capitalizing on multi-variate neural representation from specific brain regions, is suitable for invasive brain modeling. It marks a promising neuro-inspired AI approach in BCI.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.00542 [pdf, other]

UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement

Authors: Ruiquan Ge, Zhaojie Fang, Pengxue Wei, Zhanghao Chen, Hongyang Jiang, Ahmed Elazab, Wangting Li, Xiang Wan, Shaochong Zhang, Changmiao Wang

Abstract: Fundus photography, in combination with the ultra-wide-angle fundus (UWF) techniques, becomes an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow unlike UWF scanning laser ophthalmoscopy (UWF-SLO). To mitigate potential adverse effects associated with injections, researchers have proposed the development of cross-modality medical image generation algorithms capable of converting UWF-SLO images into their UWF-FA counterparts. Current image generation techniques applied to fundus photography encounter difficulties in producing high-resolution retinal images, particularly in capturing minute vascular lesions. To address these issues, we introduce a novel conditional generative adversarial network (UWAFA-GAN) to synthesize UWF-FA from UWF-SLO. This approach employs multi-scale generators and an attention transmit module to efficiently extract both global structures and local lesions. Additionally, to counteract the image blurriness issue that arises from training with misaligned data, a registration module is integrated within this framework. Our method performs non-trivially on inception scores and details generation. Clinical user studies further indicate that the UWF-FA images generated by UWAFA-GAN are clinically comparable to authentic images in terms of diagnostic reliability. Empirical evaluations on our proprietary UWF image datasets elucidate that UWAFA-GAN outperforms extant methodologies. The code is accessible at https://github.com/Tinysqua/UWAFA-GAN.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2403.11870 [pdf, other]

doi 10.1109/TGRS.2024.3378720

IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images

Authors: Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin

Abstract: Deep learning technologies have demonstrated their effectiveness in removing cloud cover from optical remote-sensing images. Convolutional Neural Networks (CNNs) exert dominance in the cloud removal tasks. However, constrained by the inherent limitations of convolutional operations, CNNs can address only a modest fraction of cloud occlusion. In recent years, diffusion models have achieved state-of-the-art (SOTA) proficiency in image generation and reconstruction due to their formidable generative capabilities. Inspired by the rapid development of diffusion models, we first present an iterative diffusion process for cloud removal (IDF-CR), which exhibits strong generative capabilities to achieve component divide-and-conquer cloud removal. IDF-CR consists of a pixel space cloud removal module (Pixel-CR) and a latent space iterative noise diffusion network (IND). Specifically, IDF-CR is divided into two-stage models that address pixel space and latent space. The two-stage model facilitates a strategic transition from preliminary cloud reduction to meticulous detail refinement. In the pixel space stage, Pixel-CR initiates the processing of cloudy images, yielding a suboptimal cloud removal prior to providing the diffusion model with prior cloud removal knowledge. In the latent space stage, the diffusion model transforms low-quality cloud removal into high-quality clean output. We refine the Stable Diffusion by implementing ControlNet. In addition, an unsupervised iterative noise refinement (INR) module is introduced for the diffusion model to optimize the distribution of the predicted noise, thereby enhancing advanced detail recovery. Our model performs best compared with other SOTA methods, including image reconstruction and optical remote-sensing cloud removal methods, on the optical remote-sensing datasets.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE TGRS, we first present an iterative diffusion process for cloud removal, the code is available at: https://github.com/SongYxing/IDF-CR

arXiv:2403.06579 [pdf, other]

Edge Information Hub: Orchestrating Satellites, UAVs, MEC, Sensing and Communications for 6G Closed-Loop Controls

Authors: Chengleyang Lei, Wei Feng, Peng Wei, Yunfei Chen, Ning Ge, Shiwen Mao

Abstract: An increasing number of field robots would be used for mission-critical tasks in remote or post-disaster areas. Due to usually-limited individual abilities, these robots require an edge information hub (EIH), which is capable of not only communications but also sensing and computing. Such EIH could be deployed on a flexibly-dispatched unmanned aerial vehicle (UAV). Different from traditional aerial base stations or mobile edge computing (MEC), the EIH would direct the operations of robots via sensing-communication-computing-control ($\textbf{SC}^3$) closed-loop orchestration. This paper aims to optimize the closed-loop control performance of multiple $\textbf{SC}^3$ loops, under the constraints of satellite-backhaul rate, computing capability, and on-board energy. Specifically, the linear quadratic regulator (LQR) control cost is used to measure the closed-loop utility, and a sum LQR cost minimization problem is formulated to jointly optimize the splitting of sensor data and allocation of communication and computing resources. We first derive the optimal splitting ratio of sensor data, and then recast the problem to a more tractable form. An iterative algorithm is finally proposed to provide a sub-optimal solution. Simulation results demonstrate the superiority of the proposed algorithm. We also uncover the influence of $\textbf{SC}^3$ parameters on closed-loop controls, highlighting more systematic understanding.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures

arXiv:2310.18670 [pdf, other]

Two-stage space construction for real-time modeling of distributed parameter systems under sparse sensing

Authors: Peng Wei

Abstract: Numerous industrial processes can be defined using distributed parameter systems (DPSs). This study introduces a two-stage spatial construction approach for real-time modeling of DPSs in cases of limited sensors. Initially, a discrete space-completion approach is created to recuperate the spatiotemporal patterns of non-monitored locations under sparse sensing. The high-dimensional space construction method is employed to derive continuous spatial basis functions (SBFs). The identification and adjustment of the nonlinear temporal model are carried out via the long short-term memory (LSTM) neural network. Eventually, the amalgamation of the derived SBFs and temporal model results in a spatially continuous model. The use of a cubic B-spline surface is validated as an effective solution for optimizing space construction in the sense of least squares approximation. Experimental tests conducted on a pouch-type Li-ion battery demonstrate the efficacy of the proposed modeling technique under sparse sensing. This work highlights the promise of sparse sensors in real-time full-space modeling for large-scale battery energy storage systems.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.08606 [pdf, other]

Multiscale Fusion for Abnormality Detection and Localization of Distributed Parameter Systems

Authors: Peng Wei, Han-Xiong Li

Abstract: Numerous industrial thermal processes and fluid processes can be described by distributed parameter systems (DPSs), wherein many process parameters and variables vary in space and time. Early internal abnormalities in the DPS may develop into uncontrollable thermal failures, causing serious safety incidents. In this study, the multiscale information fusion is proposed for internal abnormality detection and localization of DPSs under different scenarios. We introduce the dissimilarity statistic as a means to identify anomalies for lumped variables, whereas spatial and temporal statistic measures are presented for the anomaly detection for distributed variables. Through appropriate parameter optimization, these statistic functions are integrated into the comprehensive multiscale detection index, which outperforms traditional single-scale detection methods. The proposed multiscale statistic has good physical interpretability from the system disorder degree. Experiments on the internal short circuit (ISC) of a battery system have demonstrated that our proposed method can swiftly identify ISC abnormalities and accurately pinpoint problematic battery cells under various working conditions.

Submitted 1 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2308.02263 [pdf, other]

Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

Authors: Jinyu Long, Jetic Gū, Binhao Bai, Zhibo Yang, Ping Wei, Junli Li

Abstract: Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels. Transformer based models have recently bested RNN and CNN models in speech enhancement, however at the same time they are much more computationally expensive and require much more high quality training data, which is always hard to come by. In this paper, we present an improvement for speech enhancement models that maintains the expressiveness of self-attention while significantly reducing model complexity, which we have termed Spectrum Attention Fusion. We carefully construct a convolutional module to replace several self-attention layers in a speech Transformer, allowing the model to more efficiently fuse spectral features. Our proposed model is able to achieve comparable or better results against SOTA models but with significantly smaller parameters (0.58M) on the Voice Bank + DEMAND dataset.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2308.01117 [pdf]

Optimization-Based Motion Planning for Autonomous Agricultural Vehicles Turning in Constrained Headlands

Authors: Chen Peng, Peng Wei, Zhenghao Fei, Yuankai Zhu, Stavros G. Vougioukas

Abstract: Headland maneuvering is a crucial aspect of unmanned field operations for autonomous agricultural vehicles (AAVs). While motion planning for headland turning in open fields has been extensively studied and integrated into commercial auto-guidance systems, the existing methods primarily address scenarios with ample headland space and thus may not work in more constrained headland geometries. Commercial orchards often contain narrow and irregularly shaped headlands, which may include static obstacles, rendering the task of planning a smooth and collision-free turning trajectory difficult. To address this challenge, we propose an optimization-based motion planning algorithm for headland turning under geometrical constraints imposed by field geometry and obstacles.

Submitted 11 June, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.11530 [pdf, other]

UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN

Authors: Zhaojie Fang, Zhanghao Chen, Pengxue Wei, Wangting Li, Shaochong Zhang, Ahmed Elazab, Gangyong Jia, Ruiquan Ge, Changmiao Wang

Abstract: Fundus photography is an essential examination for clinical and differential diagnosis of fundus diseases. Recently, Ultra-Wide-angle Fundus (UWF) techniques, UWF Fluorescein Angiography (UWF-FA) and UWF Scanning Laser Ophthalmoscopy (UWF-SLO) have been gradually put into use. However, Fluorescein Angiography (FA) and UWF-FA require injecting sodium fluorescein which may have detrimental influences. To avoid negative impacts, cross-modality medical image generation algorithms have been proposed. Nevertheless, current methods in fundus imaging could not produce high-resolution images and are unable to capture tiny vascular lesion areas. This paper proposes a novel conditional generative adversarial network (UWAT-GAN) to synthesize UWF-FA from UWF-SLO. Using multi-scale generators and a fusion module patch to better extract global and local information, our model can generate high-resolution images. Moreover, an attention transmit module is proposed to help the decoder learn effectively. Besides, a supervised approach is used to train the network using multiple new weighted losses on different scales of data. Experiments on an in-house UWF image dataset demonstrate the superiority of the UWAT-GAN over the state-of-the-art methods. The source code is available at: https://github.com/Tinysqua/UWAT-GAN.

Submitted 21 July, 2023; originally announced July 2023.

Comments: 26th International Conference on Medical Image Computing and Computer Assisted Intervention

arXiv:2307.07218 [pdf, other]

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Authors: Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

Abstract: Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping the fine-tuning process. However, the prompting mechanisms of zero-shot TTS still face challenges in the following aspects: 1) previous works of zero-shot TTS are typically trained with single-sentence prompts, which significantly restricts their performance when the data is relatively sufficient during the inference stage. 2) The prosodic information in prompts is highly coupled with timbre, making it untransferable to each other. This paper introduces Mega-TTS 2, a generic prompting mechanism for zero-shot TTS, to tackle the aforementioned challenges. Specifically, we design a powerful acoustic autoencoder that separately encodes the prosody and timbre information into the compressed latent space while providing high-quality reconstructions. Then, we propose a multi-reference timbre encoder and a prosody latent language model (P-LLM) to extract useful information from multi-sentence prompts. We further leverage the probabilities derived from multiple P-LLM outputs to produce transferable and controllable prosody. Experimental results demonstrate that Mega-TTS 2 could not only synthesize identity-preserving speech with a short prompt of an unseen speaker from arbitrary sources but consistently outperform the fine-tuning method when the volume of data ranges from 10 seconds to 5 minutes. Furthermore, our method enables the transfer of various speaking styles to the target timbre in a fine-grained and controlled manner. Audio samples can be found in https://boostprompt.github.io/boostprompt/.

Submitted 10 April, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: Accepted by ICLR 2024

arXiv:2306.11647 [pdf, ps, other]

Safe and Scalable Real-Time Trajectory Planning Framework for Urban Air Mobility

Authors: Abenezer Taye, Roberto Valenti, Akshay Rajhans, Anastasia Mavrommati, Pieter J. Mosterman, Peng Wei

Abstract: This paper presents a real-time trajectory planning framework for Urban Air Mobility (UAM) that is both safe and scalable. The proposed framework employs a decentralized, free-flight concept of operation in which each aircraft independently performs separation assurance and conflict resolution, generating safe trajectories by accounting for the future states of nearby aircraft. The framework consists of two main components: a data-driven reachability analysis tool and an efficient Markov Decision Process (MDP) based decision maker. The reachability analysis over-approximates the reachable set of each aircraft through a discrepancy function learned online from simulated trajectories. The decision maker, on the other hand, uses a 6-degrees-of-freedom guidance model of fixed-wing aircraft to ensure collision-free trajectory planning. Additionally, the proposed framework incorporates reward shaping and action shielding techniques to enhance safety performance. The proposed framework is evaluated through simulation experiments involving up to 32 aircraft in a UAM setting, with performance measured by the number of Near Mid Air Collisions (NMAC) and computational time. The results demonstrate the safety and scalability of the proposed framework.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2301.12961 [pdf, other]

doi 10.1109/ICUAS57906.2023.10156164

A Framework for Operational Volume Generation for Urban Air Mobility Strategic Deconfliction

Authors: Ellis Lee Thompson, Yan Xu, Peng Wei

Abstract: Strategic pre-flight functions focus on the planning and deconfliction of routes for aircraft systems. The urban air mobility concept calls for higher levels of autonomy with onboard and en route functions but also strategic and pre-flight systems. Existing endeavours into strategic pre-flight functions focus on improving the route generation and strategic deconfliction of these routes. Introduced with the urban air mobility concept is the premise of operational volumes. These 4D regions of airspace describe the intended operational region for an aircraft for finite time. Chaining these together forms a contract of finite operational volumes over a given route. It is no longer enough to only deconflict routes within the airspace, but to now consider these 4D operational volumes. To provide an effective all-in-one approach, we propose a novel framework for generating routes and accompanying contracts of operational volumes, along with deconfliction focused around 4D operational volumes. Experimental results show efficiency of operational volume generation utilising reachability analysis and demonstrate sufficient success in deconfliction of operational volumes.

Submitted 27 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: 8 pages, 3 Figures

Journal ref: 2023 International Conference on Unmanned Aircraft Systems (ICUAS), Warsaw, Poland, 2023, pp. 71-78

arXiv:2211.02147 [pdf, other]

A Survey on Reinforcement Learning in Aviation Applications

Authors: Pouria Razzaghi, Amin Tabrizian, Wei Guo, Shulu Chen, Abenezer Taye, Ellis Thompson, Alexis Bregeon, Ali Baheri, Peng Wei

Abstract: Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.

Submitted 22 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

arXiv:2208.00428 [pdf, other]

doi 10.1145/3474085

Robust Real-World Image Super-Resolution against Adversarial Attacks

Authors: Jiutao Yue, Haofeng Li, Pengxu Wei, Guanbin Li, Liang Lin

Abstract: Recently deep neural networks (DNNs) have achieved significant success in real-world image super-resolution (SR). However, adversarial image samples with quasi-imperceptible noises could threaten deep learning SR models. In this paper, we propose a robust deep learning framework for real-world SR that randomly erases potential adversarial noises in the frequency domain of input images or features. The rationale is that on the SR task clean images or features have a different pattern from the attacked ones in the frequency domain. Observing that existing adversarial attacks usually add high-frequency noises to input images, we introduce a novel random frequency mask module that blocks out high-frequency components possibly containing the harmful perturbations in a stochastic manner. Since the frequency masking may not only destroy the adversarial perturbations but also affect the sharp details in a clean image, we further develop an adversarial sample classifier based on the frequency domain of images to determine whether to apply the proposed mask module. Based on the above ideas, we devise a novel real-world image SR framework that combines the proposed frequency mask modules and the proposed adversarial classifier with an existing super-resolution backbone network. Experiments show that our proposed method is more insensitive to adversarial attacks and presents more stable SR results than existing models and defenses.

Submitted 31 July, 2022; originally announced August 2022.

Comments: ACM-MM 2021, Code: https://github.com/lhaof/Robust-SR-against-Adversarial-Attacks

Journal ref: Proceedings of the 29th ACM International Conference on Multimedia (2021) 5148-5157
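
Illustration for the entry above: the abstract describes a random frequency mask that stochastically blocks high-frequency components. The following is a minimal NumPy sketch of that general idea, not the authors' implementation (their code is at the repository linked in the Comments line). The function name `random_frequency_mask` and the parameters `radius_frac` and `drop_prob` are hypothetical, and the circular low-frequency region is an assumption made here for simplicity.

```python
import numpy as np

def random_frequency_mask(image, radius_frac=0.25, drop_prob=0.5, rng=None):
    """Randomly zero high-frequency FFT coefficients of a 2-D image.

    Hypothetical sketch: coefficients farther than radius_frac * min(h, w)
    from the zero frequency are each dropped with probability drop_prob.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Shift the spectrum so the zero frequency sits at index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    high = dist > radius_frac * min(h, w)            # high-frequency region
    drop = high & (rng.random((h, w)) < drop_prob)   # stochastic selection
    spectrum[drop] = 0
    # The random mask is not conjugate-symmetric, so the inverse transform is
    # complex in general; keeping the real part is a simplification.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real
```

With `drop_prob=0.0` the function returns the input unchanged (up to floating-point error), and the zero-frequency coefficient is never dropped, so the image mean is preserved. This also makes the trade-off noted in the abstract visible: larger `drop_prob` removes more potential perturbation but also more sharp detail.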

arXiv:2206.02609 [pdf, other]

Real-World Image Super-Resolution by Exclusionary Dual-Learning

Authors: Hao Li, Jinghui Qin, Zhijing Yang, Pengxu Wei, Jinshan Pan, Liang Lin, Yukai Shi

Abstract: Real-world image super-resolution, a practical image restoration problem that aims to obtain high-quality images from in-the-wild input, has recently received considerable attention with regard to its tremendous application potentials. Although deep learning-based methods have achieved promising restoration quality on real-world image super-resolution datasets, they ignore the relationship between L1- and perceptual- minimization and roughly adopt auxiliary large-scale datasets for pre-training. In this paper, we discuss the image types within a corrupted image and the property of perceptual- and Euclidean- based evaluation protocols. Then we propose a method, Real-World image Super-Resolution by Exclusionary Dual-Learning (RWSR-EDL) to address the feature diversity in perceptual- and L1- based cooperative learning. Moreover, a noise-guidance data collection strategy is developed to address the training time consumption in multiple datasets optimization. When an auxiliary dataset is incorporated, RWSR-EDL achieves promising results and repulses any training time increment by adopting the noise-guidance data collection strategy. Extensive experiments show that RWSR-EDL achieves competitive performance over state-of-the-art methods on four in-the-wild image super-resolution datasets.

Submitted 6 June, 2022; originally announced June 2022.

Comments: IEEE TMM 2022; Considering large volume of RealSR datasets, a multi-dataset sampling scheme is developed

arXiv:2205.04590 [pdf, other]

A Verification Framework for Certifying Learning-Based Safety-Critical Aviation Systems

Authors: Ali Baheri, Hao Ren, Benjamin Johnson, Pouria Razzaghi, Peng Wei

Abstract: We present a safety verification framework for design-time and run-time assurance of learning-based components in aviation systems. Our proposed framework integrates two novel methodologies. From the design-time assurance perspective, we propose offline mixed-fidelity verification tools that incorporate knowledge from different levels of granularity in simulated environments. From the run-time assurance perspective, we propose reachability- and statistics-based online monitoring and safety guards for a learning-based decision-making model to complement the offline verification methods. This framework is designed to be loosely coupled among modules, allowing the individual modules to be developed using independent methodologies and techniques, under varying circumstances and with different tool access. The proposed framework offers feasible solutions for meeting system safety requirements at different stages throughout the system development and deployment cycle, enabling the continuous learning and assessment of the system product.

Submitted 14 May, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: 12 pages, 9 figures

arXiv:2205.03524 [pdf, other]

Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

Authors: Xiaoqian Xu, Pengxu Wei, Weikai Chen, Mingzhi Mao, Liang Lin, Guanbin Li

Abstract: Due to the sophisticated imaging process, an identical scene captured by different cameras could exhibit distinct imaging patterns, introducing distinct proficiency among the super-resolution (SR) models trained on images from different devices. In this paper, we investigate a novel and practical task coded cross-device SR, which strives to adapt a real-world SR model trained on the paired images captured by one camera to low-resolution (LR) images captured by arbitrary target devices. The proposed task is highly challenging due to the absence of paired data from various imaging devices. To address this issue, we propose an unsupervised domain adaptation mechanism for real-world SR, named Dual ADversarial Adaptation (DADA), which only requires LR images in the target domain with available real paired data from a source camera. DADA employs the Domain-Invariant Attention (DIA) module to establish the basis of target model training even without HR supervision. Furthermore, the dual framework of DADA facilitates an Inter-domain Adversarial Adaptation (InterAA) in one branch for two LR input images from two domains, and an Intra-domain Adversarial Adaptation (IntraAA) in two branches for an LR input image. InterAA and IntraAA together improve the model transferability from the source domain to the target. We empirically conduct experiments under six Real to Real adaptation settings among three different cameras, and achieve superior performance compared with existing state-of-the-art approaches. We also evaluate the proposed DADA to address the adaptation to the video camera, which presents a promising research topic to promote the wide applications of real-world super-resolution. Our source code is publicly available at https://github.com/lonelyhope/DADA.git.

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2203.04583 [pdf, other]

Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks

Authors: Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma

Abstract: Unsupervised cross-lingual speech representation learning (XLSR) has recently shown promising results in speech recognition by leveraging vast amounts of unlabeled data across multiple languages. However, the standard XLSR model suffers from a language interference problem due to its lack of language-specific modeling ability. In this work, we investigate language adaptive training on XLSR models. More importantly, we propose a novel language adaptive pre-training approach based on sparse sharing sub-networks. It makes room for language-specific modeling by pruning out unimportant parameters for each language, without requiring any manually designed language-specific component. After pruning, each language only maintains a sparse sub-network, while the sub-networks are partially shared with each other. Experimental results on a downstream multilingual speech recognition task show that our proposed method significantly outperforms baseline XLSR models on both high-resource and low-resource languages. Besides, our proposed method consistently outperforms other adaptation methods and requires fewer parameters.

Submitted 9 March, 2022; originally announced March 2022.

Comments: To appear in ICASSP 2022

arXiv:2202.03498 [pdf, other]

doi 10.1109/TGRS.2021.3131418

Random Ferns for Semantic Segmentation of PolSAR Images

Authors: Pengchao Wei, Ronny Hänsch

Abstract: Random Ferns -- as a less known example of Ensemble Learning -- have been successfully applied in many Computer Vision applications ranging from keypoint matching to object detection. This paper extends the Random Fern framework to the semantic segmentation of polarimetric synthetic aperture radar images. By using internal projections that are defined over the space of Hermitian matrices, the proposed classifier can be directly applied to the polarimetric covariance matrices without the need to explicitly compute predefined image features. Furthermore, two distinct optimization strategies are proposed: the first based on pre-selection and grouping of internal binary features before the creation of the classifier, and the second based on iteratively improving the properties of a given Random Fern. Both strategies are able to boost the performance by filtering features that are either redundant or have a low information content and by grouping correlated features to best fulfill the independence assumptions made by the Random Fern classifier. Experiments show that results can be achieved that are similar to a more complex Random Forest model and competitive to a deep learning baseline.

Submitted 7 February, 2022; originally announced February 2022.

Comments: This is the author's version of the article as accepted for publication in IEEE Transactions on Geoscience and Remote Sensing, 2021. Link to original: https://ieeexplore.ieee.org/document/9627989

arXiv:2109.11259 [pdf, ps, other]

Multi-sensor joint target detection, tracking and classification via Bernoulli filter

Authors: Gaiyou Li, ** Wei, Giorgio Battistelli, Luigi Chisci, Lin Gao

Abstract: This paper focuses on joint detection, tracking and classification (JDTC) of a target via multi-sensor fusion. The target can be present or not, can belong to different classes, and depending on its class can behave according to different kinematic modes. Accordingly, it is modeled as a suitably extended Bernoulli random finite set (RFS) uniquely characterized by existence, classification, class-conditioned mode and class&mode-conditioned state probability distributions. By designing suitable centralized and distributed rules for fusing information on target existence, class, mode and state from different sensor nodes, novel centralized and distributed JDTC Bernoulli filters (C-JDTC-BF and D-JDTC-BF) are proposed. The performance of the proposed JDTC-BF approach is evaluated by means of simulation experiments.

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2106.03254 [pdf]

Power System Transient Modeling and Simulation using Integrated Circuit

Authors: Xiang Zhang, Renchang Dai, Peng Wei, Yi**g Liu, Guangyi Liu, Zhiwei Wang

Abstract: Transient stability analysis (TSA) plays an important role in power system analysis for investigating the stability of the power system. Traditionally, transient stability analysis methods have been developed using time-domain simulation by means of numerical integration methods. In this paper, a new approach is proposed to model power systems as an integrated circuit and simulate the power system dynamic behavior with an integrated circuit simulator. The proposed method models the power grid, generator, governor, and exciter with high fidelity. The power system dynamic simulation accuracy and efficiency of the proposed approach are verified and demonstrated by a case study on an IEEE standard system.

Submitted 6 June, 2021; originally announced June 2021.

Comments: Has been accepted by 2021 PES General Meeting

arXiv:2006.04055 [pdf, other]

doi 10.1016/j.comcom.2020.06.002

Green Resource Allocation and Energy Management in Heterogeneous Small Cell Networks Powered by Hybrid Energy

Authors: Qiaoni Han, Bo Yang, Nan Song, Yuwei Li, ** Wei

Abstract: In heterogeneous networks (HetNets), how to improve spectrum efficiency is a crucial issue. Meanwhile, increased energy consumption inspires network operators to deploy renewable energy sources as assistance to traditional electricity. Based on the above aspects, we allow base stations (BSs) to share their licensed spectrum resource with each other and adjust transmission power to adapt to the renewable energy level. Considering the sharing fairness among BSs, we formulate a multi-person bargaining problem as a stochastic optimization problem. We divide the optimization problem into three parts: data rate control, resource allocation and energy management. An online dynamic control algorithm is proposed to control admission rate and resource allocation to maximize the transmission and sharing profits with the least grid energy consumption. Simulation results investigate the time-varying data control and energy management of BSs and demonstrate the effectiveness of the proposed scheme.

Submitted 14 June, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: 29 pages, 7 figures

arXiv:1912.02907 [pdf]

Diagnostic Image Quality Assessment and Classification in Medical Imaging: Opportunities and Challenges

Authors: Jeffrey Ma, Ukash Nakarmi, Cedric Yue Sik Kin, Christopher Sandino, Joseph Y. Cheng, Ali B. Syed, Peter Wei, John M. Pauly, Shreyas Vasanawala

Abstract: Magnetic Resonance Imaging (MRI) suffers from several artifacts, the most common of which are motion artifacts. These artifacts often yield images that are of non-diagnostic quality. To detect such artifacts, images are prospectively evaluated by experts for their diagnostic quality, which necessitates patient revisits and rescans whenever non-diagnostic quality scans are encountered. This motivates the need to develop an automated framework capable of assessing medical image quality and detecting diagnostic and non-diagnostic images. In this paper, we explore several convolutional neural network-based frameworks for medical image quality assessment and investigate several challenges therein.

Submitted 5 December, 2019; originally announced December 2019.

Comments: 4 pages, 8 Figures, Conference Submission

arXiv:1902.02523 [pdf, ps, other]

Distributed Joint Sensor Registration and Multitarget Tracking Via Sensor Network

Authors: Lin Gao, Giorgio Battistelli, Luigi Chisci, ** Wei

Abstract: This paper addresses distributed registration of a sensor network for multitarget tracking. Each sensor gets measurements of the target position in a local coordinate frame, having no knowledge about the relative positions (referred to as drift parameters) and azimuths (referred to as orientation parameters) of its neighboring nodes. The multitarget set is modeled as an independent and identically distributed (i.i.d.) cluster random finite set (RFS), and a consensus cardinality probability hypothesis density (CPHD) filter is run over the network to recursively compute in each node the posterior RFS density. Then a suitable cost function, expressing the discrepancy between the local posteriors in terms of averaged Kullback-Leibler divergence, is minimized with respect to the drift and orientation parameters for sensor registration purposes. In this way, a computationally feasible optimization approach for joint sensor registration and multitarget tracking is devised. Finally, the effectiveness of the proposed approach is demonstrated through simulation experiments on both tree networks and networks with cycles, as well as with both linear and nonlinear sensors.

Submitted 7 February, 2019; originally announced February 2019.

arXiv:1812.10944 [pdf, ps, other]

Basis Signal Optimization for N-Continuous OFDM

Authors: Peng Wei, Yue Xiao, Wei Xiang

Abstract: A novel basis signal optimization method is proposed for reducing the interference in the N-continuous orthogonal frequency division multiplexing (NC-OFDM) system. Compared to conventional NC-OFDM, the proposed scheme is capable of improving the transmission performance while maintaining an identical sidelobe suppression performance imposed by the linear combination of two groups of basis signals. Our performance results demonstrate that with a low complexity overhead, the proposed scheme is capable of striking a better trade-off among the bit error rate (BER), complexity, and the sidelobe suppression performance compared to its conventional counterpart.

Submitted 3 November, 2020; v1 submitted 28 December, 2018; originally announced December 2018.

Comments: 5 pages, 5 figures, 3 tables

arXiv:1810.11390 [pdf, other]

Joint Estimation of DOA and Frequency with Sub-Nyquist Sampling in a Binary Array Radar System

Authors: Zhan Zhang, ** Wei, Lijuan Deng, Huaguo Zhang

Abstract: Recently, several array radar structures combined with sub-Nyquist techniques and corresponding algorithms have been extensively studied. Carrier frequency and direction-of-arrival (DOA) estimations of multiple narrow-band signals received by array radars at the sub-Nyquist rates are considered in this paper. We propose a new sub-Nyquist array radar architecture (a binary array radar separately connected to a multi-coset structure with $M$ branches) and an efficient joint estimation algorithm which can match frequencies up with corresponding DOAs. We further come up with a delay pattern augmenting method, by which the number of identifiable signals can increase from $M-1$ to $Q-1$ (where $Q$ is the extended degrees of freedom). We further conclude that the minimum total sampling rate $2MB$ is sufficient to identify $K \leq Q-1$ narrow-band signals of maximum bandwidth $B$ inside. The effectiveness and performance of the estimation algorithm together with the augmenting method have been verified by simulations.

Submitted 26 October, 2018; originally announced October 2018.

Comments: 6 pages, 2 figures, conference

arXiv:1803.06554 [pdf, other]

Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection

Authors: Pan Wei, John E. Ball, Derek T. Anderson

Abstract: A significant challenge in object detection is accurate identification of an object's position in image space, whereas one algorithm with one set of parameters is usually not enough, and the fusion of multiple algorithms and/or parameters can lead to more robust results. Herein, a new computational intelligence fusion approach based on the dynamic analysis of agreement among object detection outputs is proposed. Furthermore, we propose an online versus just in training image augmentation strategy. Experiments comparing the results both with and without fusion are presented. We demonstrate that the augmented and fused combination results are the best, with respect to higher accuracy rates and reduction of outlier influences. The approach is demonstrated in the context of cone, pedestrian and box detection for Advanced Driver Assistance Systems (ADAS) applications.

Submitted 17 March, 2018; originally announced March 2018.

Comments: 21 pages, 12 figures, journal paper, MDPI Sensors, 2018

arXiv:1803.04964 [pdf]

Onion-Peeling Outlier Detection in 2-D data Sets

Authors: Archit Harsh, John E. Ball, Pan Wei

Abstract: Outlier Detection is a critical and cardinal research task due to its array of applications in a variety of domains, including data mining, clustering, statistical analysis, fraud detection, network intrusion detection and diagnosis of diseases. Over the last few decades, distance-based outlier detection algorithms have gained significant reputation as a viable alternative to the more traditional statistical approaches due to their scalable, non-parametric and simple implementation. In this paper, we present a modified onion peeling (Convex hull) genetic algorithm to detect outliers in a Gaussian 2-D point data set. We present three different scenarios of outlier detection using a) Euclidean Distance Metric, b) Standardized Euclidean Distance Metric and c) Mahalanobis Distance Metric. Finally, we analyze the performance and evaluate the results.

Submitted 12 March, 2018; originally announced March 2018.

Comments: 6 pages, 4 figures, journal paper

Journal ref: International Journal of Computer Application, Vol.139 (3), pp.26-31, April, 2016

arXiv:1803.04556 [pdf, other]

Measuring Conflict in a Multi-Source Environment as a Normal Measure

Authors: Pan Wei, John E. Ball, Derek T. Anderson, Archit Harsh, Christopher Archibald

Abstract: In a multi-source environment, each source has its own credibility. If there is no external knowledge about credibility then we can use the information provided by the sources to assess their credibility. In this paper, we propose a way to measure conflict in a multi-source environment as a normal measure. We examine our algorithm using three simulated examples of increasing conflict and one experimental example. The results demonstrate that the proposed measure can represent conflict in a meaningful way similar to what a human might expect and from it we can identify conflict within our sources.

Submitted 12 March, 2018; originally announced March 2018.

Comments: 4 pages, 8 figures, conference paper

Journal ref: IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), December, 2015

arXiv:1803.04551 [pdf]

Multi-Sensor Conflict Measurement and Information Fusion

Authors: Pan Wei, John E. Ball, Derek T. Anderson

Abstract: In sensing applications where multiple sensors observe the same scene, fusing sensor outputs can provide improved results. However, if some of the sensors are providing lower quality outputs, the fused results can be degraded. In this work, a multi-sensor conflict measure is proposed which estimates multi-sensor conflict by representing each sensor output as interval-valued information and examines the sensor output overlaps on all possible n-tuple sensor combinations. The conflict is based on the sizes of the intervals and how many sensor output values lie in these intervals. In this work, conflict is defined in terms of how little the outputs from multiple sensors overlap. That is, high degrees of overlap mean low sensor conflict, while low degrees of overlap mean high conflict. This work is a preliminary step towards a robust conflict and sensor fusion framework. In addition, a sensor fusion algorithm is proposed based on a weighted sum of sensor outputs, where the weights for each sensor diminish as the conflict measure increases. The proposed methods can be utilized to (1) assess a measure of multi-sensor conflict, and (2) improve sensor output fusion by lessening weighting for sensors with high conflict. Using this measure, a simulated example is given to explain the mechanics of calculating the conflict measure, and stereo camera 3D outputs are analyzed and fused. In the stereo camera case, the sensor output is corrupted by additive impulse noise, DC offset, and Gaussian noise. Impulse noise is common in sensors due to intermittent interference, a DC offset represents a sensor bias or registration error, and Gaussian noise represents a sensor output with low SNR. The results show that sensor output fusion based on the conflict measure shows improved accuracy over a simple averaging fusion strategy.

Submitted 12 March, 2018; originally announced March 2018.

Comments: 15 pages, 9 figures, conference paper

Journal ref: SPIE Defense, Security, and Sensing, April, 2016

arXiv:1707.08781 [pdf, other]

Consensus-based joint target tracking and sensor localization

Authors: Lin Gao, Giorgio Battistelli, Luigi Chisci, ** Wei

Abstract: In this paper, consensus-based Kalman filtering is extended to deal with the problem of joint target tracking and sensor self-localization in a distributed wireless sensor network. The average weighted Kullback-Leibler divergence, which is a function of the unknown drift parameters, is employed as the cost to measure the discrepancy between the fused posterior distribution and the local distribution at each sensor. Further, a reasonable approximation of the cost is proposed and an online technique is introduced to minimize the approximated cost function with respect to the drift parameters stored in each node. The remarkable features of the proposed algorithm are that it needs no additional data exchanges, slightly increased memory space and computational load comparable to the standard consensus-based Kalman filter. Finally, the effectiveness of the proposed algorithm is demonstrated through simulation experiments on both a tree network and a network with cycles as well as for both linear and nonlinear sensors.

Submitted 27 July, 2017; originally announced July 2017.

Showing 1–31 of 31 results for author: Wei, P