
Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood pressure (BP) measurement, which is critical for the management of cardiovascular diseases. This paper presents the first work to explore the capacity of LLMs to perform cuffless BP estimation based on wearable biosignals. We extracted physiological features from electrocardiogram (ECG) and photoplethysmogram (PPG) signals and designed context-enhanced prompts by combining these features with BP domain knowledge and user information. Subsequently, we adapted LLMs to BP estimation tasks through fine-tuning. To evaluate the proposed approach, we conducted assessments of ten advanced LLMs using a comprehensive public dataset of wearable biosignals from 1,272 participants. The experimental results demonstrate that the optimally fine-tuned LLM significantly surpasses conventional task-specific baselines, achieving an estimation error of 0.00 $\pm$ 9.25 mmHg for systolic BP and 1.29 $\pm$ 6.37 mmHg for diastolic BP. Notably, the ablation studies highlight the benefits of our context enhancement strategy, leading to an 8.9% reduction in mean absolute error for systolic BP estimation. This paper pioneers the exploration of LLMs for cuffless BP measurement, providing a potential solution to enhance the accuracy of cuffless BP measurement.

Submitted 26 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.
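
The abstract above reports estimation errors in the form "mean $\pm$ spread" alongside a mean absolute error. As a rough illustration of how such figures are commonly computed, here is a minimal Python sketch. It assumes the reported values are the mean error and the sample standard deviation of the error, which the abstract does not state explicitly. The function name `error_stats` and the readings are hypothetical and are not taken from the paper.

```python
import math


def error_stats(predicted, reference):
    """Return (mean error, sample SD of the error, mean absolute error).

    Assumption: "a ± b mmHg" means mean error ± standard deviation.
    """
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 readings")
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    mean_error = sum(errors) / n
    # Sample variance (n - 1 in the denominator).
    variance = sum((e - mean_error) ** 2 for e in errors) / (n - 1)
    mae = sum(abs(e) for e in errors) / n
    return mean_error, math.sqrt(variance), mae


# Hypothetical systolic BP readings in mmHg (illustration only).
predicted = [118.0, 131.0, 125.0, 140.0]
reference = [120.0, 128.0, 126.0, 136.0]

me, sd, mae = error_stats(predicted, reference)
print(f"error: {me:.2f} ± {sd:.2f} mmHg, MAE: {mae:.2f} mmHg")
```

A mean error near zero with a large spread, as in the systolic result quoted above, indicates little systematic bias but noticeable per-reading scatter, which is why the mean absolute error is reported separately.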

arXiv:2406.04324 [pdf, other]

SF-V: Single Forward Video Generation Model

Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through the adversarial training, the multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project Page: https://snap-research.github.io/SF-V

arXiv:2406.00516 [pdf, other]

Deep Learning based Performance Testing for Analog Integrated Circuits

Authors: Jiawei Cao, Chongtao Guo, Hao Li, Zhigang Wang, Houjun Wang, Geoffrey Ye Li

Abstract: In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the mapping from the response of the circuit under test (CUT) in each module to all specifications to be tested. Then, the required test modules are selected by solving a 0-1 integer programming problem. Finally, the predictions from the selected test modules are combined by a DNN to form the specification estimations. The simulation results validate the proposed approach in terms of testing accuracy and cost.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20693 [pdf, other]

R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a previously unknown integration bias in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. Our new method presents three key innovations: (1) introducing tailored Gaussian kernels, (2) extending rasterization to X-ray imaging, and (3) developing a CUDA-based differentiable voxelizer. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by 0.93 dB in PSNR and 0.014 in SSIM. Crucially, it delivers high-quality results in 3 minutes, which is 12x faster than NeRF-based methods and on par with traditional algorithms. The superior performance and rapid convergence of our method highlight its practical value.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.10561 [pdf, other]

Infrared Image Super-Resolution via Lightweight Information Split Network

Authors: Shijie Liu, Kang Yan, Feiwei Qin, Changmiao Wang, Ruiquan Ge, Kai Zhang, Jie Huang, Yong Peng, ** Cao

Abstract: Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.

Submitted 27 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Hai** Zeng, Kai Feng, et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.15278 [pdf, other]

Security-Sensitive Task Offloading in Integrated Satellite-Terrestrial Networks

Authors: Wenjun Lan, Kongyang Chen, Jiannong Cao, Yikai Li, Ning Li, Qi Chen, Yuvraj Sahni

Abstract: With the rapid development of sixth-generation (6G) communication technology, global communication networks are moving towards the goal of comprehensive and seamless coverage. In particular, low earth orbit (LEO) satellites have become a critical component of satellite communication networks. The emergence of LEO satellites has brought about new computational resources known as the LEO satellite edge, enabling ground users (GU) to offload computing tasks to the resource-rich LEO satellite edge. However, existing LEO satellite computational offloading solutions primarily focus on optimizing system performance, neglecting the potential issue of malicious satellite attacks during task offloading. In this paper, we propose the deployment of LEO satellite edge in an integrated satellite-terrestrial network (ISTN) structure to support security-sensitive computing task offloading. We model the task allocation and offloading order problem as a joint optimization problem to minimize task offloading delay, energy consumption, and the number of attacks while satisfying reliability constraints. To achieve this objective, we model the task offloading process as a Markov decision process (MDP) and propose a security-sensitive task offloading strategy optimization algorithm based on proximal policy optimization (PPO). Experimental results demonstrate that our algorithm significantly outperforms other benchmark methods in terms of performance.

Submitted 20 January, 2024; originally announced April 2024.

arXiv:2403.12115 [pdf, other]

Deep learning automates Cobb angle measurement compared with multi-expert observers

Authors: Keyu Li, Hanxue Gu, Roy Colglazier, Robert Lark, Elizabeth Hubbard, Robert French, Denise Smith, Jikai Zhang, Erin McCrum, Anthony Catanzano, Joseph Cao, Leah Waldman, Maciej A. Mazurowski, Benjamin Alman

Abstract: Scoliosis, a prevalent condition characterized by abnormal spinal curvature leading to deformity, requires precise assessment methods for effective diagnosis and management. The Cobb angle is a widely used scoliosis quantification method that measures the degree of curvature between the tilted vertebrae. Yet, manual measuring of Cobb angles is time-consuming and labor-intensive, fraught with significant interobserver and intraobserver variability. To address these challenges and the lack of interpretability found in certain existing automated methods, we have created fully automated software that not only precisely measures the Cobb angle but also provides clear visualizations of these measurements. This software integrates deep neural network-based spine region detection and segmentation, spine centerline identification, pinpointing the most significantly tilted vertebrae, and direct visualization of Cobb angles on the original images. Upon comparison with the assessments of 7 expert readers, our algorithm exhibited a mean deviation in Cobb angle measurements of 4.17 degrees, notably surpassing the manual approach's average intra-reader discrepancy of 5.16 degrees. The algorithm also achieved intra-class correlation coefficients (ICC) exceeding 0.96 and Pearson correlation coefficients above 0.944, reflecting robust agreement with expert assessments and superior measurement reliability. Through the comprehensive reader study and statistical analysis, we believe this algorithm not only ensures a higher consensus with expert readers but also enhances interpretability and reproducibility during assessments. It holds significant promise for clinical application, potentially aiding physicians in more accurate scoliosis assessment and diagnosis, thereby improving patient care.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 17 pages, 5 figures
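
The abstract above describes the Cobb angle as the curvature measured between the most tilted vertebrae. The final geometric step can be sketched in Python as the acute angle between two endplate lines. This is only an illustration of that geometry, not the authors' software, which also performs detection, segmentation, and centerline identification. The function names and landmark coordinates are hypothetical.

```python
import math


def line_tilt_deg(p1, p2):
    """Tilt of the line through p1 and p2 relative to the x axis, in degrees."""
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def cobb_angle_deg(upper_endplate, lower_endplate):
    """Acute angle between two endplate lines, each given as a pair of (x, y) points."""
    tilt_upper = line_tilt_deg(*upper_endplate)
    tilt_lower = line_tilt_deg(*lower_endplate)
    # Lines have no direction, so fold the difference into [0, 90] degrees.
    diff = abs(tilt_upper - tilt_lower) % 180.0
    return min(diff, 180.0 - diff)


# Hypothetical landmark coordinates in pixels (illustration only).
upper = ((0.0, 0.0), (10.0, 2.0))    # endplate of the upper end vertebra
lower = ((0.0, 20.0), (10.0, 17.0))  # endplate of the lower end vertebra

print(f"Cobb angle: {cobb_angle_deg(upper, lower):.2f} degrees")
```

Folding the difference into the range of 0 to 90 degrees makes the result independent of the order in which the two landmarks on each endplate are given.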

arXiv:2403.00892 [pdf, other]

PowerFlowMultiNet: Multigraph Neural Networks for Unbalanced Three-Phase Distribution Systems

Authors: Salah Ghamizi, Jun Cao, Aoxiang Ma, Pedro Rodriguez

Abstract: Efficiently solving unbalanced three-phase power flow in distribution grids is pivotal for grid analysis and simulation. There is a pressing need for scalable algorithms capable of handling large-scale unbalanced power grids that can provide accurate and fast solutions. To address this, deep learning techniques, especially Graph Neural Networks (GNNs), have emerged. However, existing literature primarily focuses on balanced networks, leaving a critical gap in supporting unbalanced three-phase power grids. This letter introduces PowerFlowMultiNet, a novel multigraph GNN framework explicitly designed for unbalanced three-phase power grids. The proposed approach models each phase separately in a multigraph representation, effectively capturing the inherent asymmetry in unbalanced grids. A graph embedding mechanism utilizing message passing is introduced to capture spatial dependencies within the power system network. PowerFlowMultiNet outperforms traditional methods and other deep learning approaches in terms of accuracy and computational speed. Rigorous testing reveals significantly lower error rates and a notable hundredfold increase in computational speed for large power networks compared to model-based methods.

Submitted 12 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.19013 [pdf, other]

Ultraviolet Positioning via TDOA: Error Analysis and System Prototype

Authors: Shihui Yu, Chubing Lv, Yueke Yang, Yuchen Pan, Lei Sun, Juliang Cao, Ruihang Yu, Chen Gong, Wenqi Wu, Zhengyuan Xu

Abstract: This work performs the design, real-time hardware realization, and experimental evaluation of a positioning system by ultra-violet (UV) communication under photon-level signal detection. The positioning is based on the time-difference-of-arrival (TDOA) principle. Time division-based transmission of a synchronization sequence from three transmitters with known positions is applied. We investigate the positioning error via decomposing it into two parts, the transmitter-side timing error and the receiver-side synchronization error. The theoretical average error matches well with the simulation results, which indicates that theoretical fitting can provide reliable guidance and prediction for hardware experiments. We also conduct real-time hardware realization of the TDOA-based positioning system using a Field Programmable Gate Array (FPGA), which is experimentally evaluated via outdoor experiments. Experimental results match well with the theoretical and simulation results.

Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17718 [pdf]

Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

Authors: Vispi Karkaria, Anthony Goeckner, Ru**g Zha, Jie Chen, Jian**g Zhang, Qi Zhu, Jian Cao, Robert X. Gao, Wei Chen

Abstract: Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for heat management are common in DED research, few integrate real-time monitoring, physics-based modeling, and control in a unified framework. Our work presents a digital twin (DT) framework for real-time predictive control of DED process parameters to meet specific design objectives. We develop a surrogate model using Long Short-Term Memory (LSTM)-based machine learning with Bayesian Inference to predict temperatures in DED parts. This model predicts future temperature states in real time. We also introduce Bayesian Optimization (BO) for Time Series Process Optimization (BOTSPO), based on traditional BO but featuring a unique time series process profile generator with reduced dimensions. BOTSPO dynamically optimizes processes, identifying optimal laser power profiles to attain desired mechanical properties. The established process trajectory guides online optimizations, aiming to enhance performance. This paper outlines the digital twin framework's components, promoting its integration into a comprehensive system for AM.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 12 Pages, 10 Figures, 1 Table, NAMRC Conference

arXiv:2312.16064 [pdf, other]

Goal-Oriented Integration of Sensing, Communication, Computing, and Control for Mission-Critical Internet-of-Things

Authors: Jie Cao, Ernest Kurniawan, Amnart Boonkajay, Sumei Sun, Petar Popovski, Xu Zhu

Abstract: Driven by the development goal of network paradigm and demand for various functions in the sixth-generation (6G) mission-critical Internet-of-Things (MC-IoT), we foresee a goal-oriented integration of sensing, communication, computing, and control (GIS3C) in this paper. We first provide an overview of the tasks, requirements, and challenges of MC-IoT. Then we introduce an end-to-end GIS3C architecture, in which goal-oriented communication is leveraged to bridge and empower sensing, communication, control, and computing functionalities. By revealing the interplay among multiple subsystems in terms of key performance indicators and parameters, this paper introduces unified metrics, i.e., task completion effectiveness and cost, to facilitate S3C co-design in MC-IoT. The preliminary results demonstrate the benefits of GIS3C in improving task completion effectiveness while reducing costs. We also identify and highlight the gaps and challenges in applying GIS3C in the future 6G networks.

Submitted 1 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.16061 [pdf, other]

Goal-Oriented Communication, Estimation, and Control over Bidirectional Wireless Links

Authors: Jie Cao, Ernest Kurniawan, Amnart Boonkajay, Nikolaos Pappas, Sumei Sun, Petar Popovski

Abstract: We consider a wireless networked control system (WNCS) with bidirectional imperfect links for real-time applications such as smart grids. To maintain the stability of WNCS, captured by the probability that the plant state violates preset values, at minimal cost, heterogeneous physical processes are monitored by multiple sensors. This status information, such as dynamic plant state and Markov Process-based context information, is then received/estimated by the controller for remote control. However, scheduling multiple sensors and designing the controller with limited resources is challenging due to their coupling, delay, and transmission loss. We formulate a Constrained Markov Decision Problem (CMDP) to minimize violation probability with cost constraints. We reveal the relationship between the goal and different updating actions by analyzing the significance of information that incorporates goal-related usefulness and contextual importance. Subsequently, a goal-oriented deterministic scheduling policy is proposed. Two sensing-assisted control strategies and a control-aware estimation policy are proposed to improve the violation probability-cost tradeoff, integrated with the scheduling policy to form a goal-oriented co-design framework. Additionally, we explore retransmission in downlink transmission and qualitatively analyze its preference scenario. Simulation results demonstrate that the proposed goal-oriented co-design policy outperforms previous work in simultaneously reducing violation probability and cost.

Submitted 1 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.15454 [pdf, other]

Risk-Aware and Energy-Efficient AoI Optimization for Multi-Connectivity WNCS with Short Packet Transmissions

Authors: Jie Cao, Xu Zhu, Sumei Sun, Ernest Kurniawan, Amnart Boonkajay

Abstract: Age of Information (AoI) has been proposed to quantify the freshness of information for emerging real-time applications such as remote monitoring and control in wireless networked control systems (WNCSs). Minimization of the average AoI and its outage probability can ensure timely and stable transmission. Energy efficiency (EE) also plays an important role in WNCSs, as many devices are characterized by low cost and limited battery capacity. Multi-connectivity over multiple links enables a decrease in AoI, at the cost of energy. We tackle the unresolved problem of selecting the optimal number of connections that is both AoI-optimal and energy-efficient, while avoiding risky states. To address this issue, the average AoI and peak AoI (PAoI), as well as PAoI violation probability are formulated as functions of the number of connections. Then the EE-PAoI ratio is introduced to allow a tradeoff between AoI and energy, which is maximized by the proposed risk-aware, AoI-optimal and energy-efficient connectivity scheme. To obtain this, we analyze the property of the formulated EE-PAoI ratio and prove the monotonicity of PAoI violation probability. Interestingly, we reveal that the multi-connectivity scheme is not always preferable, and the signal-to-noise ratio (SNR) threshold that determines the selection of the multi-connectivity scheme is derived as a function of the coding rate. Also, the optimal number of connections is obtained and shown to be a decreasing function of the transmit power. Simulation results demonstrate that, at low SNR, the proposed scheme achieves more than a 15-fold EE-PAoI gain over the single-connectivity scheme.

Submitted 1 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.14551 [pdf, other]

doi 10.1109/TMM.2022.3219646

DDistill-SR: Reparameterized Dynamic Distillation Network for Lightweight Image Super-Resolution

Authors: Yan Wang, Tongtong Su, Yusen Li, Jiuwen Cao, Gang Wang, Xiaoguang Liu

Abstract: Recent research on deep convolutional neural networks (CNNs) has provided a significant performance boost on efficient super-resolution (SR) tasks by trading off the performance and applicability. However, most existing methods focus on subtracting feature processing consumption to reduce the parameters and calculations without refining the immediate features, which leads to inadequate information in the restoration. In this paper, we propose a lightweight network termed DDistill-SR, which significantly improves the SR quality by capturing and reusing more helpful information in a static-dynamic feature distillation manner. Specifically, we propose a plug-in reparameterized dynamic unit (RDU) to promote the performance and inference cost trade-off. During the training phase, the RDU learns to linearly combine multiple reparameterizable blocks by analyzing varied input statistics to enhance layer-level representation. In the inference phase, the RDU is equally converted to simple dynamic convolutions that explicitly capture robust dynamic and static feature maps. Then, the information distillation block is constructed by several RDUs to enforce hierarchical refinement and selective fusion of spatial context information. Furthermore, we propose a dynamic distillation fusion (DDF) module to enable dynamic signals aggregation and communication between hierarchical modules to further improve performance. Empirical results show that our DDistill-SR outperforms the baselines and achieves state-of-the-art results on most super-resolution domains with much fewer parameters and less computational overhead. We have released the code of DDistill-SR at https://github.com/icandle/DDistill-SR.

Submitted 22 December, 2023; originally announced December 2023.

Comments: Accepted by IEEE Transactions on Multimedia (TMM)

Journal ref: IEEE Transactions on Multimedia, 25, 7222-7234 (2023)

arXiv:2312.09416 [pdf]

A Miniature Non-Uniform Conformal Antenna Array Using Fast Synthesis for Wide-Scan UAV Application

Authors: Yuanyan Su, Icaro V. Soares, Siegfred Daquioag Balon, Jun Cao, Denys Nikolayev, Anja K. Skrivervik

Abstract: To overcome the limited payload of lightweight vehicles such as unmanned aerial vehicles (UAVs) and the aerodynamic constraints on the onboard radar, a compact non-uniform conformal array is proposed in order to achieve a wide beamscanning range and to reduce the sidelobes of the planar array. The non-uniform array consists of 7×4 elements where the inner two rows follow a geometric sequence while the outer two rows follow an arithmetic sequence along the x axis. The element spacing along the y axis is also graded from the center. This geometry not only provides more degrees of freedom to optimize the array radiation, but also reduces the computation cost when synthesizing the excitation and the configuration of the array for a specific beam pattern. As field cancellation may happen due to the convex and concave features of the non-canonical UAV surface, a fast and low-cost in-house code to calculate the radiation pattern of a large-scale conformal array for an arbitrary surface and element pattern is employed to optimize the array structure. As a proof of concept, the proposed array with a total volume of 142 × 93 × 40 mm³ is implemented at ISM band (5.8 GHz) using a miniature widebeam single-layer patch antenna with a dimension of 0.12λ × 0.12λ × 0.025λ. By using the beamforming technique, an active onboard system is measured, which achieves the maximum gain of 21.8 dBi and a scanning range of >50° and -28° to 28° with a small scan loss of 2.2 and 0.5 dB in elevation and azimuth, respectively. Therefore, our design has high potential for wireless communication and sensing on UAVs.

Submitted 11 November, 2023; originally announced December 2023.

Comments: 11 pages,14 figures

arXiv:2312.07818 [pdf]

Brain Computer Interface Technology for Future Battlefield

Authors: Guodong Xiong, Xinyan Ma, Wei Li, Jiaqi Cao, Jian Zhong, Yicong Su

Abstract: With the development of artificial intelligence and unmanned equipment, human-machine hybrid formations will be the main focus in future combat formations. With the development of big data and various situational awareness technologies, while enhancing the breadth and depth of information, decision-making has also become more complex. The operation mode of existing unmanned equipment often requires complex manual input, which is not conducive to the battlefield environment. How to reduce the cognitive load of information exchange between soldiers and various unmanned equipment is an important issue in future intelligent warfare. This paper proposes a brain computer interface communication system for soldier combat, which takes into account the characteristics of soldier combat scenarios in design. The stimulation paradigm is combined with helmets, portable computers, and firearms, and brain computer interface technology is used to achieve fast, barrier free, and hands-free communication between humans and machines. Intelligent algorithms are combined to assist decision-making in fully perceiving and fusing situational information on the battlefield, and a large amount of data is processed quickly, understanding and integrating a large amount of data from human and machine networks, achieving real-time perception of battlefield information, making intelligent decisions, and achieving the effect of direct control of drone swarms and other equipment by the human brain to assist in soldier scenarios.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 4 pages, 1 figure

arXiv:2311.02646 [pdf]

Flexible uniform-sampling foveated Fourier single-pixel imaging

Authors: Huan Cui, Jie Cao, Qun Hao, Haoyu Zhang, Chang Zhou

Abstract: Fourier single-pixel imaging (FSI) is a data-efficient single-pixel imaging (SPI). However, there is still a serious challenge to obtain higher imaging quality using fewer measurements, which limits the development of real-time SPI. In this work, a uniform-sampling foveated FSI (UFFSI) is proposed with three features, uniform sampling, effective sampling and flexible fovea, to achieve under-sampling high-efficiency and high-quality SPI, even in a large-scale scene. First, by flexibly using the three proposed foveated pattern structures, data redundancy is reduced significantly to only require high resolution (HR) on regions of interest (ROIs), which radically reduces the need of total data number. Next, by the non-uniform weight distribution processing, non-uniform spatial sampling is transformed into uniform sampling, then the fast Fourier transform is used accurately and directly to obtain under-sampling high imaging quality with further reduced measurements. At a sampling ratio of 0.0084 referring to HR FSI with 1024*768 pixels, experimentally, by UFFSI with 255*341 cells of 89% reduction in data redundancy, the ROI has a significantly better imaging quality to meet imaging needs. We hope this work can provide a breakthrough for future real-time SPI.

Submitted 5 November, 2023; originally announced November 2023.

Comments: 7 pages,5 figures
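
The figures quoted in the UFFSI abstract above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the "89% reduction" compares the 255*341 foveated cell count with the 1024*768 HR pixel count, and that the sampling ratio is the number of measurements divided by the HR pixel count; both readings are assumptions, not statements from the paper.

```python
# Cross-check of the numbers quoted in the UFFSI abstract.
# Assumption: "reduction" = 1 - (foveated cells / HR pixels).
# Assumption: "sampling ratio" = measurements / HR pixels.

hr_pixels = 1024 * 768        # high-resolution FSI grid
foveated_cells = 255 * 341    # UFFSI cell count from the abstract

reduction = 1 - foveated_cells / hr_pixels
measurements = 0.0084 * hr_pixels

print(f"HR pixels:       {hr_pixels}")
print(f"Foveated cells:  {foveated_cells}")
print(f"Data reduction:  {reduction:.1%}")
print(f"Measurements at ratio 0.0084: about {measurements:.0f}")
```

Under these assumptions the cell counts reproduce the roughly 89% reduction stated in the abstract.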

arXiv:2311.00246 [pdf, ps, other]

RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method

Authors: Wangzhen Peng, Chenghao Zhou, Runze Hu, Jingchao Cao, Yutao Liu

Abstract: Underwater image enhancement (UIE) poses challenges due to distinctive properties of the underwater environment, including low contrast, high turbidity, visual blurriness, and color distortion. In recent years, the application of deep learning has quietly revolutionized various areas of scientific research, including UIE. However, existing deep learning-based UIE methods generally suffer from issues of weak robustness and limited adaptability. In this paper, inspired by residual and attention mechanisms, we propose a more reliable and reasonable UIE network called RAUNE-Net by employing residual learning of high-level features at the network's bottleneck and two aspects of attention manipulations in the down-sampling procedure. Furthermore, we collect and create two datasets specifically designed for evaluating UIE methods, which contain different types of underwater distortions and degradations. The experimental validation demonstrates that our method obtains promising objective performance and consistent visual results across various real-world underwater images compared to eight other UIE methods. Our example code and datasets are publicly available at https://github.com/fansuregrin/RAUNE-Net.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.11690 [pdf, other]

doi 10.1016/j.rser.2023.113913

Deep learning based on Transformer architecture for power system short-term voltage stability assessment with class imbalance

Authors: Yang Li, Jiting Cao, Yan Xu, Lipeng Zhu, Zhao Yang Dong

Abstract: Most existing data-driven power system short-term voltage stability assessment (STVSA) approaches presume class-balanced input data. However, in practical applications, the occurrence of short-term voltage instability following a disturbance is minimal, leading to a significant class imbalance problem and a consequent decline in classifier performance. This work proposes a Transformer-based STVSA method to address this challenge. By utilizing the basic Transformer architecture, a stability assessment Transformer (StaaT) is developed as a classification model to reflect the correlation between the operational states of the system and the resulting stability outcomes. To combat the negative impact of imbalanced datasets, this work employs a conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) for synthetic data generation, aiding in the creation of a balanced, representative training set for the classifier. Semi-supervised clustering learning is implemented to enhance clustering quality, addressing the lack of a unified quantitative criterion for short-term voltage stability. Numerical tests on the IEEE 39-bus test system extensively demonstrate that the proposed method exhibits robust performance under class imbalances up to 100:1 and noisy environments, and maintains consistent effectiveness even with an increased penetration of renewable energy. Comparative results reveal that the CWGAN-GP generates more balanced datasets than traditional oversampling methods and that the StaaT outperforms other deep learning algorithms. This study presents a compelling solution for real-world STVSA applications that often face class imbalance and data noise challenges.

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted by Renewable and Sustainable Energy Reviews

Journal ref: Renewable and Sustainable Energy Reviews 189 (2024) 113913

arXiv:2309.11745 [pdf, other]

PIE: Simulating Disease Progression via Progressive Image Editing

Authors: Kaizhao Liang, Xu Cao, Kuei-Da Liao, Tianren Gao, Wenqian Ye, Zhengyu Chen, Jianguo Cao, Tejas Nama, Jimeng Sun

Abstract: Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as a gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation based on CLIP score (Realism) and Disease Classification Confidence (Alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback agrees with the fidelity of the generated progressions. To our best knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes.

Submitted 5 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Code and checkpoints for replicating our results can be found at https://github.com/IrohXu/PIE and https://huggingface.co/IrohXu/stable-diffusion-mimic-cxr-v0.1

arXiv:2309.08323 [pdf]

MLP Based Continuous Gait Recognition of a Powered Ankle Prosthesis with Serial Elastic Actuator

Authors: Yanze Li, Feixing Chen, Jingqi Cao, Ruoqi Zhao, Xuan Yang, Xingbang Yang, Yubo Fan

Abstract: Powered ankle prostheses effectively assist people with lower limb amputation to perform daily activities. High performance prostheses with adjustable compliance and capability to predict and implement amputee's intent are crucial for them to be comparable to or better than a real limb. However, current designs fail to provide simple yet effective compliance of the joint with full potential of modification, and lack an accurate real-time gait prediction method. This paper proposes an innovative design of a powered ankle prosthesis with a serial elastic actuator (SEA), and puts forward an MLP-based gait recognition method that can accurately and continuously predict more gait parameters for motion sensing and control. The prosthesis mimics a biological joint with similar weight, torque, and power, and can assist walking at up to 4 m/s. A new design of planar torsional spring is proposed for the SEA, which has better stiffness, endurance, and potential of modification than current designs. The gait recognition system simultaneously generates locomotive speed, gait phase, ankle angle and angular velocity using only the signals of a single IMU, holding advantages in continuity, adaptability for speed range, accuracy, and capability of multi-functions.

Submitted 30 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: Submitted to IROS 2024

arXiv:2308.03953 [pdf]

doi 10.1109/TIM.2023.3311065

PMU measurements based short-term voltage stability assessment of power systems via deep transfer learning

Authors: Yang Li, Shitu Zhang, Yuanzheng Li, Jiting Cao, Shuyue Jia

Abstract: Deep learning has emerged as an effective solution for addressing the challenges of short-term voltage stability assessment (STVSA) in power systems. However, existing deep learning-based STVSA approaches face limitations in adapting to topological changes, sample labeling, and handling small datasets. To overcome these challenges, this paper proposes a novel phasor measurement unit (PMU) measurements-based STVSA method by using deep transfer learning. The method leverages the real-time dynamic information captured by PMUs to create an initial dataset. It employs temporal ensembling for sample labeling and utilizes least squares generative adversarial networks (LSGAN) for data augmentation, enabling effective deep learning on small-scale datasets. Additionally, the method enhances adaptability to topological changes by exploring connections between different faults. Experimental results on the IEEE 39-bus test system demonstrate that the proposed method improves model evaluation accuracy by approximately 20% through transfer learning, exhibiting strong adaptability to topological changes. Leveraging the self-attention mechanism of the Transformer model, this approach offers significant advantages over shallow learning methods and other deep learning-based approaches.

Submitted 27 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: Accepted by IEEE Transactions on Instrumentation & Measurement

Journal ref: IEEE Transactions on Instrumentation and Measurement 72 (2023) 2526111

arXiv:2307.15874 [pdf, ps, other]

Resilient Controller Synthesis Against DoS Attacks for Vehicular Platooning in Spatial Domain

Authors: Jian Gong, Carlos Murguia, Anggera Bayuwindra, Jinde Cao

Abstract: This paper proposes a vehicular platoon control approach under Denial-of-Service (DoS) attacks and external disturbances. DoS attacks increase the service time on the communication network and cause additional transmission delays, which consequently increase the risk of rear-end collisions of vehicles in the platoon. To counter DoS attacks, we propose a resilient control scheme that exploits polytopic overapproximations of the closed-loop dynamics under DoS attacks. This scheme allows synthesizing robust controllers that guarantee tracking of both the desired spacing policy and spatially varying reference velocity for all space-varying DoS attacks satisfying a hard upper bound on the attack duration. In addition, L2 string stability conditions are derived to ensure that external perturbations do not grow as they propagate through the platoon, thus ensuring the string stability. Numerical simulations illustrate the effectiveness of the proposed control method.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.04133 [pdf, other]

Ultrasonic Image's Annotation Removal: A Self-supervised Noise2Noise Approach

Authors: Yuanheng Zhang, Nan Jiang, Zhaoheng Xie, Junying Cao, Yueyang Teng

Abstract: Accurately annotated ultrasonic images are vital components of a high-quality medical report. Hospitals often have strict guidelines on the types of annotations that should appear on imaging results. However, manually inspecting these images can be a cumbersome task. While a neural network could potentially automate the process, training such a model typically requires a dataset of paired input and target images, which in turn involves significant human labour. This study introduces an automated approach for detecting annotations in images. This is achieved by treating the annotations as noise, creating a self-supervised pretext task and using a model trained under the Noise2Noise scheme to restore the image to a clean state. We tested a variety of model structures on the denoising task against different types of annotation, including body marker annotation, radial line annotation, etc. Our results demonstrate that most models trained under the Noise2Noise scheme outperformed their counterparts trained with noisy-clean data pairs. The customized U-Net yielded the best outcome on the body marker annotation dataset, with high scores on segmentation precision and reconstruction similarity. We released our code at https://github.com/GrandArth/UltrasonicImage-N2N-Approach.

Submitted 9 July, 2023; originally announced July 2023.

Comments: 10 pages, 7 figures

arXiv:2307.00535 [pdf, other]

Goal-oriented Tensor: Beyond Age of Information Towards Semantics-Empowered Goal-Oriented Communications

Authors: Aimin Li, Shaohua Wu, Sumei Sun, Jie Cao

Abstract: Optimizations premised on open-loop metrics such as Age of Information (AoI) indirectly enhance the system's decision-making utility. We therefore propose a novel closed-loop metric named Goal-oriented Tensor (GoT) to directly quantify the impact of semantic mismatches on goal-oriented decision-making utility. Leveraging the GoT, we consider a sampler & decision-maker pair that works collaboratively and distributively to achieve a shared goal of communications. We formulate a two-agent infinite-horizon Decentralized Partially Observable Markov Decision Process (Dec-POMDP) to conjointly deduce the optimal deterministic sampling policy and decision-making policy. To circumvent the curse of dimensionality in obtaining an optimal deterministic joint policy through Brute-Force-Search, a sub-optimal yet computationally efficient algorithm is developed. This algorithm is predicated on the search for a Nash Equilibrium between the sampler and the decision-maker. Simulation results reveal that the proposed sampler & decision-maker co-design surpasses the current literature on AoI and its variants in terms of both goal achievement utility and sparse sampling rate, signifying progress in the semantics-conscious, goal-driven sparse sampling design.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 30 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:2305.04083

arXiv:2306.05708 [pdf, other]

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

Authors: Haogeng Liu, Tao Wang, Jie Cao, Ran He, Jianhua Tao

Abstract: Denoising Diffusion Probabilistic Models have shown extraordinary ability on various generative tasks. However, their slow inference speed renders them impractical in speech synthesis. This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality. Firstly, we employ linear interpolation between the target and noise to design a diffusion sequence for training, while previously the diffusion path that links the noise and target is a curved segment. When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher quality samples from a random noise with fewer iterations. Secondly, to reduce computational complexity and achieve effective global modeling of noisy speech, LinDiff employs a patch-based processing approach that partitions the input signal into small patches. The patch-wise token leverages Transformer architecture for effective modeling of global information. Adversarial training is used to further improve the sample quality with decreased sampling steps. We test the proposed method with speech synthesis conditioned on acoustic features (Mel-spectrograms). Experimental results verify that our model can synthesize high-quality speech even with only one diffusion step. Both subjective and objective evaluations demonstrate that our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed (3 diffusion steps).

Submitted 12 June, 2023; v1 submitted 9 June, 2023; originally announced June 2023.
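
The linear interpolation between target and noise described in the LinDiff abstract above can be illustrated with a small sketch. The parameterization x_t = (1 - t) * x_0 + t * noise, with t running from 1 (pure noise) to 0 (target), is a common convention assumed here for illustration; the abstract does not state the paper's exact formula, and `linear_diffusion_path` is a hypothetical helper name.

```python
import random


def linear_diffusion_path(target, noise, num_steps):
    """Return num_steps + 1 points on the straight line from noise to target.

    Assumed convention (not taken from the paper):
        x_t = (1 - t) * target + t * noise, with t going from 1 down to 0.
    """
    path = []
    for k in range(num_steps + 1):
        t = 1.0 - k / num_steps
        path.append([(1.0 - t) * x + t * n for x, n in zip(target, noise)])
    return path


rng = random.Random(0)
target = [0.5, -0.25, 1.0]                      # toy stand-in for a speech frame
noise = [rng.gauss(0.0, 1.0) for _ in target]   # Gaussian noise sample

# Three line segments, matching the "3 diffusion steps" setting in the abstract.
path = linear_diffusion_path(target, noise, num_steps=3)
for point in path:
    print([round(v, 3) for v in point])
```

Because every intermediate point lies on one straight segment, consecutive points differ by a constant increment, which is the property the abstract credits for needing fewer sampling steps.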

arXiv:2305.08995 [pdf, other]

Denoising Diffusion Models for Plug-and-Play Image Restoration

Authors: Yuanzhi Zhu, Kai Zhang, Jingyun Liang, Jiezhang Cao, Bihan Wen, Radu Timofte, Luc Van Gool

Abstract: Plug-and-play Image Restoration (IR) has been widely recognized as a flexible and interpretable method for solving various inverse problems by utilizing any off-the-shelf denoiser as the implicit image prior. However, most existing methods focus on discriminative Gaussian denoisers. Although diffusion models have shown impressive performance for high-quality image synthesis, their potential to serve as a generative denoiser prior to the plug-and-play IR methods remains to be further explored. While several other attempts have been made to adopt diffusion models for image restoration, they either fail to achieve satisfactory results or typically require an unacceptable number of Neural Function Evaluations (NFEs) during inference. This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework. Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models. Experimental results on three representative IR tasks, including super-resolution, image deblurring, and inpainting, demonstrate that DiffPIR achieves state-of-the-art performance on both the FFHQ and ImageNet datasets in terms of reconstruction faithfulness and perceptual quality with no more than 100 NFEs. The source code is available at https://github.com/yuanzhi-zhu/DiffPIR

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.05991 [pdf, other]

DMNR: Unsupervised De-noising of Point Clouds Corrupted by Airborne Particles

Authors: Chu Chen, Yanqi Ma, Bingcheng Dong, Junjie Cao

Abstract: LiDAR sensors are critical for autonomous driving and robotics applications due to their ability to provide accurate range measurements and their robustness to lighting conditions. However, airborne particles, such as fog, rain, snow, and dust, degrade their performance, and it is inevitable to encounter these inclement environmental conditions outdoors. A straightforward approach would be to remove them by supervised semantic segmentation, but annotating these particles point-wise is too laborious. To address this problem and enhance the perception under inclement conditions, we develop two dynamic filtering methods called Dynamic Multi-threshold Noise Removal (DMNR) and DMNR-H by accurate analysis of the position distribution and intensity characteristics of noisy points and clean points on publicly available WADS and DENSE datasets. Both DMNR and DMNR-H outperform state-of-the-art unsupervised methods by a significant margin on the two datasets and are slightly better than supervised deep learning-based methods. Furthermore, our methods are more robust to different LiDAR sensors and airborne particles, such as snow and fog.

Submitted 10 May, 2023; originally announced May 2023.

Comments: 8 pages, 6 figures, 15 references, submitted paper

arXiv:2305.04047 [pdf, other]

Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral Image Denoising

Authors: Haijin Zeng, Jiezhang Cao, Kai Feng, Shaoguang Huang, Hongyan Zhang, Hiep Luong, Wilfried Philips

Abstract: Hyperspectral imaging (HI) has emerged as a powerful tool in diverse fields such as medical diagnosis, industrial inspection, and agriculture, owing to its ability to detect subtle differences in physical properties through high spectral resolution. However, hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering. To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed. However, model-based approaches rely on hand-crafted priors and hyperparameters, while learning-based methods are incapable of estimating the inherent degradation patterns and noise distributions in the imaging procedure, which could inform supervised learning. Secondly, learning-based algorithms predominantly rely on CNNs and fail to capture long-range dependencies, resulting in limited interpretability. This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues. Firstly, DNA-Net models sparse noise and Gaussian noise, and explicitly represents the image prior using a transformer. Then the model is unfolded into an end-to-end network, in which the hyperparameters are estimated from the noisy HSI and the degradation model and used to control each iteration. Additionally, we introduce a novel U-Shaped Local-Non-local-Spectral Transformer (U-LNSA) that captures spectral correlation, local contents, and non-local dependencies simultaneously. By integrating U-LNSA into DNA-Net, we present the first Transformer-based deep unfolding HSI denoising method. Experimental results show that DNA-Net outperforms state-of-the-art methods, and the modeling of noise distributions helps in cases with heavy noise.

Submitted 6 May, 2023; originally announced May 2023.

arXiv:2305.02124 [pdf]

Adaptative Diffraction Image Registration for 4D-STEM to optimize ACOM Pattern Matching

Authors: Nicolas Folastre, Junhao Cao, Gozde Oney, Sunkyu Park, Arash Jamali, Christian Masquelier, Laurence Croguennec, Muriel Veron, Edgar F. Rauch, Arnaud Demortière

Abstract: The technique known as 4D-STEM has recently emerged as a powerful tool for the local characterization of crystalline structures in materials, such as cathode materials for Li-ion batteries or perovskite materials for photovoltaics. However, the use of new detectors optimized for electron diffraction patterns and other advanced techniques requires constant adaptation of methodologies to address the challenges associated with crystalline materials. In this study, we present a novel image processing method to improve pattern matching in the determination of crystalline orientations and phases. Our approach uses sub-pixel adaptive image processing to register and reconstruct electron diffraction signals in large 4D-STEM datasets. By using adaptive prominence and linear filters such as mean and Gaussian blur, we are able to improve the quality of the diffraction pattern registration. The resulting data compression rate of 103 is well-suited for the era of big data and provides a significant enhancement in the performance of the entire ACOM data processing method. Our approach is evaluated using dedicated metrics, which demonstrate a high improvement in phase recognition. Our results demonstrate that this data preparation method not only enhances the quality of the resulting image but also boosts the confidence level in the analysis of the outcomes related to determining crystal orientation and phase. Additionally, it mitigates the impact of user bias that may occur during the application of the method through the manipulation of parameters.

Submitted 3 May, 2023; originally announced May 2023.

Comments: 22 pages (13 pages SI), 7 figures (10 figures SI)

arXiv:2304.07495 [pdf]

Anti-scattering medium computational ghost imaging with modified Hadamard patterns

Authors: Li-Xing Lin, Jie Cao, Qun Hao

Abstract: Illumination patterns of computational ghost imaging (CGI) systems suffer from reduced contrast when passing through a scattering medium, which causes the effective information in the reconstruction result to be drowned out by noise. A two-dimensional (2D) Gaussian filter performs a linear smoothing operation on the whole image for image denoising. It can be combined with linear reconstruction algorithms of CGI to obtain the noise-reduced results directly, without post-processing. However, it results in blurred image edges while performing denoising and, in addition, a suitable standard deviation is difficult to choose in advance, especially in an unknown scattering environment. In this work, we exploit the characteristics of CGI to solve these two problems. A kind of modified Hadamard pattern based on the 2D Gaussian filter and the differential operation features of Hadamard-based CGI is developed. We analyze and demonstrate that using Hadamard patterns for illumination but using our developed modified Hadamard patterns for reconstruction (MHCGI) can enhance the robustness of CGI against turbid scattering media. Our method not only helps directly obtain noise-reduced results without blurred edges but also requires only an approximate standard deviation, i.e., it can be set in advance. The experimental results on transmitted and reflected targets demonstrate the feasibility of our method. Our method helps to promote the practical application of CGI in the scattering environment.

Submitted 15 April, 2023; originally announced April 2023.

Comments: 14 pages, 7 figures

arXiv:2303.13571 [pdf, other]

Inheriting Bayer's Legacy-Joint Remosaicing and Denoising for Quad Bayer Image Sensor

Authors: Haijin Zeng, Kai Feng, Jiezhang Cao, Shaoguang Huang, Yongqiang Zhao, Hiep Luong, Jan Aelterman, Wilfried Philips

Abstract: Pixel binning based Quad sensors have emerged as a promising solution to overcome the hardware limitations of compact cameras in low-light imaging. However, binning results in lower spatial resolution and non-Bayer CFA artifacts. To address these challenges, we propose a dual-head joint remosaicing and denoising network (DJRD), which enables the conversion of noisy Quad Bayer and standard noise-free Bayer pattern without any resolution loss. DJRD includes a newly designed Quad Bayer remosaicing (QB-Re) block, integrated denoising modules based on Swin-transformer and multi-scale wavelet transform. The QB-Re block constructs the convolution kernel based on the CFA pattern to achieve a periodic color distribution in the perceptual field, which is used to extract exact spectral information and reduce color misalignment. The integrated Swin-Transformer and multi-scale wavelet transform capture non-local dependencies, frequency and location information to effectively reduce practical noise. By identifying challenging patches utilizing Moire and zipper detection metrics, we enable our model to concentrate on difficult patches during the post-training phase, which enhances the model's performance in hard cases. Our proposed model outperforms competing models by approximately 3dB, without additional complexity in hardware or software.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.13404 [pdf, other]

MSFA-Frequency-Aware Transformer for Hyperspectral Images Demosaicing

Authors: Haijin Zeng, Kai Feng, Shaoguang Huang, Jiezhang Cao, Yongyong Chen, Hongyan Zhang, Hiep Luong, Wilfried Philips

Abstract: Hyperspectral imaging systems that use multispectral filter arrays (MSFA) capture only one spectral component in each pixel. Hyperspectral demosaicing is used to recover the non-measured components. While deep learning methods have shown promise in this area, they still suffer from several challenges, including limited modeling of non-local dependencies, lack of consideration of the periodic MSFA pattern that could be linked to periodic artifacts, and difficulty in recovering high-frequency details. To address these challenges, this paper proposes a novel de-mosaicing framework, the MSFA-frequency-aware Transformer network (FDM-Net). FDM-Net integrates a novel MSFA-frequency-aware multi-head self-attention mechanism (MaFormer) and a filter-based Fourier zero-padding method to reconstruct high pass components with greater difficulty and low pass components with relative ease, separately. The advantage of MaFormer is that it can leverage the MSFA information and non-local dependencies present in the data. Additionally, we introduce a joint spatial and frequency loss to transfer MSFA information and enhance training on frequency components that are hard to recover. Our experimental results demonstrate that FDM-Net outperforms state-of-the-art methods with 6dB PSNR, and reconstructs high-fidelity details successfully.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.06877 [pdf, other]

Progressive Open Space Expansion for Open-Set Model Attribution

Authors: Tianyun Yang, Danding Wang, Fan Tang, Xinying Zhao, Juan Cao, Sheng Tang

Abstract: Despite the remarkable progress in generative technology, the Janus-faced issues of intellectual property protection and malicious content supervision have arisen. Efforts have been made to manage synthetic images by attributing them to a set of potential source models. However, the closed-set classification setting limits the application in real-world scenarios for handling contents generated by arbitrary models. In this study, we focus on a challenging task, namely Open-Set Model Attribution (OSMA), to simultaneously attribute images to known models and identify those from unknown ones. Compared to existing open-set recognition (OSR) tasks focusing on semantic novelty, OSMA is more challenging as the distinction between images from known and unknown models may only lie in visually imperceptible traces. To this end, we propose a Progressive Open Space Expansion (POSE) solution, which simulates open-set samples that maintain the same semantics as closed-set samples but are embedded with different imperceptible traces. Guided by a diversity constraint, the open space is simulated progressively by a set of lightweight augmentation models. We consider three real-world scenarios and construct an OSMA benchmark dataset, including unknown models trained with different random seeds, architectures, and datasets from known ones. Extensive experiments on the dataset demonstrate POSE is superior to both existing model attribution methods and off-the-shelf OSR methods.

Submitted 13 March, 2023; originally announced March 2023.

Comments: accepted to CVPR2023

arXiv:2302.07468 [pdf]

Edge-weighted pFISTA-Net for MRI Reconstruction

Authors: Jianpeng Cao

Abstract: Deep learning based on unrolled algorithms has served as an effective method for accelerated magnetic resonance imaging (MRI). However, many methods ignore the direct use of edge information to assist MRI reconstruction. In this work, we present the edge-weighted pFISTA-Net that directly applies the detected edge map to the soft-thresholding part of pFISTA-Net. The soft-thresholding value of different regions will be adjusted according to the edge map. Experimental results on a public brain dataset show that the proposed method yields reconstructions with lower error and better artifact suppression compared with the state-of-the-art deep learning-based methods. The edge-weighted pFISTA-Net also shows robustness for different undersampling masks and edge detection operators. In addition, we extend the edge-weighted structure to a joint reconstruction and segmentation network and obtain improved reconstruction performance and more accurate segmentation results.

Submitted 14 February, 2023; originally announced February 2023.
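
The abstract of arXiv:2302.07468 above says the soft-thresholding value is adjusted per region according to an edge map. The following is a minimal sketch of that idea, not the paper's implementation: the function names, the `edge_scale` parameter, the assumption that edge-map values lie in [0, 1], and the choice to lower the threshold on edges are all illustrative assumptions.

```python
def soft_threshold(x, threshold):
    """Shrink x toward zero by `threshold` (proximal operator of the L1 norm)."""
    if x > threshold:
        return x - threshold
    if x < -threshold:
        return x + threshold
    return 0.0


def edge_weighted_soft_threshold(coeffs, edge_map, base_threshold, edge_scale=0.5):
    """Soft-threshold a 2D list of coefficients with a per-pixel threshold.

    coeffs, edge_map: 2D lists of equal shape; edge_map values assumed in [0, 1].
    edge_scale: hypothetical knob. 0 ignores the edge map; 1 disables
    thresholding entirely where the edge map equals 1.
    """
    out = []
    for coeff_row, edge_row in zip(coeffs, edge_map):
        out_row = []
        for c, e in zip(coeff_row, edge_row):
            # Assumed rule: shrink less where an edge was detected.
            local_threshold = base_threshold * (1.0 - edge_scale * e)
            out_row.append(soft_threshold(c, local_threshold))
        out.append(out_row)
    return out


# Left column is flat background (edge = 0), right column lies on an edge (edge = 1).
coeffs = [[1.0, 1.0], [-0.3, 0.3]]
edge_map = [[0.0, 1.0], [0.0, 1.0]]
print(edge_weighted_soft_threshold(coeffs, edge_map, base_threshold=0.4))
```

With these inputs the small coefficient on the flat side is zeroed while the one on the edge side survives, which is the qualitative behaviour an edge-aware threshold is meant to give.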

arXiv:2211.13479 [pdf]

Alternating Deep Low-Rank Approach for Exponential Function Reconstruction and Its Biomedical Magnetic Resonance Applications

Authors: Yihui Huang, Zi Wang, Xinlin Zhang, Jian Cao, Zhangren Tu, Meijin Lin, Di Guo, Xiaobo Qu

Abstract: Undersampling can accelerate the signal acquisition but at the cost of bringing in artifacts. Removing these artifacts is a fundamental problem in signal processing and this task is also called signal reconstruction. Through modeling signals as the superimposed exponential functions, deep learning has achieved fast and high-fidelity signal reconstruction by training a mapping from the undersampled exponentials to the fully sampled ones. However, the mismatch, such as the sampling rate of undersampling, the organ and the contrast of imaging, between the training and target data will heavily compromise the reconstruction. To address this issue, we propose Alternating Deep Low-Rank (ADLR), which combines deep learning solvers and classic optimization solvers. Experiments on the reconstruction of synthetic and realistic biomedical magnetic resonance signals demonstrate that ADLR can effectively mitigate the mismatch issue and achieve lower reconstruction errors than state-of-the-art methods.

Submitted 8 August, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: 13 pages
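
The abstract of arXiv:2211.13479 above models signals as superimposed exponentials and uses a low-rank approach. As background only, the sketch below illustrates one standard fact that links the two ideas: a Hankel matrix built from a sum of R distinct exponentials has rank R. The abstract does not state that the paper uses a Hankel matrix, so that choice, along with the helper names `hankel` and `numerical_rank`, is an assumption made for illustration.

```python
def hankel(signal, rows):
    """Build a Hankel matrix H[i][j] = signal[i + j] with the given row count."""
    cols = len(signal) - rows + 1
    return [[signal[i + j] for j in range(cols)] for i in range(rows)]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting and a tolerance."""
    m = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is (numerically) dependent on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Two decaying exponentials sampled at n = 0..8 (illustrative values).
signal = [0.9 ** n + 0.5 ** n for n in range(9)]
print(numerical_rank(hankel(signal, rows=5)))  # a 5x5 matrix, yet rank 2
```

Low-rank reconstruction methods exploit this structure: undersampling leaves entries of the matrix unknown, and the rank constraint restricts which completions are plausible.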

arXiv:2211.09672 [pdf, other]

Network-Wide Task Offloading With LEO Satellites: A Computation and Transmission Fusion Approach

Authors: Jiaqi Cao, Shengli Zhang, Qingxia Chen, Houtian Wang, Mingzhe Wang, Naijin Liu

Abstract: Computing tasks are ubiquitous in space missions. Conventionally, these tasks are offloaded to ground servers for computation, where the transmission of raw data on satellite-to-ground links severely constrains the performance. To overcome this limitation, recent works offload tasks to visible low-earth-orbit (LEO) satellites. However, this offloading scheme struggles to achieve good performance in actual networks with uneven loads because visible satellites over hotspots tend to be overloaded. Therefore, it is urgent to extend the offloading targets to the entire network. To address the network-wide offloading problem, we propose a metagraph-based computation and transmission fusion offloading scheme for multi-tier networks. Specifically, virtual edges, between the original network and its duplicate, are generated to fuse computation and transmission in single-tier networks. Then, a metagraph is employed to integrate the fusion in multi-tier networks. After assigning appropriate edge weights to the metagraph, the network-wide offloading problem can be solved by searching the shortest path. In addition, we apply the proposed scheme to solve the spatial computation offloading problem in a real multi-tier network. The superiority of the proposed scheme over two benchmark schemes is proved theoretically and empirically. Simulation results show that the proposed scheme decreases the weighted average delay by up to 87.51% and 18.70% compared with the ground offloading scheme and the visible offloading scheme, respectively.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2211.08820 [pdf, other]

Computing-Aware Routing for LEO Satellite Networks: A Transmission and Computation Integration Approach

Authors: Jiaqi Cao, Shengli Zhang, Qingxia Chen, Houtian Wang, Mingzhe Wang, Naijin Liu

Abstract: The advancements of remote sensing (RS) pose increasingly high demands on computation and transmission resources. Conventional ground-offloading techniques, which transmit large amounts of raw data to the ground, suffer from poor satellite-to-ground link quality. In addition, existing satellite-offloading techniques, which offload computational tasks to low earth orbit (LEO) satellites located within the visible range of RS satellites for processing, cannot leverage the full computing capability of the network because the computational resources of visible LEO satellites are limited. This situation is even worse in hotspot areas. In this paper, for efficient offloading via LEO satellite networks, we propose a novel computing-aware routing scheme. It fuses the transmission and computation processes and optimizes the overall delay of both. Specifically, we first model the LEO satellite network as a snapshot-free dynamic network, whose nodes and edges both have time-varying weights. By utilizing time-varying network parameters to characterize the network dynamics, the proposed method establishes a continuous-time model which scales well on large networks and improves the accuracy. Next, we propose a computing-aware routing scheme following the model. It processes tasks during the routing process instead of offloading raw data to ground stations, reducing the overall delay and avoiding network congestion consequently. Finally, we formulate the computing-aware routing problem in the dynamic network as a combination of multiple dynamic single source shortest path (DSSSP) problems and propose a genetic algorithm (GA) based method to approximate the results in a reasonable time. Simulation results show that the computing-aware routing scheme decreases the overall delay by up to 78.31% compared with offloading raw data to the ground to process.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2211.08779 [pdf, other]

Adaptive Task Offloading for Space Missions: A State-Graph-Based Approach

Authors: Jiaqi Cao, Shengli Zhang, Mingzhe Wang, Qingxia Chen, Houtian Wang, Naijin Liu

Abstract: Advances in space exploration have led to an explosion of tasks. Conventionally, these tasks are offloaded to ground servers for enhanced computing capability, or to adjacent low-earth-orbit satellites for reduced transmission delay. However, the overall delay is determined by both computation and transmission costs. The existing offloading schemes, while being highly optimized for either cost, can be abysmal for the overall performance. The computation-transmission cost dilemma is yet to be solved. In this paper, we propose an adaptive offloading scheme to reduce the overall delay. The core idea is to jointly model and optimize the transmission-computation process over the entire network. Specifically, to represent the computation state migrations, we generalize graph nodes with multiple states. In this way, the joint optimization problem is transformed into a shortest path problem over the state graph. We further provide an extended Dijkstra's algorithm for efficient path finding. Simulation results show that the proposed scheme outperforms the ground and one-hop offloading schemes by up to 37.56% and 39.35% respectively on SpaceCube v2.0.

Submitted 16 November, 2022; originally announced November 2022.
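
The abstract of arXiv:2211.08779 above turns joint transmission-computation optimization into a shortest-path problem over a graph whose nodes carry multiple states. The sketch below shows one way such a search could look; it is not the paper's extended Dijkstra's algorithm. It assumes only two states ("raw" and "done"), a transmission delay proportional to data size, and a fixed per-node computation delay; the function name, parameters, and the toy network are hypothetical.

```python
import heapq


def state_graph_shortest_delay(links, compute_delay, source, target,
                               raw_size, result_size):
    """Plain Dijkstra over (node, state) pairs, with state in {"raw", "done"}.

    links: dict node -> list of (neighbor, seconds per unit of data).
    compute_delay: dict node -> seconds to process the task at that node
                   (nodes missing from the dict cannot compute).
    Returns the minimum delay for the processed result to reach `target`.
    """
    size = {"raw": raw_size, "done": result_size}
    best = {(source, "raw"): 0.0}
    heap = [(0.0, source, "raw")]
    while heap:
        delay, node, state = heapq.heappop(heap)
        if delay > best.get((node, state), float("inf")):
            continue  # stale heap entry
        if node == target and state == "done":
            return delay
        # Transmission edges: forward the data in its current state.
        moves = [((nbr, state), per_unit * size[state])
                 for nbr, per_unit in links.get(node, [])]
        # Computation edge: state migration raw -> done at the same node.
        if state == "raw" and node in compute_delay:
            moves.append(((node, "done"), compute_delay[node]))
        for nxt, cost in moves:
            new_delay = delay + cost
            if new_delay < best.get(nxt, float("inf")):
                best[nxt] = new_delay
                heapq.heappush(heap, (new_delay, nxt[0], nxt[1]))
    return float("inf")


# Toy network: a sensing satellite, one LEO neighbour, and a ground station.
links = {"sat": [("leo", 1.0), ("ground", 5.0)], "leo": [("ground", 2.0)]}
compute_delay = {"leo": 3.0, "ground": 1.0}
print(state_graph_shortest_delay(links, compute_delay, "sat", "ground",
                                 raw_size=10, result_size=1))
```

In this toy case, sending raw data straight to the ground costs 51 s, while forwarding to the LEO neighbour, computing there, and sending only the small result costs 15 s, so the search returns 15.0. Because every (node, state) pair is an ordinary vertex, the standard algorithm applies unchanged.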

arXiv:2210.01476 [pdf, ps, other]

Learning-based Design of Luenberger Observers for Autonomous Nonlinear Systems

Authors: Muhammad Umar B. Niazi, John Cao, Xudong Sun, Amritam Das, Karl Henrik Johansson

Abstract: Designing Luenberger observers for nonlinear systems involves the challenging task of transforming the state to an alternate coordinate system, possibly of higher dimensions, where the system is asymptotically stable and linear up to output injection. The observer then estimates the system's state in the original coordinates by inverting the transformation map. However, finding a suitable injective transformation whose inverse can be derived remains a primary challenge for general nonlinear systems. We propose a novel approach that uses supervised physics-informed neural networks to approximate both the transformation and its inverse. Our method exhibits superior generalization capabilities to contemporary methods and demonstrates robustness to both the neural network's approximation errors and system uncertainties.

Submitted 5 April, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: Proceedings of the 2023 American Control Conference (ACC)

arXiv:2209.12265 [pdf, other]

Cooperative Sensing and Heterogeneous Information Fusion in VCPS: A Multi-agent Deep Reinforcement Learning Approach

Authors: Xincao Xu, Kai Liu, Penglin Dai, Ruitao Xie, Jingjing Cao, Jiangtao Luo

Abstract: Cooperative sensing and heterogeneous information fusion are critical to realize vehicular cyber-physical systems (VCPSs). This paper makes the first attempt to quantitatively measure the quality of VCPS by designing a new metric called Age of View (AoV). Specifically, we first present the system architecture where heterogeneous information can be cooperatively sensed and uploaded via vehicle-to-infrastructure (V2I) communications in vehicular edge computing (VEC). Logical views are constructed by fusing the heterogeneous information at edge nodes. Further, we formulate the problem by deriving a cooperative sensing model based on the multi-class M/G/1 priority queue, and defining the AoV by modeling the timeliness, completeness and consistency of the logical views. On this basis, a multi-agent deep reinforcement learning solution is proposed. In particular, the system state includes vehicle sensed information, edge cached information and view requirements. The vehicle action space consists of the sensing frequencies and uploading priorities of information. A difference-reward-based credit assignment is designed to divide the system reward, which is defined as the VCPS quality, into the difference reward for vehicles. The edge node allocates V2I bandwidth to vehicles based on predicted vehicle trajectories and view requirements. Finally, we build the simulation model and give a comprehensive performance evaluation, which conclusively demonstrates the superiority of the proposed solution.

Submitted 27 January, 2023; v1 submitted 25 September, 2022; originally announced September 2022.
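
The abstract of arXiv:2209.12265 above builds its sensing model on a multi-class M/G/1 priority queue. As a hedged illustration of what such a model yields, the sketch below evaluates the classical mean-waiting-time formula for a non-preemptive priority M/G/1 queue (often attributed to Cobham). Whether the paper uses the non-preemptive variant is an assumption, and the function name and arguments are illustrative.

```python
def priority_queue_waits(arrival_rates, mean_service, second_moment_service):
    """Mean queueing delay per class in a non-preemptive M/G/1 priority queue.

    Classes are listed from highest to lowest priority. For class k:
        W_k = W_0 / ((1 - sigma_{k-1}) * (1 - sigma_k)),
    where W_0 = sum_i lambda_i * E[S_i^2] / 2 is the mean residual service
    time seen by an arrival, and sigma_k is the total utilisation of
    classes 1..k.
    """
    residual = 0.5 * sum(lam * m2 for lam, m2
                         in zip(arrival_rates, second_moment_service))
    waits = []
    load_higher = 0.0  # utilisation of strictly higher-priority classes
    for lam, mean in zip(arrival_rates, mean_service):
        load_upto = load_higher + lam * mean
        if load_upto >= 1.0:
            raise ValueError("queue is unstable for this class")
        waits.append(residual / ((1.0 - load_higher) * (1.0 - load_upto)))
        load_higher = load_upto
    return waits


# Two classes with exponential service of mean 1 (so E[S^2] = 2).
print(priority_queue_waits([0.25, 0.25], [1.0, 1.0], [2.0, 2.0]))
```

For these inputs the high-priority class waits 2/3 of a time unit on average and the low-priority class 4/3, which shows how an uploading priority directly shapes the freshness of the corresponding information.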

arXiv:2209.00196 [pdf, other]

Group frame neural network of moving object ghost imaging combined with frame merging algorithm

Authors: Da Chen, Shan-Guo Feng, Hua-Hua Wang, Jia-Ning Cao, Zhi-Wei Zhang, Zhi-Xin Yang, Ao Yan, Lu Gao, Ze Zhang

Abstract: The nature of multiple samples to extract correlation information limits the applications of ghost imaging of moving objects. A novel multi-to-one neural network is proposed and the concept of "batch frame" is introduced to improve the serial imaging method. The neural network extracts more correlation information from a small number of samples, thus reducing the sampling ratio of the ghost imaging technique. We combine the correlation characteristics between images to propose a frame merging algorithm, which eliminates the dynamic blur of high-speed moving objects and further improves the reconstruction quality of moving object images at a low sampling ratio. The experimental results are consistent with the simulation results.

Submitted 31 August, 2022; originally announced September 2022.

Comments: 12 pages, 7 figures

arXiv:2208.12779 [pdf, other]

Battery and Hydrogen Energy Storage Control in a Smart Energy Network with Flexible Energy Demand using Deep Reinforcement Learning

Authors: Cephas Samende, Zhong Fan, Jun Cao

Abstract: Smart energy networks provide for an effective means to accommodate high penetrations of variable renewable energy sources like solar and wind, which are key for deep decarbonisation of energy production. However, given the variability of the renewables as well as the energy demand, it is imperative to develop effective control and energy storage schemes to manage the variable energy generation and achieve desired system economics and environmental goals. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen energy storage to handle the uncertainties related to electricity prices, renewable energy production and consumption. We aim to improve renewable energy utilisation and minimise energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, which is a deep reinforcement learning-based control strategy to optimise the scheduling of the hybrid energy storage system and energy demand in real-time. The proposed approach is model-free and does not require explicit knowledge and rigorous mathematical models of the smart energy network environment. Simulation results based on real-world data show that: (i) integration and optimised operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5% and renewable energy utilisation by over 13.2% compared to other baseline models and (ii) the proposed algorithm outperforms the state-of-the-art self-learning algorithms like deep-Q network.

Submitted 26 August, 2022; originally announced August 2022.

Comments: 13 pages, 10 figures

arXiv:2208.11803 [pdf, other]

Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising

Authors: Jiezhang Cao, Qin Wang, Jingyun Liang, Yulun Zhang, Kai Zhang, Radu Timofte, Luc Van Gool

Abstract: Video denoising aims at removing noise from videos to recover clean ones. Some existing works show that optical flow can help the denoising by exploiting the additional spatial-temporal clues from nearby frames. However, the flow estimation itself is also sensitive to noise, and can be unusable under large noise levels. To this end, we propose a new multi-scale refined optical flow-guided video denoising method, which is more robust to different noise levels. Our method mainly consists of a denoising-oriented flow refinement (DFR) module and a flow-guided mutual denoising propagation (FMDP) module. Unlike previous works that directly use off-the-shelf flow solutions, DFR first learns robust multi-scale optical flows, and FMDP makes use of the flow guidance by progressively introducing and refining more flow information from low resolution to high resolution. Together with real noise degradation synthesis, the proposed multi-scale flow-guided denoising network achieves state-of-the-art performance on both synthetic Gaussian denoising and real video denoising. The codes will be made publicly available.

Submitted 25 March, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

arXiv:2206.02146 [pdf, other]

Recurrent Video Restoration Transformer with Guided Deformable Attention

Authors: Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool

Abstract: Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusion. However, it suffers from large model size and intensive memory consumption; the latter has a relatively small model size as it shares parameters across frames; however, it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework which can achieve a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, the guided deformable attention is designed for clip-to-clip alignment, which predicts multiple relevant locations from the whole inferred clip and aggregates their features by the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.

Submitted 12 November, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

Comments: Accepted by NeurIPS 2022. Code: https://github.com/**gyunLiang/RVRT

arXiv:2205.02682 [pdf]

doi 10.1016/j.optcom.2022.128982

Temporally and Spatially variant-resolution illumination patterns in computational ghost imaging

Authors: Dong Zhou, Jie Cao, Huan Cui, Li-Xing Lin, Haoyu Zhang, Yingqiang Zhang, Qun Hao

Abstract: Conventional computational ghost imaging (CGI) uses light carrying a sequence of patterns with uniform resolution to illuminate the object, then performs correlation calculation based on the light intensity value reflected by the target and the preset patterns to obtain the object image. It requires a large number of measurements to obtain high-quality images, especially if high-resolution images are to be obtained. To solve this problem, we developed temporally variable-resolution illumination patterns, replacing the conventional uniform-resolution illumination patterns with a sequence of patterns of different imaging resolutions. In addition, we propose to combine temporally variable-resolution illumination patterns and spatially variable-resolution structure to develop temporally and spatially variable-resolution (TSV) illumination patterns, which not only improve the imaging quality of the region of interest (ROI) but also improve the robustness to noise. The methods using the proposed illumination patterns are verified by simulations and experiments compared with CGI. For the same number of measurements, the method using temporally variable-resolution illumination patterns has better imaging quality than CGI, but it is less robust to noise. The method using TSV illumination patterns has better imaging quality in ROI than the method using temporally variable-resolution illumination patterns and CGI under the same number of measurements. We also experimentally verify that the method using TSV patterns has better imaging performance when applied to higher resolution imaging. The proposed methods are expected to address the difficulty of achieving high-resolution, high-quality imaging in current computational ghost imaging.

Submitted 14 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

arXiv:2205.01550 [pdf, other]

Point Cloud Semantic Segmentation using Multi Scale Sparse Convolution Neural Network

Authors: Yunzheng Su, Lei Jiang, Jie Cao

Abstract: In recent years, with the development of computing resources and LiDAR, point cloud semantic segmentation has attracted many researchers. Although sparse convolution already offers a way to handle the sparsity of point clouds, multi-scale features are not considered. In this letter, we propose a feature extraction module based on multi-scale sparse convolution and a feature selection module based on channel attention, and build a point cloud segmentation network framework based on them. By introducing multi-scale sparse convolution, the network can capture richer feature information from convolution kernels of different sizes, improving the point cloud segmentation result. Experimental results on the Stanford large-scale 3-D Indoor Spaces (S3DIS) dataset and the outdoor SemanticKITTI dataset demonstrate the effectiveness and superiority of the proposed method.

Submitted 29 June, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2204.03939 [pdf, ps, other]

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

Authors: Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao

Abstract: This paper introduces GigaST, a large-scale pseudo speech translation (ST) corpus. We create the corpus by translating the text in GigaSpeech, an English ASR corpus, into German and Chinese. The training set is translated by a strong machine translation system and the test set is translated by humans. ST models trained with the addition of our corpus obtain new state-of-the-art results on the MuST-C English-German benchmark test set. We provide a detailed description of the translation process and verify its quality. We make the translated text data public and hope to facilitate research in speech translation. We also release the training scripts on NeurST to make it easy to replicate our systems. The GigaST dataset is available at https://st-benchmark.github.io/resources/GigaST.

Submitted 6 June, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: Accepted at Interspeech 2023. GigaST dataset is available at https://st-benchmark.github.io/resources/GigaST

arXiv:2204.00293 [pdf]

The role of living laboratories in unlocking the potential of low-carbon energy technologies on the journey to net-zero

Authors: Zhong Fan, Jun Cao, Taskin Jamal, Chris Fogwill, Cephas Samende, Zoe Robinson, Fiona Polack, Mark Ormerod, Sharon George, Adam Peacock, David Healey

Abstract: We demonstrate the potential role of one of the largest at scale multi-vector Smart Energy Network Demonstrator (SEND).

Submitted 1 April, 2022; originally announced April 2022.

Showing 1–50 of 90 results for author: Cao, J