Search | arXiv e-print repository

CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation

Authors: Huawei Sun, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille

Abstract: Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has become of sufficient interest due to the robustness and low-cost properties of radar. Thus, this paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar poi… ▽ More Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has become of sufficient interest due to the robustness and low-cost properties of radar. Thus, this paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth for the confidence map, which involves associating each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we innovate a confidence-aware gated fusion mechanism to integrate radar and image features effectively, thereby enhancing the reliability of the depth map by filtering out radar noise. Our methodology, evaluated on the nuScenes dataset, demonstrates superior performance, improving upon the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE). △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: Accepted by IROS 2024

arXiv:2406.17244 [pdf, other]

A Near-Field Super-Resolution Network for Accelerating Antenna Characterization

Authors: Yuchen Gu, Hai-Han Sun, Daniel W. van der Weide

Abstract: We present a deep neural network-enabled method to accelerate near-field (NF) antenna measurement. We develop a Near-field Super-resolution Network (NFS-Net) to reconstruct significantly undersampled near-field data as high-resolution data, which considerably reduces the number of sampling points required for NF measurement and thus improves measurement efficiency. The high-resolution near-field d… ▽ More We present a deep neural network-enabled method to accelerate near-field (NF) antenna measurement. We develop a Near-field Super-resolution Network (NFS-Net) to reconstruct significantly undersampled near-field data as high-resolution data, which considerably reduces the number of sampling points required for NF measurement and thus improves measurement efficiency. The high-resolution near-field data reconstructed by the network is further processed by a near-field-to-far-field (NF2FF) transformation to obtain far-field antenna radiation patterns. Our experiments demonstrate that the NFS-Net exhibits both accuracy and generalizability in restoring high-resolution near-field data from low-resolution input. The NF measurement workflow that combines the NFS-Net and the NF2FF algorithm enables accurate radiation pattern characterization with only 11% of the Nyquist rate samples. Though the experiments in this study are conducted on a planar setup with a uniform grid, the proposed method can serve as a universal strategy to accelerate measurements under different setups and conditions. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2406.12300 [pdf]

IR2QSM: Quantitative Susceptibility Map** via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

Abstract: Quantitative susceptibility map** (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas… ▽ More Quantitative susceptibility map** (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-based IR2QSM method for QSM reconstruction. It is designed by iterating four times of a reverse concatenations and middle recurrent modules enhanced U-net, which could dramatically improve the efficiency of latent feature utilization. Simulated and in vivo experiments were conducted to compare IR2QSM with several traditional algorithms (MEDI and iLSQR) and state-of-the-art deep learning methods (U-net, xQSM, and LPCNN). The results indicated that IR2QSM was able to obtain QSM images with significantly increased accuracy and mitigated artifacts over other methods. Particularly, IR2QSM demonstrated on average the best NRMSE (27.59%) in simulated experiments, which is 15.48%, 7.86%, 17.24%, 9.26%, and 29.13% lower than iLSQR, MEDI, U-net, xQSM, LPCNN, respectively, and led to improved QSM results with fewer artifacts for the in vivo data. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 9 figures

arXiv:2406.04158 [pdf, other]

Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets

Authors: Da Li, Guoqiang Zhao, Houjun Sun, Jiacheng Bao

Abstract: Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous work typically rely on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar dat… ▽ More Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous work typically rely on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar data. Consequently, imaging performance is limited, and acquiring full-aperture data for multi-baseline SAR is costly and sometimes impractical in real-world applications. In this paper, we propose a Cross-Modal Reconstruction Network (CMR-Net), which integrates differentiable render and cross-modal supervision with optical images to reconstruct highly sparse multi-baseline SAR 3D images of vehicle targets into visually structured and high-resolution images. We meticulously designed the network architecture and training strategies to enhance network generalization capability. Remarkably, CMR-Net, trained solely on simulated data, demonstrates high-resolution reconstruction capabilities on both publicly available simulation datasets and real measured datasets, outperforming traditional sparse reconstruction algorithms based on compressed sensing and other learning-based methods. Additionally, using optical images as supervision provides a cost-effective way to build training datasets, reducing the difficulty of method dissemination. Our work showcases the broad prospects of deep learning in multi-baseline SAR 3D imaging and offers a novel path for researching radar imaging based on cross-modal learning theory. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03814 [pdf, other]

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

Authors: Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

Abstract: The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching, presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address thi… ▽ More The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching, presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. Our method selects the appropriate datastore for decoding each frame, ensuring the injection of language-specific information into the ASR process. We apply this framework to cutting-edge CTC-based models, develo** an advanced CS-ASR system. Extensive experiments demonstrate the remarkable effectiveness of our gated datastore mechanism in enhancing the performance of zero-shot Chinese-English CS-ASR. △ Less

Submitted 13 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.00163 [pdf, other]

A Stochastic Incentive-based Demand Response Program for Virtual Power Plant with Solar, Battery, Electric Vehicles, and Controllable Loads

Authors: Pratik Harsh, Hongjian Sun, Debapriya Das, Goyal Awagan, **g Jiang

Abstract: The growing integration of distributed energy resources (DERs) into the power grid necessitates an effective coordination strategy to maximize their benefits. Acting as an aggregator of DERs, a virtual power plant (VPP) facilitates this coordination, thereby amplifying their impact on the transmission level of the power grid. Further, a demand response program enhances the scheduling approach by m… ▽ More The growing integration of distributed energy resources (DERs) into the power grid necessitates an effective coordination strategy to maximize their benefits. Acting as an aggregator of DERs, a virtual power plant (VPP) facilitates this coordination, thereby amplifying their impact on the transmission level of the power grid. Further, a demand response program enhances the scheduling approach by managing the energy demands in parallel with the uncertain energy outputs of the DERs. This work presents a stochastic incentive-based demand response model for the scheduling operation of VPP comprising solar-powered generating stations, battery swap** stations, electric vehicle charging stations, and consumers with controllable loads. The work also proposes a priority mechanism to consider the individual preferences of electric vehicle users and consumers with controllable loads. The scheduling approach for the VPP is framed as a multi-objective optimization problem, normalized using the utopia-tracking method. Subsequently, the normalized optimization problem is transformed into a stochastic formulation to address uncertainties in energy demand from charging stations and controllable loads. The proposed VPP scheduling approach is addressed on a 33-node distribution system simulated using MATLAB software, which is further validated using a real-time digital simulator. △ Less

Submitted 31 May, 2024; originally announced June 2024.

Comments: 11 pages, 8 figures, submitted to IEEE Transactions on Industry Applications for potential publication

arXiv:2405.14221 [pdf, other]

doi 10.1109/JETCAS.2024.3403524

Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization

Authors: Zhibo Chen, Heming Sun, Li Zhang, Fan Zhang

Abstract: This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, our focus is on presenting the advancement of generative models and their influence on research in the domain of visual signal coding and processing. This survey study begins with a brief introduction of well-established generative models, including the Variational A… ▽ More This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, our focus is on presenting the advancement of generative models and their influence on research in the domain of visual signal coding and processing. This survey study begins with a brief introduction of well-established generative models, including the Variational Autoencoder (VAE) models, Generative Adversarial Network (GAN) models, Autoregressive (AR) models, Normalizing Flows and Diffusion models. The subsequent section of the paper explores the advancements in visual signal coding based on generative models, as well as the ongoing international standardization activities. In the realm of visual signal processing, our focus lies on the application and development of various generative models in the research of visual signal restoration. We also present the latest developments in generative visual signal synthesis and editing, along with visual signal quality assessment using generative models and quality assessment for generative models. The practical implementation of these studies is closely linked to the investigation of fast optimization. This paper additionally presents the latest advancements in fast optimization on visual signal coding and processing with generative models. We hope to advance this field by providing researchers and practitioners a comprehensive literature review on the topic of visual signal coding and processing with generative models. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12432 [pdf, ps, other]

Power Measurement Based Channel Estimation for IRS-Enhanced Wireless Coverage

Authors: He Sun, Lipeng Zhu, Weidong Mei, Rui Zhang

Abstract: In this paper, we study an IRS-assisted coverage enhancement problem for a given region, aiming to optimize the passive reflection of the IRS for improving the average communication performance in the region by accounting for both deterministic and random channels in the environment. To this end, we first derive the closed-form expression of the average received signal power in terms of the determ… ▽ More In this paper, we study an IRS-assisted coverage enhancement problem for a given region, aiming to optimize the passive reflection of the IRS for improving the average communication performance in the region by accounting for both deterministic and random channels in the environment. To this end, we first derive the closed-form expression of the average received signal power in terms of the deterministic base station (BS)-IRS-user cascaded channels over all user locations, and propose an IRS-aided coverage enhancement framework to facilitate the estimation of such deterministic channels for IRS passive reflection design. Specifically, to avoid the exorbitant overhead of estimating the cascaded channels at all possible user locations, a location selection method is first proposed to select only a set of typical user locations for channel estimation by exploiting the channel spatial correlation in the region. To estimate the deterministic cascaded channels at the selected user locations, conventional IRS channel estimation methods require additional pilot signals, which not only results in high system training overhead but also may not be compatible with the existing communication protocols. To overcome this issue, we further propose a single-layer neural network (NN)-enabled IRS channel estimation method in this paper, based on only the average received signal power measurements at each selected location corresponding to different IRS random training reflections, which can be offline implemented in current wireless systems. Numerical results demonstrate that our proposed scheme can significantly improve the coverage performance of the target region and outperform the existing power-measurement-based IRS reflection designs. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2309.08275

arXiv:2405.09554 [pdf, ps, other]

Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

Authors: Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

Abstract: In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to mod… ▽ More In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots. △ Less

Submitted 17 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.

arXiv:2405.06342 [pdf, other]

Compression-Realized Deep Structural Network for Video Quality Enhancement

Authors: Hanchi Sun, Xiaohong Liu, Xinyang Jiang, Yifei Shen, Dongsheng Li, Xiongkuo Min, Guangtao Zhai

Abstract: This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a mo… ▽ More This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a more "conscious" process of quality enhancement. As a result, we propose the Compression-Realize Deep Structural Network (CRDS), introducing three inductive biases aligned with the three primary processes in the classic compression codec, merging the strengths of classical encoder architecture with deep network capabilities. Inspired by the residual extraction and domain transformation process in the codec, a pre-trained Latent Degradation Residual Auto-Encoder is proposed to transform video frames into a latent feature space, and the mutual neighborhood attention mechanism is integrated for precise motion estimation and residual extraction. Furthermore, drawing inspiration from the quantization noise distribution of the codec, CRDS proposes a novel Progressive Denoising framework with intermediate supervision that decomposes the quality enhancement into a series of simpler denoising sub-tasks. Experimental results on datasets like LDV 2.0 and MFQE 2.0 indicate our approach surpasses state-of-the-art models. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.01515 [pdf, other]

Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

Authors: Hanwen Zhang, Mingzhe Chen, Alireza Vahid, Haijian Sun

Abstract: Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor ge… ▽ More Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor generalizability and scarce training data. In this paper, we propose a fractional programming (FP) based deep unfolding (DU) approach to address resource allocation problem for a weighted sum rate optimization in RSMA. By carefully designing the penalty function, we couple the variable update with projected gradient descent algorithm (PGD). Following the structure of PGD, we embed few learnable parameters in each layer of the DU network. Through extensive simulation, we have shown that the proposed model-based neural networks has similar performance as optimal results given by traditional algorithm but with much lower computational complexity, less training data, and higher resilience to test set data and out-of-distribution (OOD) data. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: submitted to IEEE conference

arXiv:2405.00719 [pdf, other]

EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces

Authors: Yi Ding, Yong Li, Hao Sun, Rui Liu, Chengxuan Tong, Cuntai Guan

Abstract: Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine tempora… ▽ More Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks consistently verify the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms existing state-of-the-art methods or is comparable to them. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks. The source code can be found at https://github.com/yi-ding-cs/EEG-Deformer. △ Less

Submitted 25 April, 2024; originally announced May 2024.

Comments: 10 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2404.19415 [pdf, other]

Two-Stage Robust Planning Model for Park-Level Integrated Energy System Considering Uncertain Equipment Contingency

Authors: Zuxun Xiong, Xinwei Shen, Hongbin Sun

Abstract: In this paper, we propose a two-stage robust planning model for an Integrated Energy System (IES) that serves an industrial park. The term 'Park-level IES' is used to refers to IES of a smaller scale but have high demands for various forms of energy. The proposed planning model considers uncertainties like load demand fluctuations and equipment contingencies, and provides a reliable scheme of equi… ▽ More In this paper, we propose a two-stage robust planning model for an Integrated Energy System (IES) that serves an industrial park. The term 'Park-level IES' is used to refers to IES of a smaller scale but have high demands for various forms of energy. The proposed planning model considers uncertainties like load demand fluctuations and equipment contingencies, and provides a reliable scheme of equipment selection and sizing for IES investors. Inspired by the unit commitment problem, we formulate an equipment contingency uncertainty set to accurately describe the potential equipment contingencies which happen and can be repaired within a day. Then, a novel and modified nested column-and-constraint generation algorithm is applied to solve this two-stage robust planning model with integer recourse efficiently. In the case study, the role of energy storage system for IES reliability enhancement is analyzed in detail. Computational results demonstrate the advantage of the proposed models over the deterministic planning model in terms of improving reliability. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.12887 [pdf, other]

3D Multi-frame Fusion for Video Stabilization

Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao

Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our approach lies in Stabilized Rendering (SR), a volume rend… ▽ More In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our approach lies in Stabilized Rendering (SR), a volume rendering module, which extends beyond the image fusion by incorporating feature fusion. The core of our RStab framework lies in Stabilized Rendering (SR), a volume rendering module, fusing multi-frame information in 3D space. Specifically, SR involves war** features and colors from multiple frames by projection, fusing them into descriptors to render the stabilized image. However, the precision of warped information depends on the projection accuracy, a factor significantly influenced by dynamic regions. In response, we introduce the Adaptive Ray Range (ARR) module to integrate depth priors, adaptively defining the sampling range for the projection process. Additionally, we propose Color Correction (CC) assisting geometric constraints with optical flow for accurate color aggregation. Thanks to the three modules, our RStab demonstrates superior performance compared with previous stabilizers in the field of view (FOV), image quality, and video stability across various datasets. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.08286 [pdf, other]

Energy-modified Leverage Sampling for Radio Map Construction via Matrix Completion

Authors: Hao Sun, Junting Chen

Abstract: This paper explores an energy-modified leverage sampling strategy for matrix completion in radio map construction. The main goal is to address potential identifiability issues in matrix completion with sparse observations by using a probabilistic sampling approach. Although conventional leverage sampling is commonly employed for designing sampling patterns, it often assigns high sampling probabili… ▽ More This paper explores an energy-modified leverage sampling strategy for matrix completion in radio map construction. The main goal is to address potential identifiability issues in matrix completion with sparse observations by using a probabilistic sampling approach. Although conventional leverage sampling is commonly employed for designing sampling patterns, it often assigns high sampling probability to locations with low received signal strength (RSS) values, leading to a low sampling efficiency. Theoretical analysis demonstrates that the leverage score produces pseudo images of sources, and in the regions around the source locations, the leverage probability is asymptotically consistent with the RSS. Based on this finding, an energy-modified leverage probability-based sampling strategy is investigated for efficient sampling. Numerical demonstrations indicate that the proposed sampling strategy can decrease the normalized mean squared error (NMSE) of radio map construction by more than 10% for both matrix completion and interpolation-assisted matrix completion schemes, compared to conventional methods. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.08224

HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies

Authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Chunjie Zhou

Abstract: Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominant… ▽ More Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominantly reconstruction-based and predictive in nature. However, they typically concentrate on a single-dimensional instance level, thereby not fully harnessing the complex associations inherent in industrial MTS. To address this issue, we propose a novel self-supervised hierarchical contrastive consistency learning method for detecting anomalies in MTS, named HCL-MTSAD. It innovatively leverages data consistency at multiple levels inherent in industrial MTS, systematically capturing consistent associations across four latent levels-measurement, sample, channel, and process. By develo** a multi-layer contrastive loss, HCL-MTSAD can extensively mine data consistency and spatio-temporal association, resulting in more informative representations. Subsequently, an anomaly discrimination module, grounded in self-supervised hierarchical contrastive learning, is designed to detect timestamp-level anomalies by calculating multi-scale data consistency. Extensive experiments conducted on six diverse MTS datasets retrieved from real cyber-physical systems and server machines, in comparison with 20 baselines, indicate that HCL-MTSAD's anomaly detection capability outperforms the state-of-the-art benchmark models by an average of 1.8\% in terms of F1 score. △ Less

Submitted 18 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: This paper is a manuscript that is still in the process of revision, including Table 1, Figure 2, problem definition in section III.B and method description proposed in section IV. In addition, the submitter has not been authorized by the first author and other co-authors to post the paper to arXiv

arXiv:2404.08120 [pdf, other]

A least-square method for non-asymptotic identification in linear switching control

Authors: Haoyuan Sun, Ali Jadbabaie

Abstract: The focus of this paper is on linear system identification in the setting where it is known that the underlying partially-observed linear dynamical system lies within a finite collection of known candidate models. We first consider the problem of identification from a given trajectory, which in this setting reduces to identifying the index of the true model with high probability. We characterize t… ▽ More The focus of this paper is on linear system identification in the setting where it is known that the underlying partially-observed linear dynamical system lies within a finite collection of known candidate models. We first consider the problem of identification from a given trajectory, which in this setting reduces to identifying the index of the true model with high probability. We characterize the finite-time sample complexity of this problem by leveraging recent advances in the non-asymptotic analysis of linear least-square methods in the literature. In comparison to the earlier results that assume no prior knowledge of the system, our approach takes advantage of the smaller hypothesis class and leads to the design of a learner with a dimension-free sample complexity bound. Next, we consider the switching control of linear systems, where there is a candidate controller for each of the candidate models and data is collected through interaction of the system with a collection of potentially destabilizing controllers. We develop a dimension-dependent criterion that can detect those destabilizing controllers in finite time. By leveraging these results, we propose a data-driven switching strategy that identifies the unknown parameters of the underlying system. We then provide a non-asymptotic analysis of its performance and discuss its implications on the classical method of estimator-based supervisory control. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.06165 [pdf, other]

Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications

Authors: Huawei Sun, Hao Feng, Gianfranco Mauro, Julius Ott, Georg Stettinger, Lorenzo Servadei, Robert Wille

Abstract: Radar and camera fusion yields robustness in perception tasks by leveraging the strength of both sensors. The typical extracted radar point cloud is 2D without height information due to insufficient antennas along the elevation axis, which challenges the network performance. This work introduces a learning-based approach to infer the height of radar points associated with 3D objects. A novel robus… ▽ More Radar and camera fusion yields robustness in perception tasks by leveraging the strength of both sensors. The typical extracted radar point cloud is 2D without height information due to insufficient antennas along the elevation axis, which challenges the network performance. This work introduces a learning-based approach to infer the height of radar points associated with 3D objects. A novel robust regression loss is introduced to address the sparse target challenge. In addition, a multi-task training strategy is employed, emphasizing important features. The average radar absolute height error decreases from 1.69 to 0.25 meters compared to the state-of-the-art height extension method. The estimated target height values are used to preprocess and enrich radar data for downstream perception tasks. Integrating this refined radar information further enhances the performance of existing radar camera fusion models for object detection and depth estimation tasks. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE Intelligent Vehicles Symposium (IV 2024)

arXiv:2404.03706 [pdf, other]

Bi-level Guided Diffusion Models for Zero-Shot Medical Imaging Inverse Problems

Authors: Hossein Askari, Fred Roosta, Hongfu Sun

Abstract: In the realm of medical imaging, inverse problems aim to infer high-quality images from incomplete, noisy measurements, with the objective of minimizing expenses and risks to patients in clinical settings. The Diffusion Models have recently emerged as a promising approach to such practical challenges, proving particularly useful for the zero-shot inference of images from partially acquired measure… ▽ More In the realm of medical imaging, inverse problems aim to infer high-quality images from incomplete, noisy measurements, with the objective of minimizing expenses and risks to patients in clinical settings. The Diffusion Models have recently emerged as a promising approach to such practical challenges, proving particularly useful for the zero-shot inference of images from partially acquired measurements in Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). A central challenge in this approach, however, is how to guide an unconditional prediction to conform to the measurement information. Existing methods rely on deficient projection or inefficient posterior score approximation guidance, which often leads to suboptimal performance. In this paper, we propose \underline{\textbf{B}}i-level \underline{G}uided \underline{D}iffusion \underline{M}odels ({BGDM}), a zero-shot imaging framework that efficiently steers the initial unconditional prediction through a \emph{bi-level} guidance strategy. Specifically, BGDM first approximates an \emph{inner-level} conditional posterior mean as an initial measurement-consistent reference point and then solves an \emph{outer-level} proximal optimization objective to reinforce the measurement consistency. Our experimental findings, using publicly available MRI and CT medical datasets, reveal that BGDM is more effective and efficient compared to the baselines, faithfully generating high-fidelity medical images and substantially reducing hallucinatory artifacts in cases of severe degradation. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: 19 pages, 14 figures

arXiv:2403.16830 [pdf, other]

Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

Authors: Xiang Ma, Yuan Zhou, Hanwen Zhang, Qun Wang, Haijian Sun, Hongjie Wang, Rose Qingyang Hu

Abstract: As public awareness of environmental protection continues to grow, the trend of integrating more electric vehicles (EVs) into the transportation sector is rising. Unlike conventional internal combustion engine (ICE) vehicles, EVs can minimize carbon emissions and potentially achieve autonomous driving. However, several obstacles hinder the widespread adoption of EVs, such as their constrained driv… ▽ More As public awareness of environmental protection continues to grow, the trend of integrating more electric vehicles (EVs) into the transportation sector is rising. Unlike conventional internal combustion engine (ICE) vehicles, EVs can minimize carbon emissions and potentially achieve autonomous driving. However, several obstacles hinder the widespread adoption of EVs, such as their constrained driving range and the extended time required for charging. One alternative solution to address these challenges is implementing dynamic wireless power transfer (DWPT), charging EVs in motion on the road. Moreover, charging stations with static wireless power transfer (SWPT) infrastructure can replace existing gas stations, enabling users to charge EVs in parking lots or at home. This paper surveys the communication infrastructure for static and dynamic wireless charging in electric vehicles. It encompasses all communication aspects involved in the wireless charging process. The architecture and communication requirements for static and dynamic wireless charging are presented separately. Additionally, a comprehensive comparison of existing communication standards is provided. The communication with the grid is also explored in detail. The survey gives attention to security and privacy issues arising during communications. In summary, the paper addresses the challenges and outlines upcoming trends in communication for EV wireless charging. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: submitted to IET Communication as a survey paper

arXiv:2403.14070 [pdf]

QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Map**

Authors: Zhuang Xiong, Wei Jiang, Yang Gao, Feng Liu, Hongfu Sun

Abstract: Quantitative Susceptibility Map** (QSM) dipole inversion is an ill-posed inverse problem for quantifying magnetic susceptibility distributions from MRI tissue phases. While supervised deep learning methods have shown success in specific QSM tasks, their generalizability across different acquisition scenarios remains constrained. Recent developments in diffusion models have demonstrated potential… ▽ More Quantitative Susceptibility Map** (QSM) dipole inversion is an ill-posed inverse problem for quantifying magnetic susceptibility distributions from MRI tissue phases. While supervised deep learning methods have shown success in specific QSM tasks, their generalizability across different acquisition scenarios remains constrained. Recent developments in diffusion models have demonstrated potential for solving 2D medical imaging inverse problems. However, their application to 3D modalities, such as QSM, remains challenging due to high computational demands. In this work, we developed a 3D image patch-based diffusion model, namely QSMDiff, for robust QSM reconstruction across different scan parameters, alongside simultaneous super-resolution and image-denoising tasks. QSMDiff adopts unsupervised 3D image patch training and full-size measurement guidance during inference for controlled image generation. Evaluation on simulated and in-vivo human brains, using gradient-echo and echo-planar imaging sequences across different acquisition parameters, demonstrates superior performance. The method proposed in QSMDiff also holds promise for impacting other 3D medical imaging applications beyond QSM. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.07390 [pdf, other]

Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution

Authors: Haochen Sun, Yan Yuan, Lijuan Su, Haotian Shao

Abstract: Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a res… ▽ More Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a result of correction errors. In this paper, we introduce a novel blind SR approach that focuses on Learning Correction Errors (LCE). Our method employs a lightweight Corrector to obtain a corrected low-resolution (CLR) image. Subsequently, within an SR network, we jointly optimize SR performance by utilizing both the original LR image and the frequency learning of the CLR image. Additionally, we propose a new Frequency-Self Attention block (FSAB) that enhances the global information utilization ability of Transformer. This block integrates both self-attention and frequency spatial attention mechanisms. Extensive ablation and comparison experiments conducted across various settings demonstrate the superiority of our method in terms of visual quality and accuracy. Our approach effectively addresses the challenges associated with degradation estimation and correction errors, paving the way for more accurate blind image SR. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 16 pages

arXiv:2403.02616 [pdf]

Unsupervised Spatio-Temporal State Estimation for Fine-grained Adaptive Anomaly Diagnosis of Industrial Cyber-physical Systems

Authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Chunjie Zhou

Abstract: Accurate detection and diagnosis of abnormal behaviors such as network attacks from multivariate time series (MTS) are crucial for ensuring the stable and effective operation of industrial cyber-physical systems (CPS). However, existing researches pay little attention to the logical dependencies among system working states, and have difficulties in explaining the evolution mechanisms of abnormal s… ▽ More Accurate detection and diagnosis of abnormal behaviors such as network attacks from multivariate time series (MTS) are crucial for ensuring the stable and effective operation of industrial cyber-physical systems (CPS). However, existing researches pay little attention to the logical dependencies among system working states, and have difficulties in explaining the evolution mechanisms of abnormal signals. To reveal the spatio-temporal association relationships and evolution mechanisms of the working states of industrial CPS, this paper proposes a fine-grained adaptive anomaly diagnosis method (i.e. MAD-Transformer) to identify and diagnose anomalies in MTS. MAD-Transformer first constructs a temporal state matrix to characterize and estimate the change patterns of the system states in the temporal dimension. Then, to better locate the anomalies, a spatial state matrix is also constructed to capture the inter-sensor state correlation relationships within the system. Subsequently, based on these two types of state matrices, a three-branch structure of series-temporal-spatial attention module is designed to simultaneously capture the series, temporal, and space dependencies among MTS. Afterwards, three associated alignment loss functions and a reconstruction loss are constructed to jointly optimize the model. Finally, anomalies are determined and diagnosed by comparing the residual matrices with the original matrices. We conducted comparative experiments on five publicly datasets spanning three application domains (service monitoring, spatial and earth exploration, and water treatment), along with a petroleum refining simulation dataset collected by ourselves. The results demonstrate that MAD-Transformer can adaptively detect fine-grained anomalies with short duration, and outperforms the state-of-the-art baselines in terms of noise robustness and localization performance. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 23 pages, 7 figures

arXiv:2403.02601 [pdf, other]

Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

Authors: Haoyu Chen, Wenbo Li, **** Gu, **g**g Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

Abstract: For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (L… ▽ More For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction. Leveraging unseen LR images for self-supervised learning guides the model to adapt its modeling space to the target domain, facilitating fine-tuning of SR models without requiring paired high-resolution (HR) images. The integration of Discrete Wavelet Transform (DWT) further refines the focus on high-frequency details. Extensive evaluations show that our method significantly improves the generalization and detail restoration capabilities of SR models on unseen real-world datasets, outperforming existing methods. Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2402.17138 [pdf, other]

Integrated Interpolation and Block-term Tensor Decomposition for Spectrum Map Construction

Authors: Hao Sun, Junting Chen

Abstract: This paper addresses the challenge of reconstructing a 3D power spectrum map from sparse, scattered, and incomplete spectrum measurements. It proposes an integrated approach combining interpolation and block-term tensor decomposition (BTD). This approach leverages an interpolation model with the BTD structure to exploit the spatial correlation of power spectrum maps. Additionally, nuclear norm reg… ▽ More This paper addresses the challenge of reconstructing a 3D power spectrum map from sparse, scattered, and incomplete spectrum measurements. It proposes an integrated approach combining interpolation and block-term tensor decomposition (BTD). This approach leverages an interpolation model with the BTD structure to exploit the spatial correlation of power spectrum maps. Additionally, nuclear norm regularization is incorporated to effectively capture the low-rank characteristics. To implement this approach, a novel algorithm that combines alternating regression with singular value thresholding is developed. Analytical justification for the enhancement provided by the BTD structure in interpolating power spectrum maps is provided, yielding several important theoretical insights. The analysis explores the impact of the spectrum on the error in the proposed method and compares it to conventional local polynomial interpolation. Extensive numerical results demonstrate that the proposed method outperforms state-of-the-art methods in terms of signal source separation and power spectrum map construction, and remains stable under off-grid measurements and inhomogeneous measurement topologies. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.09263 [pdf]

Uncertainty-Aware Transient Stability-Constrained Preventive Redispatch: A Distributional Reinforcement Learning Approach

Authors: Zhengcheng Wang, Fei Teng, Yanzhen Zhou, Qinglai Guo, Hongbin Sun

Abstract: Transient stability-constrained preventive redispatch plays a crucial role in ensuring power system security and stability. Since redispatch strategies need to simultaneously satisfy complex transient constraints and the economic need, model-based formulation and optimization become extremely challenging. In addition, the increasing uncertainty and variability introduced by renewable sources start… ▽ More Transient stability-constrained preventive redispatch plays a crucial role in ensuring power system security and stability. Since redispatch strategies need to simultaneously satisfy complex transient constraints and the economic need, model-based formulation and optimization become extremely challenging. In addition, the increasing uncertainty and variability introduced by renewable sources start to drive the system stability consideration from deterministic to probabilistic, which further exaggerates the complexity. In this paper, a Graph neural network guided Distributional Deep Reinforcement Learning (GD2RL) method is proposed, for the first time, to solve the uncertainty-aware transient stability-constrained preventive redispatch problem. First, a graph neural network-based transient simulator is trained by supervised learning to efficiently generate post-contingency rotor angle curves with the steady-state and contingency as inputs, which serves as a feature extractor for operating states and a surrogate time-domain simulator during the environment interaction for reinforcement learning. Distributional deep reinforcement learning with explicit uncertainty distribution of system operational conditions is then applied to generate the redispatch strategy to balance the user-specified probabilistic stability performance and economy preferences. The full distribution of the post-redispatch transient stability index is directly provided as the output. Case studies on the modified New England 39-bus system validate the proposed method. △ Less

Submitted 29 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 13 pages,11 figures,accepted by IEEE Transactions on Power Systems on 24-Jun-2024

arXiv:2402.07379 [pdf, other]

Distribution Locational Marginal Emission for Carbon Alleviation in Distribution Networks: Formulation, Calculation, and Implication

Authors: Linwei Sang, Yinliang Xu, Hongbin Sun, Qiuwei Wu, Wenchuan Wu

Abstract: Regulating the proper carbon-aware intervention policy is one of the keys to emission alleviation in the distribution network, whose basis lies in effectively attributing the emission responsibility using emission factors. This paper establishes the distribution locational marginal emission (DLME) to calculate the marginal change of emission from the marginal change of both active and reactive loa… ▽ More Regulating the proper carbon-aware intervention policy is one of the keys to emission alleviation in the distribution network, whose basis lies in effectively attributing the emission responsibility using emission factors. This paper establishes the distribution locational marginal emission (DLME) to calculate the marginal change of emission from the marginal change of both active and reactive load demand for incentivizing carbon alleviation. It first formulates the day-head distribution network scheduling model based on the second-order cone program (SOCP). The emission propagation and responsibility are analyzed from demand to supply to system emission. Considering the complex and implicit map** of the SOCP-based scheduling model, the implicit theorem is leveraged to exploit the optimal condition of SOCP. The corresponding SOCP-based implicit derivation approach is proposed to calculate the DLMEs effectively in a model-based way. Comprehensive numerical studies are conducted to verify the superiority of the proposed method by comparing its calculation efficacy to the conventional marginal estimation approach, assessing its effectiveness in carbon alleviation with comparison to the average emission factors, and evaluating its carbon alleviation ability of reactive DLME. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.01523 [pdf]

Active Support of Inverters for Improving Short-Term Voltage Security in 100% IBRsPenetrated Power Systems

Authors: Yinhong Lin, Bin Wang, Qinglai Guo, Haotian Zhao, Hongbin Sun

Abstract: Due to the energy crisis and environmental pollution, the installed capacity of inverter-based resources (IBRs) in power grids is rapidly increasing, and grid-following control (GFL) is the most prevalent at present. Meanwhile, grid-forming control-based (GFM) devices have been installed in the grid to provide active support for frequency and voltage. In the future GFL devices combined with GFM wi… ▽ More Due to the energy crisis and environmental pollution, the installed capacity of inverter-based resources (IBRs) in power grids is rapidly increasing, and grid-following control (GFL) is the most prevalent at present. Meanwhile, grid-forming control-based (GFM) devices have been installed in the grid to provide active support for frequency and voltage. In the future GFL devices combined with GFM will be promising, especially in power systems with high penetration or 100% IBRs. When a short-circuit fault occurs in the grid, the controlled current source characteristic of the GFL devices leads to insufficient dynamic voltage support (DVS), while the GFM devices usually reduce the internal voltage to limit the current. Thus, deep voltage sags and undesired disconnections of IBRs may occur. Moreover, due to the dispersed locations and the control strategies' diversity of IBRs, the voltage support of different devices may not be fully coordinated, which is not conducive to short-term voltage security (STVS). To address this issue, a control scheme based on the simulation of transient characteristics of synchronous machines (SMs) is proposed. Then, a new fault ride-through strategy (FRT) is proposed based on the characteristic differences between GFL and GFM devices, and an optimization model of multi-device control parameters is formulated to meet the short-term voltage security constraints (SVSCs) and device capacity constraints. Finally, a fast solution method based on analytical modeling is proposed for the model. Test results based on the doublegenerator-one-load system, the IEEE 14-bus system, and other systems of different sizes show that the proposed method can effectively enhance the active support capability of GFL and GFM to the grid voltage, and avoid the large-scale disconnection of IBRs △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: 10 pages, 13 figures

arXiv:2401.15307 [pdf, other]

ParaTransCNN: Parallelized TransCNN Encoder for Medical Image Segmentation

Authors: Hongkun Sun, **g Xu, Yu** Duan

Abstract: The convolutional neural network-based methods have become more and more popular for medical image segmentation due to their outstanding performance. However, they struggle with capturing long-range dependencies, which are essential for accurately modeling global contextual correlations. Thanks to the ability to model long-range dependencies by expanding the receptive field, the transformer-based… ▽ More The convolutional neural network-based methods have become more and more popular for medical image segmentation due to their outstanding performance. However, they struggle with capturing long-range dependencies, which are essential for accurately modeling global contextual correlations. Thanks to the ability to model long-range dependencies by expanding the receptive field, the transformer-based methods have gained prominence. Inspired by this, we propose an advanced 2D feature extraction method by combining the convolutional neural network and Transformer architectures. More specifically, we introduce a parallelized encoder structure, where one branch uses ResNet to extract local information from images, while the other branch uses Transformer to extract global information. Furthermore, we integrate pyramid structures into the Transformer to extract global information at varying resolutions, especially in intensive prediction tasks. To efficiently utilize the different information in the parallelized encoder at the decoder stage, we use a channel attention module to merge the features of the encoder and propagate them through skip connections and bottlenecks. Intensive numerical experiments are performed on both aortic vessel tree, cardiac, and multi-organ datasets. By comparing with state-of-the-art medical image segmentation methods, our method is shown with better segmentation accuracy, especially on small organs. The code is publicly available on https://github.com/HongkunSun/ParaTransCNN. △ Less

Submitted 27 January, 2024; originally announced January 2024.

arXiv:2401.11377 [pdf, other]

Joint User Scheduling and Computing Resource Allocation Optimization in Asynchronous Mobile Edge Computing Networks

Authors: Yihan Cang, Ming Chen, Yi** Pan, Zhaohui Yang, Ye Hu, Haijian Sun, Mingzhe Chen

Abstract: In this paper, the problem of joint user scheduling and computing resource allocation in asynchronous mobile edge computing (MEC) networks is studied. In such networks, edge devices will offload their computational tasks to an MEC server, using the energy they harvest from this server. To get their tasks processed on time using the harvested energy, edge devices will strategically schedule their t… ▽ More In this paper, the problem of joint user scheduling and computing resource allocation in asynchronous mobile edge computing (MEC) networks is studied. In such networks, edge devices will offload their computational tasks to an MEC server, using the energy they harvest from this server. To get their tasks processed on time using the harvested energy, edge devices will strategically schedule their task offloading, and compete for the computational resource at the MEC server. Then, the MEC server will execute these tasks asynchronously based on the arrival of the tasks. This joint user scheduling, time and computation resource allocation problem is posed as an optimization framework whose goal is to find the optimal scheduling and allocation strategy that minimizes the energy consumption of these mobile computing tasks. To solve this mixed-integer non-linear programming problem, the general benders decomposition method is adopted which decomposes the original problem into a primal problem and a master problem. Specifically, the primal problem is related to computation resource and time slot allocation, of which the optimal closed-form solution is obtained. The master problem regarding discrete user scheduling variables is constructed by adding optimality cuts or feasibility cuts according to whether the primal problem is feasible, which is a standard mixed-integer linear programming problem and can be efficiently solved. By iteratively solving the primal problem and master problem, the optimal scheduling and resource allocation scheme is obtained. Simulation results demonstrate that the proposed asynchronous computing framework reduces 87.17% energy consumption compared with conventional synchronous computing counterpart. △ Less

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.10345 [pdf, other]

Attack and Defense Analysis of Learned Image Compression

Authors: Tianyu Zhu, Heming Sun, Xiankui Xiong, Xuanpeng Zhu, Yong Gong, Minge **g, Yibo Fan

Abstract: Learned image compression (LIC) is becoming more and more popular these years with its high efficiency and outstanding compression quality. Still, the practicality against modified inputs added with specific noise could not be ignored. White-box attacks such as FGSM and PGD use only gradient to compute adversarial images that mislead LIC models to output unexpected results. Our experiments compare… ▽ More Learned image compression (LIC) is becoming more and more popular these years with its high efficiency and outstanding compression quality. Still, the practicality against modified inputs added with specific noise could not be ignored. White-box attacks such as FGSM and PGD use only gradient to compute adversarial images that mislead LIC models to output unexpected results. Our experiments compare the effects of different dimensions such as attack methods, models, qualities, and targets, concluding that in the worst case, there is a 61.55% decrease in PSNR or a 19.15 times increase in bpp under the PGD attack. To improve their robustness, we conduct adversarial training by adding adversarial images into the training datasets, which obtains a 95.52% decrease in the R-D cost of the most vulnerable LIC model. We further test the robustness of H.266, whose better performance on reconstruction quality extends its possibility to defend one-step or iterative adversarial attacks. △ Less

Submitted 27 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2312.15575 [pdf, other]

Neural Born Series Operator for Biomedical Ultrasound Computed Tomography

Authors: Zhijun Zeng, Yihang Zheng, Youjia Zheng, Yubing Li, Zuoqiang Shi, He Sun

Abstract: Ultrasound Computed Tomography (USCT) provides a radiation-free option for high-resolution clinical imaging. Despite its potential, the computationally intensive Full Waveform Inversion (FWI) required for tissue property reconstruction limits its clinical utility. This paper introduces the Neural Born Series Operator (NBSO), a novel technique designed to speed up wave simulations, thereby facilita… ▽ More Ultrasound Computed Tomography (USCT) provides a radiation-free option for high-resolution clinical imaging. Despite its potential, the computationally intensive Full Waveform Inversion (FWI) required for tissue property reconstruction limits its clinical utility. This paper introduces the Neural Born Series Operator (NBSO), a novel technique designed to speed up wave simulations, thereby facilitating a more efficient USCT image reconstruction process through an NBSO-based FWI pipeline. Thoroughly validated on comprehensive brain and breast datasets, simulated under experimental USCT conditions, the NBSO proves to be accurate and efficient in both forward simulation and image reconstruction. This advancement demonstrates the potential of neural operators in facilitating near real-time USCT reconstruction, making the clinical application of USCT increasingly viable and promising. △ Less

Submitted 24 December, 2023; originally announced December 2023.

ACM Class: I.4.5; J.3

arXiv:2312.13567 [pdf, other]

Fine-grained Disentangled Representation Learning for Multimodal Emotion Recognition

Authors: Haoqin Sun, Shiwan Zhao, Xuechen Wang, Wenjia Zeng, Yong Chen, Yong Qin

Abstract: Multimodal emotion recognition (MMER) is an active research field that aims to accurately recognize human emotions by fusing multiple perceptual modalities. However, inherent heterogeneity across modalities introduces distribution gaps and information redundancy, posing significant challenges for MMER. In this paper, we propose a novel fine-grained disentangled representation learning (FDRL) frame… ▽ More Multimodal emotion recognition (MMER) is an active research field that aims to accurately recognize human emotions by fusing multiple perceptual modalities. However, inherent heterogeneity across modalities introduces distribution gaps and information redundancy, posing significant challenges for MMER. In this paper, we propose a novel fine-grained disentangled representation learning (FDRL) framework to address these challenges. Specifically, we design modality-shared and modality-private encoders to project each modality into modality-shared and modality-private subspaces, respectively. In the shared subspace, we introduce a fine-grained alignment component to learn modality-shared representations, thus capturing modal consistency. Subsequently, we tailor a fine-grained disparity component to constrain the private subspaces, thereby learning modality-private representations and enhancing their diversity. Lastly, we introduce a fine-grained predictor component to ensure that the labels of the output representations from the encoders remain unchanged. Experimental results on the IEMOCAP dataset show that FDRL outperforms the state-of-the-art methods, achieving 78.34% and 79.44% on WAR and UAR, respectively. △ Less

Submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2312.13556 [pdf, other]

Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions

Authors: Yang Liu, Haoqin Sun, Geng Chen, Qingyue Wang, Zhen Zhao, Xugang Lu, Longbiao Wang

Abstract: Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, making it challenging to achieve competitive performance in noisy conditions. To this end, we propose a multi-level knowledge distillation (MLKD) method, which aims to transfer the knowledge from a teacher model trained on clean speech to a simpler student model trained on noisy speech. Specifically,… ▽ More Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, making it challenging to achieve competitive performance in noisy conditions. To this end, we propose a multi-level knowledge distillation (MLKD) method, which aims to transfer the knowledge from a teacher model trained on clean speech to a simpler student model trained on noisy speech. Specifically, we use clean speech features extracted by the wav2vec-2.0 as the learning goal and train the distil wav2vec-2.0 to approximate the feature extraction ability of the original wav2vec-2.0 under noisy conditions. Furthermore, we leverage the multi-level knowledge of the original wav2vec-2.0 to supervise the single-level output of the distil wav2vec-2.0. We evaluate the effectiveness of our proposed method by conducting extensive experiments using five types of noise-contaminated speech on the IEMOCAP dataset, which show promising results compared to state-of-the-art models. △ Less

Submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted by INTERSPEECH 2023

arXiv:2312.12565 [pdf, other]

Precise Coil Alignment for Dynamic Wireless Charging of Electric Vehicles with RFID Sensing

Authors: Haijian Sun, Xiang Ma, Rose Qingyang Hu, Randy Christensen

Abstract: Electric vehicle (EV) has emerged as a transformative force for the sustainable and environmentally friendly future. To alleviate range anxiety caused by battery and charging facility, dynamic wireless power transfer (DWPT) is increasingly recognized as a key enabler for widespread EV adoption, yet it faces significant technical challenges, primarily in precise coil alignment. This article begins… ▽ More Electric vehicle (EV) has emerged as a transformative force for the sustainable and environmentally friendly future. To alleviate range anxiety caused by battery and charging facility, dynamic wireless power transfer (DWPT) is increasingly recognized as a key enabler for widespread EV adoption, yet it faces significant technical challenges, primarily in precise coil alignment. This article begins by reviewing current alignment methodologies and evaluates their advantages and limitations. We observe that achieving the necessary alignment precision is challenging with these existing methods. To address this, we present an innovative RFID-based DWPT coil alignment system, utilizing coherent phase detection and a maximum likelihood estimation algorithm, capable of achieving sub-10 cm accuracy. This system's efficacy in providing both lateral and vertical misalignment estimates has been verified through laboratory and experimental tests. We also discuss potential challenges in broader system implementation and propose corresponding solutions. This research offers a viable and promising solution for enhancing DWPT efficiency. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: submitted to IEEE magazine for potential publication. 5 figure

arXiv:2312.02605 [pdf, other]

doi 10.1109/PCS60826.2024.10566283

Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

Authors: Tianhao Peng, Ge Gao, Heming Sun, Fan Zhang, David Bull

Abstract: In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a… ▽ More In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a novel model-agnostic pruning scheme based on gradient decay and adaptive layer-wise distillation. Gradient decay enhances parameter exploration during sparsification whilst preventing runaway sparsity and is superior to the standard Straight-Through Estimation. The adaptive layer-wise distillation regulates the sparse training in various stages based on the distortion of intermediate features. This stage-wise design efficiently updates parameters with minimal computational overhead. The proposed approach has been applied to three popular end-to-end learnt video codecs, FVC, DCVC, and DCVC-HEM. Results confirm that our method yields up to 65% reduction in MACs and 2x speed-up with less than 0.3dB drop in BD-PSNR. Supporting code and supplementary material can be downloaded from: https://jasminepp.github.io/lightweightdvc/ △ Less

Submitted 5 December, 2023; originally announced December 2023.

Report number: 2312.02605

arXiv:2312.01361 [pdf, other]

MoEC: Mixture of Experts Implicit Neural Compression

Authors: Jianchen Zhao, Cheng-Ching Tseng, Ming Lu, Ruichuan An, Xiaobao Wei, He Sun, Shanghang Zhang

Abstract: Emerging Implicit Neural Representation (INR) is a promising data compression technique, which represents the data using the parameters of a Deep Neural Network (DNN). Existing methods manually partition a complex scene into local regions and overfit the INRs into those regions. However, manually designing the partition scheme for a complex scene is very challenging and fails to jointly learn the… ▽ More Emerging Implicit Neural Representation (INR) is a promising data compression technique, which represents the data using the parameters of a Deep Neural Network (DNN). Existing methods manually partition a complex scene into local regions and overfit the INRs into those regions. However, manually designing the partition scheme for a complex scene is very challenging and fails to jointly learn the partition and INRs. To solve the problem, we propose MoEC, a novel implicit neural compression method based on the theory of mixture of experts. Specifically, we use a gating network to automatically assign a specific INR to a 3D point in the scene. The gating network is trained jointly with the INRs of different local regions. Compared with block-wise and tree-structured partitions, our learnable partition can adaptively find the optimal partition in an end-to-end manner. We conduct detailed experiments on massive and diverse biomedical data to demonstrate the advantages of MoEC against existing approaches. In most of experiment settings, we have achieved state-of-the-art results. Especially in cases of extreme compression ratios, such as 6000x, we are able to uphold the PSNR of 48.16. △ Less

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.16531 [pdf]

Channel Modeling for Terahertz Communications in Rain

Authors: Peian Li, Wenbo Liu, Jiacheng Liu, Da Li, Guohao Liu, Yuanshuai Lei, Jiabiao Zhao, Xiaopeng Wang, Houjun Sun, Jianjun Ma, John F. Federici

Abstract: Terahertz (THz) communication channels, integral to outdoor applications, are critically influenced by natural factors like rainfall. Our research focused on the nuanced effects of rain on these channels, employing an advanced rainfall emulation system. By analyzing key parameters such as rain rate, altitude based variations in rainfall, and diverse raindrop sizes, we identified the paramount sign… ▽ More Terahertz (THz) communication channels, integral to outdoor applications, are critically influenced by natural factors like rainfall. Our research focused on the nuanced effects of rain on these channels, employing an advanced rainfall emulation system. By analyzing key parameters such as rain rate, altitude based variations in rainfall, and diverse raindrop sizes, we identified the paramount significance of the number of raindrops in the THz channel, particularly in scenarios with constant rain rates but varying drop sizes. Central to our findings is a novel model grounded in Mie scattering theory, which adeptly incorporates the variability of raindrop size distributions at different altitudes. This model has displayed strong congruence with our experimental results. In essence, our study underscores the inadequacy of solely depending on a fixed ground-based rain rate and emphasizes the imperative of calibrating distribution metrics to cater to specific environmental and operational contexts. △ Less

Submitted 28 November, 2023; originally announced November 2023.

Comments: submitted to IEEE Transactions on Antennas and Propagation

arXiv:2311.15594 [pdf]

doi 10.1109/TSTE.2024.3355123

Networked Multiagent Safe Reinforcement Learning for Low-carbon Demand Management in Distribution Network

Authors: Jichen Zhang, Linwei Sang, Yinliang Xu, Hongbin Sun

Abstract: This paper proposes a multiagent based bi-level operation framework for the low-carbon demand management in distribution networks considering the carbon emission allowance on the demand side. In the upper level, the aggregate load agents optimize the control signals for various types of loads to maximize the profits; in the lower level, the distribution network operator makes optimal dispatching d… ▽ More This paper proposes a multiagent based bi-level operation framework for the low-carbon demand management in distribution networks considering the carbon emission allowance on the demand side. In the upper level, the aggregate load agents optimize the control signals for various types of loads to maximize the profits; in the lower level, the distribution network operator makes optimal dispatching decisions to minimize the operational costs and calculates the distribution locational marginal price and carbon intensity. The distributed flexible load agent has only incomplete information of the distribution network and cooperates with other agents using networked communication. Finally, the problem is formulated into a networked multi-agent constrained Markov decision process, which is solved using a safe reinforcement learning algorithm called consensus multi-agent constrained policy optimization considering the carbon emission allowance for each agent. Case studies with the IEEE 33-bus and 123-bus distribution network systems demonstrate the effectiveness of the proposed approach, in terms of satisfying the carbon emission constraint on demand side, ensuring the safe operation of the distribution network and preserving privacy of both sides. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: Submitted to IEEE Transactions on Sustainable Energy

arXiv:2311.13832 [pdf, other]

Dynamic Operating Envelopes Embedded Peer-to-Peer-to-Grid Energy Trading

Authors: Zhisen Jiang, Ye Guo, Hongbin Sun, Jianxiao Wang

Abstract: A novel decentralized peer-to-peer-to-grid (P2P2G) trading mechanism considering distribution network integrity is proposed. In order to direct prosumers' peer-to-peer (P2P) trading behavior to be grid-friendly, the proposed method incorporates Dynamic Operating Envelopes (DOEs) into the existing P2P2G trading. Moreover, DOEs are determined through negotiations between the distribution system oper… ▽ More A novel decentralized peer-to-peer-to-grid (P2P2G) trading mechanism considering distribution network integrity is proposed. In order to direct prosumers' peer-to-peer (P2P) trading behavior to be grid-friendly, the proposed method incorporates Dynamic Operating Envelopes (DOEs) into the existing P2P2G trading. Moreover, DOEs are determined through negotiations between the distribution system operator (DSO) and prosumers alongside the process of P2P trading, avoiding compromising prosumers' privacy and network parameters leakage. To reduce communication costs during P2P trading, a variant of the alternating direction method of multipliers (ADMM), i.e., communication-censored ADMM (COCA) is used to solve the P2P2G trading problem. Finally, the DOE price is shown to be comprised of several economically interpretable components. Simulations validate the effectiveness of the proposed mechanism. △ Less

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.12078 [pdf, other]

Fast Controllable Diffusion Models for Undersampled MRI Reconstruction

Authors: Wei Jiang, Zhuang Xiong, Feng Liu, Nan Ye, Hongfu Sun

Abstract: Supervised deep learning methods have shown promise in undersampled Magnetic Resonance Imaging (MRI) reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to undersampled MRI reconstruction, without paired data or model retraining for different… ▽ More Supervised deep learning methods have shown promise in undersampled Magnetic Resonance Imaging (MRI) reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to undersampled MRI reconstruction, without paired data or model retraining for different MRI acquisitions. However, diffusion models are generally slow in sampling and state-of-the-art acceleration techniques can lead to sub-optimal results when directly applied to the controllable generation process. This study introduces a new algorithm called Predictor-Projector-Noisor (PPN), which enhances and accelerates controllable generation of diffusion models for undersampled MRI reconstruction. Our results demonstrate that PPN produces high-fidelity MR images that conform to undersampled k-space measurements with significantly shorter reconstruction time than other controllable sampling methods. In addition, the unsupervised PPN accelerated diffusion models are adaptable to different MRI acquisition parameters, making them more practical for clinical use than supervised learning techniques. △ Less

Submitted 11 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.11969 [pdf, other]

SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

Authors: ** Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao

Abstract: Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowled… ▽ More Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowledge into SAM, we introduce SA-Med2D-20M, a large-scale segmentation dataset of 2D medical images built upon numerous public and private datasets. It consists of 4.6 million 2D medical images and 19.7 million corresponding masks, covering almost the whole body and showing significant diversity. This paper describes all the datasets collected in SA-Med2D-20M and details how to process these datasets. Furthermore, comprehensive statistics of SA-Med2D-20M are presented to facilitate the better use of our dataset, which can help the researchers build medical vision foundation models or apply their models to downstream medical applications. We hope that the large scale and diversity of SA-Med2D-20M can be leveraged to develop medical artificial intelligence for enhancing diagnosis, medical image analysis, knowledge sharing, and education. The data with the redistribution license is publicly available at https://github.com/OpenGVLab/SAM-Med2D. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.07823 [pdf, other]

doi 10.1016/j.media.2024.103160

Plug-and-Play Latent Feature Editing for Orientation-Adaptive Quantitative Susceptibility Map** Neural Networks

Authors: Yang Gao, Zhuang Xiong, Shanshan Shan, Yin Liu, Pengfei Rong, Min Li, Alan H Wilman, G. Bruce Pike, Feng Liu, Hongfu Sun

Abstract: Quantitative susceptibility map** (QSM) is a post-processing technique for deriving tissue magnetic susceptibility distribution from MRI phase measurements. Deep learning (DL) algorithms hold great potential for solving the ill-posed QSM reconstruction problem. However, a significant challenge facing current DL-QSM approaches is their limited adaptability to magnetic dipole field orientation var… ▽ More Quantitative susceptibility map** (QSM) is a post-processing technique for deriving tissue magnetic susceptibility distribution from MRI phase measurements. Deep learning (DL) algorithms hold great potential for solving the ill-posed QSM reconstruction problem. However, a significant challenge facing current DL-QSM approaches is their limited adaptability to magnetic dipole field orientation variations during training and testing. In this work, we propose a novel Orientation-Adaptive Latent Feature Editing (OA-LFE) module to learn the encoding of acquisition orientation vectors and seamlessly integrate them into the latent features of deep networks. Importantly, it can be directly Plug-and-Play (PnP) into various existing DL-QSM architectures, enabling reconstructions of QSM from arbitrary magnetic dipole orientations. Its effectiveness is demonstrated by combining the OA-LFE module into our previously proposed phase-to-susceptibility single-step instant QSM (iQSM) network, which was initially tailored for pure-axial acquisitions. The proposed OA-LFE-empowered iQSM, which we refer to as iQSM+, is trained in a self-supervised manner on a specially-designed simulation brain dataset. Comprehensive experiments are conducted on simulated and in vivo human brain datasets, encompassing subjects ranging from healthy individuals to those with pathological conditions. These experiments involve various MRI platforms (3T and 7T) and aim to compare our proposed iQSM+ against several established QSM reconstruction frameworks, including the original iQSM. The iQSM+ yields QSM images with significantly improved accuracies and mitigates artifacts, surpassing other state-of-the-art DL-QSM algorithms. △ Less

Submitted 26 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: 13pages, 9figures

arXiv:2311.06705 [pdf, other]

Equal Incremental Cost-Based Optimization Method to Enhance Efficiency for IPOP-Type Converters

Authors: Hanfeng Cai, Haiyang Liu, Heyang Sun, Qiao Wang

Abstract: Systematic optimization over a wide power range is often achieved through the combination of modules of different power levels. This paper addresses the issue of enhancing the efficiency of a multiple module system connected in parallel during operation and proposes an algorithm based on equal incremental cost for dynamic load allocation. Initially, a polynomial fitting technique is employed to fi… ▽ More Systematic optimization over a wide power range is often achieved through the combination of modules of different power levels. This paper addresses the issue of enhancing the efficiency of a multiple module system connected in parallel during operation and proposes an algorithm based on equal incremental cost for dynamic load allocation. Initially, a polynomial fitting technique is employed to fit efficiency test points for individual modules. Subsequently, the equal incremental cost-based optimization is utilized to formulate an efficiency optimization and allocation scheme for the multi-module system. A simulated annealing algorithm is applied to determine the optimal power output strategy for each module at given total power flow requirement. Finally, a dual active bridge (DAB) experimental prototype with two input-parallel-output-parallel (IPOP) configurations is constructed to validate the effectiveness of the proposed strategy. Experimental results demonstrate that under the 800W operating condition, the approach in this paper achieves an efficiency improvement of over 0.74\% by comparison with equal power sharing between both modules. △ Less

Submitted 11 November, 2023; originally announced November 2023.

arXiv:2311.02378 [pdf]

doi 10.1016/j.cose.2023.103570

MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual Variational Generative Adversarial Network

Authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Hongle Liu, Xiang Long

Abstract: Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of Cyber-physical systems (CPSs) without relying on labeled information. Nonetheless, these generative models face challenges in identifying attack behaviors that closely resemble normal data, or deviate from the normal data distribution but are in close proximity to the manifold of the nor… ▽ More Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of Cyber-physical systems (CPSs) without relying on labeled information. Nonetheless, these generative models face challenges in identifying attack behaviors that closely resemble normal data, or deviate from the normal data distribution but are in close proximity to the manifold of the normal cluster in latent space. To tackle this problem, this article proposes a novel unsupervised dual variational generative adversarial model named MST-DVGAN, to perform anomaly detection in multivariate time series data for CPS security. The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts. Specifically, we propose an augmented module by imposing contrastive constraints on the reconstruction process to obtain a more compact embedding. Then, by exploiting the distribution property and modeling the normal patterns of multivariate time series, a variational autoencoder is introduced to force the generative adversarial network (GAN) to generate diverse samples. Furthermore, two augmented loss functions are designed to extract essential characteristics in a self-supervised manner through mutual guidance between the augmented samples and original samples. Finally, a specific feature center loss is introduced for the generator network to enhance its stability. Empirical experiments are conducted on three public datasets, namely SWAT, WADI and NSL_KDD. Comparing with the state-of-the-art methods, the evaluation results show that the proposed MTS-DVGAN is more stable and can achieve consistent performance improvement. △ Less

Submitted 4 November, 2023; originally announced November 2023.

Comments: 27 pages, 14 figures, 8 tables. Accepted by Computers & Security

Journal ref: Computers & Security, 2023, 103570

arXiv:2310.17997 [pdf]

Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy

Authors: Hui Sun, Hao Luo, Feifei Wang, Qingjiu Chen, Meng Chen, Xiaoduo Wang, Haibo Yu, Guanglie Zhang, Lianqing Liu, Jian** Wang, Dapeng Wu, Wen Jung Li

Abstract: Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the map** relationship between op… ▽ More Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the map** relationship between optical super-resolution (OSR) images and SEM domain images, which enables the transformation of OSR images into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system, which requires neither coating samples by conductive films nor a vacuum environment, is used to acquire the OSR images with features down to ~80 nm. The peak signal-to-noise ratio (PSNR) and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The proposed method provides a high level of detail in the reconstructed results, indicating that it has broad applicability to chip-level defect detection, biological sample analysis, forensics, and various other fields. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: 13 pages,7 figures

arXiv:2310.17401 [pdf, other]

Energy Efficient Robust Beamforming for Vehicular ISAC with Imperfect Channel Estimation

Authors: Hanwen Zhang, Haijian Sun, Tianyi He, Weiming Xiang, Rose Qingyang Hu

Abstract: This paper investigates robust beamforming for system-centric energy efficiency (EE) optimization in the vehicular integrated sensing and communication (ISAC) system, where the mobility of vehicles poses significant challenges to channel estimation. To obtain the optimal beamforming under channel uncertainty, we first formulate an optimization problem for maximizing the system EE under bounded cha… ▽ More This paper investigates robust beamforming for system-centric energy efficiency (EE) optimization in the vehicular integrated sensing and communication (ISAC) system, where the mobility of vehicles poses significant challenges to channel estimation. To obtain the optimal beamforming under channel uncertainty, we first formulate an optimization problem for maximizing the system EE under bounded channel estimation errors. Next, fractional programming and semidefinite relaxation (SDR) are utilized to relax the rank-1 constraints. We further use Schur complement and S-Procedure to transform Cramer-Rao bound (CRB) and channel estimation error constraints into convex forms, respectively. Based on the Lagrangian dual function and Karush-Kuhn-Tucker (KKT) conditions, it is proved that the optimal beamforming solution is rank-1. Finally, we present comprehensive simulation results to demonstrate two key findings: 1) the proposed algorithm exhibits a favorable convergence rate, and 2) the approach effectively mitigates the impact of channel estimation errors. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: Submitted to IEEE for future publication

arXiv:2310.12733 [pdf, other]

Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression

Authors: Yiming Wang, Qian Huang, Bin Tang, Huashan Sun, Xing Li

Abstract: Recently, learned video compression has achieved exciting performance. Following the traditional hybrid prediction coding framework, most learned methods generally adopt the motion estimation motion compensation (MEMC) method to remove inter-frame redundancy. However, inaccurate motion vector (MV) usually lead to the distortion of reconstructed frame. In addition, most approaches ignore the spatia… ▽ More Recently, learned video compression has achieved exciting performance. Following the traditional hybrid prediction coding framework, most learned methods generally adopt the motion estimation motion compensation (MEMC) method to remove inter-frame redundancy. However, inaccurate motion vector (MV) usually lead to the distortion of reconstructed frame. In addition, most approaches ignore the spatial and channel redundancy. To solve above problems, we propose a motion-aware and spatial-temporal-channel contextual coding based video compression network (MASTC-VC), which learns the latent representation and uses variational autoencoders (VAEs) to capture the characteristics of intra-frame pixels and inter-frame motion. Specifically, we design a multiscale motion-aware module (MS-MAM) to estimate spatial-temporal-channel consistent motion vector by utilizing the multiscale motion prediction information in a coarse-to-fine way. On the top of it, we further propose a spatial-temporal-channel contextual module (STCCM), which explores the correlation of latent representation to reduce the bit consumption from spatial, temporal and channel aspects respectively. Comprehensive experiments show that our proposed MASTC-VC is surprior to previous state-of-the-art (SOTA) methods on three public benchmark datasets. More specifically, our method brings average 10.15\% BD-rate savings against H.265/HEVC (HM-16.20) in PSNR metric and average 23.93\% BD-rate savings against H.266/VVC (VTM-13.2) in MS-SSIM metric. △ Less

Submitted 19 October, 2023; originally announced October 2023.

Comments: 12pages,12 figures

arXiv:2310.08364 [pdf, other]

Map2Schedule: An End-to-End Link Scheduling Method for Urban V2V Communications

Authors: Lihao Zhang, Haijian Sun, ** Sun, Ramviyas Parasuraman, Yinghui Ye, Rose Qingyang Hu

Abstract: Urban vehicle-to-vehicle (V2V) link scheduling with shared spectrum is a challenging problem. Its main goal is to find the scheduling policy that can maximize system performance (usually the sum capacity of each link or their energy efficiency). Given that each link can experience interference from all other active links, the scheduling becomes a combinatorial integer programming problem and gener… ▽ More Urban vehicle-to-vehicle (V2V) link scheduling with shared spectrum is a challenging problem. Its main goal is to find the scheduling policy that can maximize system performance (usually the sum capacity of each link or their energy efficiency). Given that each link can experience interference from all other active links, the scheduling becomes a combinatorial integer programming problem and generally does not scale well with the number of V2V pairs. Moreover, link scheduling requires accurate channel state information (CSI), which is very difficult to estimate with good accuracy under high vehicle mobility. In this paper, we propose an end-to-end urban V2V link scheduling method called Map2Schedule, which can directly generate V2V scheduling policy from the city map and vehicle locations. Map2Schedule delivers comparable performance to the physical-model-based methods in urban settings while maintaining low computation complexity. This enhanced performance is achieved by machine learning (ML) technologies. Specifically, we first deploy the convolutional neural network (CNN) model to estimate the CSI from street layout and vehicle locations and then apply the graph embedding model for optimal scheduling policy. The results show that the proposed method can achieve high accuracy with much lower overhead and latency. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: submitted to IEEE conference for future publication

arXiv:2310.07747 [pdf, other]

Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples

Authors: Hao Sun, Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar

Abstract: Learning controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline C… ▽ More Learning controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability. △ Less

Submitted 27 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

Showing 1–50 of 263 results for author: Sun, H