-
CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation
Authors:
Huawei Sun,
Hao Feng,
Julius Ott,
Lorenzo Servadei,
Robert Wille
Abstract:
Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted significant interest due to the robustness and low cost of radar. This paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth for the confidence map, which involves associating each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we introduce a confidence-aware gated fusion mechanism to integrate radar and image features effectively, thereby enhancing the reliability of the depth map by filtering out radar noise. Our methodology, evaluated on the nuScenes dataset, demonstrates superior performance, improving upon the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE).
Submitted 30 June, 2024;
originally announced July 2024.
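The gains quoted in the CaFNet abstract are relative reductions in MAE and RMSE. The following is a minimal sketch of how these two metrics and a relative improvement could be computed over pixels that have ground truth. The function names, the "depth <= 0 means no ground truth" convention, and the toy values are illustrative assumptions, not details taken from the paper.

```python
import math

def depth_errors(pred, gt):
    """Return (MAE, RMSE) over pixels with valid ground truth.

    pred and gt are flat lists of depths in meters. A gt value <= 0 is
    treated as "no ground truth" (an assumption of this sketch).
    """
    diffs = [p - g for p, g in zip(pred, gt) if g > 0]
    if not diffs:
        raise ValueError("no valid ground-truth pixels")
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse

def relative_improvement(baseline, ours):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline

# Toy data: the third pixel has no ground truth and is ignored.
pred = [10.0, 22.0, 35.0, 50.0]
gt = [12.0, 20.0, 0.0, 46.0]
mae, rmse = depth_errors(pred, gt)
```

Because RMSE squares each error before averaging, it weights large depth errors more heavily than MAE, which is why the two improvements can differ.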
-
A Near-Field Super-Resolution Network for Accelerating Antenna Characterization
Authors:
Yuchen Gu,
Hai-Han Sun,
Daniel W. van der Weide
Abstract:
We present a deep neural network-enabled method to accelerate near-field (NF) antenna measurement. We develop a Near-field Super-resolution Network (NFS-Net) to reconstruct significantly undersampled near-field data as high-resolution data, which considerably reduces the number of sampling points required for NF measurement and thus improves measurement efficiency. The high-resolution near-field data reconstructed by the network is further processed by a near-field-to-far-field (NF2FF) transformation to obtain far-field antenna radiation patterns. Our experiments demonstrate that the NFS-Net exhibits both accuracy and generalizability in restoring high-resolution near-field data from low-resolution input. The NF measurement workflow that combines the NFS-Net and the NF2FF algorithm enables accurate radiation pattern characterization with only 11% of the Nyquist rate samples. Though the experiments in this study are conducted on a planar setup with a uniform grid, the proposed method can serve as a universal strategy to accelerate measurements under different setups and conditions.
Submitted 24 June, 2024;
originally announced June 2024.
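To give a sense of scale for the "11% of the Nyquist rate samples" figure in the NFS-Net abstract, the sketch below counts the points on a planar scan sampled at the usual half-wavelength spacing and then takes 11% of that count. The scan size (0.6 m by 0.6 m) and the frequency (10 GHz) are hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def nyquist_grid_points(width_m, height_m, freq_hz):
    """Number of points on a uniform planar grid with half-wavelength spacing."""
    step = C / freq_hz / 2.0  # half a wavelength, in meters
    nx = math.floor(width_m / step) + 1
    ny = math.floor(height_m / step) + 1
    return nx * ny

# Hypothetical scan plane and frequency (assumptions of this sketch).
full = nyquist_grid_points(0.6, 0.6, 10e9)
reduced = math.ceil(0.11 * full)
```

Under these assumed values the full grid has 1681 points and the reduced scan needs 185, which is where the measurement-time saving would come from.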
-
IR2QSM: Quantitative Susceptibility Mapping via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules
Authors:
Min Li,
Chen Chen,
Zhuang Xiong,
Ying Liu,
Pengfei Rong,
Shanshan Shan,
Feng Liu,
Hongfu Sun,
Yang Gao
Abstract:
Quantitative susceptibility mapping (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-based IR2QSM method for QSM reconstruction. It iterates four times over a U-net enhanced with reverse concatenations and middle recurrent modules, which could dramatically improve the efficiency of latent feature utilization. Simulated and in vivo experiments were conducted to compare IR2QSM with several traditional algorithms (MEDI and iLSQR) and state-of-the-art deep learning methods (U-net, xQSM, and LPCNN). The results indicated that IR2QSM was able to obtain QSM images with significantly increased accuracy and mitigated artifacts over other methods. In particular, IR2QSM demonstrated on average the best NRMSE (27.59%) in simulated experiments, which is 15.48%, 7.86%, 17.24%, 9.26%, and 29.13% lower than iLSQR, MEDI, U-net, xQSM, and LPCNN, respectively, and led to improved QSM results with fewer artifacts for the in vivo data.
Submitted 18 June, 2024;
originally announced June 2024.
-
Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets
Authors:
Da Li,
Guoqiang Zhao,
Houjun Sun,
Jiacheng Bao
Abstract:
Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous works typically rely on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar data. Consequently, imaging performance is limited, and acquiring full-aperture data for multi-baseline SAR is costly and sometimes impractical in real-world applications. In this paper, we propose a Cross-Modal Reconstruction Network (CMR-Net), which integrates differentiable rendering and cross-modal supervision with optical images to reconstruct highly sparse multi-baseline SAR 3D images of vehicle targets into visually structured and high-resolution images. We meticulously designed the network architecture and training strategies to enhance network generalization capability. Remarkably, CMR-Net, trained solely on simulated data, demonstrates high-resolution reconstruction capabilities on both publicly available simulation datasets and real measured datasets, outperforming traditional sparse reconstruction algorithms based on compressed sensing and other learning-based methods. Additionally, using optical images as supervision provides a cost-effective way to build training datasets, reducing the difficulty of method dissemination. Our work showcases the broad prospects of deep learning in multi-baseline SAR 3D imaging and offers a novel path for researching radar imaging based on cross-modal learning theory.
Submitted 6 June, 2024;
originally announced June 2024.
-
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Authors:
Jiaming Zhou,
Shiwan Zhao,
Hui Wang,
Tian-Hao Zhang,
Haoqin Sun,
Xuechen Wang,
Yong Qin
Abstract:
The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. Our method selects the appropriate datastore for decoding each frame, ensuring the injection of language-specific information into the ASR process. We apply this framework to cutting-edge CTC-based models, developing an advanced CS-ASR system. Extensive experiments demonstrate the remarkable effectiveness of our gated datastore mechanism in enhancing the performance of zero-shot Chinese-English CS-ASR.
Submitted 13 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
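The per-frame datastore selection described in the kNN-CTC abstract can be illustrated with a toy sketch: a gate picks one monolingual datastore for the current frame, a kNN distribution is built from the nearest stored keys, and that distribution is interpolated with the CTC posterior. Everything concrete here is an assumption made for illustration, including the squared-distance softmax, the interpolation weight `lam`, the gate being supplied as a language tag, and all function names; the paper's actual gate and retrieval details are not given in the abstract.

```python
import math

def knn_distribution(query, datastore, k, vocab_size, temperature=1.0):
    """Softmax over negative squared distances of the k nearest keys.

    datastore is a list of (key_vector, label_index) pairs.
    """
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), label)
        for key, label in datastore
    )[:k]
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for weight, (_, label) in zip(weights, scored):
        probs[label] += weight / total
    return probs

def gated_knn_ctc(frame, ctc_probs, gate, stores, k=2, lam=0.5):
    """Interpolate the CTC posterior with a kNN distribution drawn from
    the single datastore chosen by the gate (hypothetical interface)."""
    knn = knn_distribution(frame, stores[gate], k, len(ctc_probs))
    return [lam * a + (1.0 - lam) * b for a, b in zip(knn, ctc_probs)]

# Toy vocabulary: 0 = blank, 1 = a Chinese token, 2 = an English token.
stores = {
    "zh": [((0.0, 0.0), 1), ((0.1, 0.0), 1)],
    "en": [((1.0, 1.0), 2), ((0.9, 1.0), 2)],
}
mixed = gated_knn_ctc((0.05, 0.0), [0.2, 0.4, 0.4], "zh", stores)
```

In this toy case the CTC posterior is undecided between the two languages, and retrieving only from the gated Chinese datastore tips the frame toward the Chinese token without pulling in English neighbors.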
-
A Stochastic Incentive-based Demand Response Program for Virtual Power Plant with Solar, Battery, Electric Vehicles, and Controllable Loads
Authors:
Pratik Harsh,
Hongjian Sun,
Debapriya Das,
Goyal Awagan,
Jing Jiang
Abstract:
The growing integration of distributed energy resources (DERs) into the power grid necessitates an effective coordination strategy to maximize their benefits. Acting as an aggregator of DERs, a virtual power plant (VPP) facilitates this coordination, thereby amplifying their impact on the transmission level of the power grid. Further, a demand response program enhances the scheduling approach by managing the energy demands in parallel with the uncertain energy outputs of the DERs. This work presents a stochastic incentive-based demand response model for the scheduling operation of a VPP comprising solar-powered generating stations, battery swapping stations, electric vehicle charging stations, and consumers with controllable loads. The work also proposes a priority mechanism to consider the individual preferences of electric vehicle users and consumers with controllable loads. The scheduling approach for the VPP is framed as a multi-objective optimization problem, normalized using the utopia-tracking method. Subsequently, the normalized optimization problem is transformed into a stochastic formulation to address uncertainties in energy demand from charging stations and controllable loads. The proposed VPP scheduling approach is tested on a 33-node distribution system simulated using MATLAB software and further validated using a real-time digital simulator.
Submitted 31 May, 2024;
originally announced June 2024.
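The VPP abstract mentions normalizing a multi-objective problem with the utopia-tracking method. A minimal sketch of that idea over a finite set of candidate schedules follows: each objective is rescaled between its best (utopia) and worst (nadir) value, and the candidate closest to the utopia point is selected. The two objectives, the candidate values, and the use of the worst observed value as the nadir are assumptions of this sketch; the paper solves a continuous optimization problem whose details are not in the abstract.

```python
def utopia_tracking(candidates):
    """Pick the candidate closest to the utopia point after normalization.

    candidates is a list of tuples of objective values, all to be minimized.
    """
    n_obj = len(candidates[0])
    utopia = [min(c[i] for c in candidates) for i in range(n_obj)]
    nadir = [max(c[i] for c in candidates) for i in range(n_obj)]

    def distance(c):
        # Euclidean distance to the utopia point in normalized coordinates;
        # objectives with no spread are skipped to avoid dividing by zero.
        return sum(
            ((c[i] - utopia[i]) / (nadir[i] - utopia[i])) ** 2
            for i in range(n_obj)
            if nadir[i] > utopia[i]
        ) ** 0.5

    return min(candidates, key=distance)

# Hypothetical (operating cost, peak load) pairs for three schedules.
schedules = [(100.0, 9.0), (140.0, 4.0), (220.0, 1.0)]
best = utopia_tracking(schedules)
```

Normalizing first keeps an objective measured in large units from dominating the distance, so the selected schedule is a compromise rather than the minimizer of a single objective.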
-
Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization
Authors:
Zhibo Chen,
Heming Sun,
Li Zhang,
Fan Zhang
Abstract:
This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, our focus is on presenting the advancement of generative models and their influence on research in the domain of visual signal coding and processing. This survey study begins with a brief introduction of well-established generative models, including the Variational Autoencoder (VAE) models, Generative Adversarial Network (GAN) models, Autoregressive (AR) models, Normalizing Flows and Diffusion models. The subsequent section of the paper explores the advancements in visual signal coding based on generative models, as well as the ongoing international standardization activities. In the realm of visual signal processing, our focus lies on the application and development of various generative models in the research of visual signal restoration. We also present the latest developments in generative visual signal synthesis and editing, along with visual signal quality assessment using generative models and quality assessment for generative models. The practical implementation of these studies is closely linked to the investigation of fast optimization. This paper additionally presents the latest advancements in fast optimization on visual signal coding and processing with generative models. We hope to advance this field by providing researchers and practitioners a comprehensive literature review on the topic of visual signal coding and processing with generative models.
Submitted 23 May, 2024;
originally announced May 2024.
-
Power Measurement Based Channel Estimation for IRS-Enhanced Wireless Coverage
Authors:
He Sun,
Lipeng Zhu,
Weidong Mei,
Rui Zhang
Abstract:
In this paper, we study an IRS-assisted coverage enhancement problem for a given region, aiming to optimize the passive reflection of the IRS for improving the average communication performance in the region by accounting for both deterministic and random channels in the environment. To this end, we first derive the closed-form expression of the average received signal power in terms of the deterministic base station (BS)-IRS-user cascaded channels over all user locations, and propose an IRS-aided coverage enhancement framework to facilitate the estimation of such deterministic channels for IRS passive reflection design. Specifically, to avoid the exorbitant overhead of estimating the cascaded channels at all possible user locations, a location selection method is first proposed to select only a set of typical user locations for channel estimation by exploiting the channel spatial correlation in the region. To estimate the deterministic cascaded channels at the selected user locations, conventional IRS channel estimation methods require additional pilot signals, which not only results in high system training overhead but also may not be compatible with existing communication protocols. To overcome this issue, we further propose a single-layer neural network (NN)-enabled IRS channel estimation method, based only on the average received signal power measurements at each selected location corresponding to different IRS random training reflections, which can be implemented offline in current wireless systems. Numerical results demonstrate that our proposed scheme can significantly improve the coverage performance of the target region and outperform the existing power-measurement-based IRS reflection designs.
Submitted 20 May, 2024;
originally announced May 2024.
-
Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior
Authors:
Yongfeng Huang,
Zhendong Chen,
Kun Ye,
Lang Zhou,
Haixin Sun
Abstract:
In this letter, we investigate a new off-grid sparse Bayesian learning approach based on the generalized double Pareto prior (GDPOGSBL) to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of the source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.
Submitted 17 May, 2024; v1 submitted 18 April, 2024;
originally announced May 2024.
-
Compression-Realized Deep Structural Network for Video Quality Enhancement
Authors:
Hanchi Sun,
Xiaohong Liu,
Xinyang Jiang,
Yifei Shen,
Dongsheng Li,
Xiongkuo Min,
Guangtao Zhai
Abstract:
This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a more "conscious" process of quality enhancement. To this end, we propose the Compression-Realized Deep Structural Network (CRDS), introducing three inductive biases aligned with the three primary processes in the classic compression codec, merging the strengths of classical encoder architecture with deep network capabilities. Inspired by the residual extraction and domain transformation process in the codec, a pre-trained Latent Degradation Residual Auto-Encoder is proposed to transform video frames into a latent feature space, and the mutual neighborhood attention mechanism is integrated for precise motion estimation and residual extraction. Furthermore, drawing inspiration from the quantization noise distribution of the codec, CRDS proposes a novel Progressive Denoising framework with intermediate supervision that decomposes the quality enhancement into a series of simpler denoising sub-tasks. Experimental results on datasets like LDV 2.0 and MFQE 2.0 indicate our approach surpasses state-of-the-art models.
Submitted 10 May, 2024;
originally announced May 2024.
-
Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications
Authors:
Hanwen Zhang,
Mingzhe Chen,
Alireza Vahid,
Haijian Sun
Abstract:
Rate split multiple access (RSMA) has been proven to be an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource-constrained vehicles. Although data-driven approaches can alleviate this issue, they suffer from poor generalizability and scarce training data. In this paper, we propose a fractional programming (FP) based deep unfolding (DU) approach to address the resource allocation problem for weighted sum rate optimization in RSMA. By carefully designing the penalty function, we couple the variable update with the projected gradient descent (PGD) algorithm. Following the structure of PGD, we embed a few learnable parameters in each layer of the DU network. Through extensive simulation, we show that the proposed model-based neural network achieves performance similar to the optimal results given by traditional algorithms, but with much lower computational complexity, less training data, and higher resilience to test set data and out-of-distribution (OOD) data.
Submitted 2 May, 2024;
originally announced May 2024.
-
EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces
Authors:
Yi Ding,
Yong Li,
Hao Sun,
Rui Liu,
Chengxuan Tong,
Cuntai Guan
Abstract:
Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks consistently verify the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms existing state-of-the-art methods or is comparable to them. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks. The source code can be found at https://github.com/yi-ding-cs/EEG-Deformer.
Submitted 25 April, 2024;
originally announced May 2024.
-
Two-Stage Robust Planning Model for Park-Level Integrated Energy System Considering Uncertain Equipment Contingency
Authors:
Zuxun Xiong,
Xinwei Shen,
Hongbin Sun
Abstract:
In this paper, we propose a two-stage robust planning model for an Integrated Energy System (IES) that serves an industrial park. The term 'Park-level IES' refers to an IES of smaller scale that nevertheless has high demands for various forms of energy. The proposed planning model considers uncertainties like load demand fluctuations and equipment contingencies, and provides a reliable scheme of equipment selection and sizing for IES investors. Inspired by the unit commitment problem, we formulate an equipment contingency uncertainty set to accurately describe the potential equipment contingencies which happen and can be repaired within a day. Then, a novel and modified nested column-and-constraint generation algorithm is applied to solve this two-stage robust planning model with integer recourse efficiently. In the case study, the role of the energy storage system in IES reliability enhancement is analyzed in detail. Computational results demonstrate the advantage of the proposed model over the deterministic planning model in terms of improving reliability.
Submitted 30 April, 2024;
originally announced April 2024.
-
3D Multi-frame Fusion for Video Stabilization
Authors:
Zhan Peng,
Xinyi Ye,
Weiyue Zhao,
Tianqi Liu,
Huiqiang Sun,
Baopu Li,
Zhiguo Cao
Abstract:
In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our RStab framework lies in Stabilized Rendering (SR), a volume rendering module that fuses multi-frame information in 3D space and extends beyond image fusion by incorporating feature fusion. Specifically, SR involves warping features and colors from multiple frames by projection, fusing them into descriptors to render the stabilized image. However, the precision of warped information depends on the projection accuracy, a factor significantly influenced by dynamic regions. In response, we introduce the Adaptive Ray Range (ARR) module to integrate depth priors, adaptively defining the sampling range for the projection process. Additionally, we propose Color Correction (CC), which assists geometric constraints with optical flow for accurate color aggregation. Thanks to the three modules, our RStab demonstrates superior performance compared with previous stabilizers in the field of view (FOV), image quality, and video stability across various datasets.
Submitted 19 April, 2024;
originally announced April 2024.
-
Energy-modified Leverage Sampling for Radio Map Construction via Matrix Completion
Authors:
Hao Sun,
Junting Chen
Abstract:
This paper explores an energy-modified leverage sampling strategy for matrix completion in radio map construction. The main goal is to address potential identifiability issues in matrix completion with sparse observations by using a probabilistic sampling approach. Although conventional leverage sampling is commonly employed for designing sampling patterns, it often assigns high sampling probability to locations with low received signal strength (RSS) values, leading to a low sampling efficiency. Theoretical analysis demonstrates that the leverage score produces pseudo images of sources, and in the regions around the source locations, the leverage probability is asymptotically consistent with the RSS. Based on this finding, an energy-modified leverage probability-based sampling strategy is investigated for efficient sampling. Numerical demonstrations indicate that the proposed sampling strategy can decrease the normalized mean squared error (NMSE) of radio map construction by more than 10% for both matrix completion and interpolation-assisted matrix completion schemes, compared to conventional methods.
Submitted 12 April, 2024;
originally announced April 2024.
-
HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies
Authors:
Haili Sun,
Yan Huang,
Lansheng Han,
Cai Fu,
Chunjie Zhou
Abstract:
Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominantly reconstruction-based and predictive in nature. However, they typically concentrate on a single-dimensional instance level, thereby not fully harnessing the complex associations inherent in industrial MTS. To address this issue, we propose a novel self-supervised hierarchical contrastive consistency learning method for detecting anomalies in MTS, named HCL-MTSAD. It innovatively leverages data consistency at multiple levels inherent in industrial MTS, systematically capturing consistent associations across four latent levels: measurement, sample, channel, and process. By developing a multi-layer contrastive loss, HCL-MTSAD can extensively mine data consistency and spatio-temporal association, resulting in more informative representations. Subsequently, an anomaly discrimination module, grounded in self-supervised hierarchical contrastive learning, is designed to detect timestamp-level anomalies by calculating multi-scale data consistency. Extensive experiments conducted on six diverse MTS datasets retrieved from real cyber-physical systems and server machines, in comparison with 20 baselines, indicate that HCL-MTSAD's anomaly detection capability outperforms the state-of-the-art benchmark models by an average of 1.8% in terms of F1 score.
Submitted 18 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
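The HCL-MTSAD abstract reports its gain as an F1 score on timestamp-level anomaly labels. As a reminder of what that metric measures, here is a minimal sketch that computes F1 from binary per-timestamp predictions. The labels are toy data, and the sketch uses plain point-wise F1; whether the paper applies any adjustment to the predictions before scoring is not stated in the abstract.

```python
def f1_score(pred, truth):
    """Point-wise F1 for binary anomaly labels (1 = anomalous timestamp)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: three hits, one false alarm, one missed anomaly.
truth = [0, 0, 1, 1, 1, 0, 0, 1]
pred = [0, 1, 1, 1, 0, 0, 0, 1]
score = f1_score(pred, truth)
```

F1 is the harmonic mean of precision and recall, so it penalizes a detector that achieves high recall only by raising many false alarms.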
-
A least-square method for non-asymptotic identification in linear switching control
Authors:
Haoyuan Sun,
Ali Jadbabaie
Abstract:
The focus of this paper is on linear system identification in the setting where it is known that the underlying partially-observed linear dynamical system lies within a finite collection of known candidate models. We first consider the problem of identification from a given trajectory, which in this setting reduces to identifying the index of the true model with high probability. We characterize the finite-time sample complexity of this problem by leveraging recent advances in the non-asymptotic analysis of linear least-square methods in the literature. In comparison to the earlier results that assume no prior knowledge of the system, our approach takes advantage of the smaller hypothesis class and leads to the design of a learner with a dimension-free sample complexity bound. Next, we consider the switching control of linear systems, where there is a candidate controller for each of the candidate models and data is collected through interaction of the system with a collection of potentially destabilizing controllers. We develop a dimension-dependent criterion that can detect those destabilizing controllers in finite time. By leveraging these results, we propose a data-driven switching strategy that identifies the unknown parameters of the underlying system. We then provide a non-asymptotic analysis of its performance and discuss its implications on the classical method of estimator-based supervisory control.
Submitted 11 April, 2024;
originally announced April 2024.
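As a rough illustration of the identification setting described in the abstract above (not the authors' algorithm): when the true system is known to be one of finitely many candidates, identification reduces to picking the candidate index whose one-step predictions leave the smallest squared residual. The sketch below assumes a fully observed scalar system x[t+1] = a * x[t] + w[t], which is a simplification of the partially observed case the paper treats. The names `simulate` and `identify_model` are hypothetical.

```python
import random


def simulate(a, x0, steps, noise_std, rng):
    """Roll out the scalar system x[t+1] = a * x[t] + w[t] with Gaussian noise."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, noise_std))
    return xs


def identify_model(trajectory, candidates):
    """Return the index of the candidate with the smallest least-squares residual."""
    def residual(a):
        # Sum of squared one-step prediction errors under candidate parameter a.
        return sum((nxt - a * cur) ** 2
                   for cur, nxt in zip(trajectory, trajectory[1:]))
    return min(range(len(candidates)), key=lambda i: residual(candidates[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [0.3, 0.6, 0.9]  # assumed finite hypothesis class
    data = simulate(0.6, 5.0, 500, 0.1, rng)
    print("identified index:", identify_model(data, candidates))
```

Because the learner only compares a fixed set of candidates rather than estimating parameters from scratch, the amount of data needed depends on how well separated the candidates are, which is the intuition behind exploiting a smaller hypothesis class.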
-
Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications
Authors:
Huawei Sun,
Hao Feng,
Gianfranco Mauro,
Julius Ott,
Georg Stettinger,
Lorenzo Servadei,
Robert Wille
Abstract:
Radar and camera fusion yields robustness in perception tasks by leveraging the strength of both sensors. The typical extracted radar point cloud is 2D without height information due to insufficient antennas along the elevation axis, which challenges the network performance. This work introduces a learning-based approach to infer the height of radar points associated with 3D objects. A novel robust regression loss is introduced to address the sparse target challenge. In addition, a multi-task training strategy is employed, emphasizing important features. The average radar absolute height error decreases from 1.69 to 0.25 meters compared to the state-of-the-art height extension method. The estimated target height values are used to preprocess and enrich radar data for downstream perception tasks. Integrating this refined radar information further enhances the performance of existing radar camera fusion models for object detection and depth estimation tasks.
Submitted 9 April, 2024;
originally announced April 2024.
-
Bi-level Guided Diffusion Models for Zero-Shot Medical Imaging Inverse Problems
Authors:
Hossein Askari,
Fred Roosta,
Hongfu Sun
Abstract:
In the realm of medical imaging, inverse problems aim to infer high-quality images from incomplete, noisy measurements, with the objective of minimizing expenses and risks to patients in clinical settings. Diffusion models have recently emerged as a promising approach to such practical challenges, proving particularly useful for the zero-shot inference of images from partially acquired measurements in Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). A central challenge in this approach, however, is how to guide an unconditional prediction to conform to the measurement information. Existing methods rely on deficient projection or inefficient posterior score approximation guidance, which often leads to suboptimal performance. In this paper, we propose Bi-level Guided Diffusion Models (BGDM), a zero-shot imaging framework that efficiently steers the initial unconditional prediction through a bi-level guidance strategy. Specifically, BGDM first approximates an inner-level conditional posterior mean as an initial measurement-consistent reference point and then solves an outer-level proximal optimization objective to reinforce the measurement consistency. Our experimental findings, using publicly available MRI and CT medical datasets, reveal that BGDM is more effective and efficient compared to the baselines, faithfully generating high-fidelity medical images and substantially reducing hallucinatory artifacts in cases of severe degradation.
Submitted 4 April, 2024;
originally announced April 2024.
-
Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging
Authors:
Xiang Ma,
Yuan Zhou,
Hanwen Zhang,
Qun Wang,
Haijian Sun,
Hongjie Wang,
Rose Qingyang Hu
Abstract:
As public awareness of environmental protection continues to grow, the trend of integrating more electric vehicles (EVs) into the transportation sector is rising. Unlike conventional internal combustion engine (ICE) vehicles, EVs can minimize carbon emissions and potentially achieve autonomous driving. However, several obstacles hinder the widespread adoption of EVs, such as their constrained driving range and the extended time required for charging. One alternative solution to address these challenges is implementing dynamic wireless power transfer (DWPT), charging EVs in motion on the road. Moreover, charging stations with static wireless power transfer (SWPT) infrastructure can replace existing gas stations, enabling users to charge EVs in parking lots or at home. This paper surveys the communication infrastructure for static and dynamic wireless charging in electric vehicles. It encompasses all communication aspects involved in the wireless charging process. The architecture and communication requirements for static and dynamic wireless charging are presented separately. Additionally, a comprehensive comparison of existing communication standards is provided. The communication with the grid is also explored in detail. The survey gives attention to security and privacy issues arising during communications. In summary, the paper addresses the challenges and outlines upcoming trends in communication for EV wireless charging.
Submitted 25 March, 2024;
originally announced March 2024.
-
QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Mapping
Authors:
Zhuang Xiong,
Wei Jiang,
Yang Gao,
Feng Liu,
Hongfu Sun
Abstract:
Quantitative Susceptibility Mapping (QSM) dipole inversion is an ill-posed inverse problem for quantifying magnetic susceptibility distributions from MRI tissue phases. While supervised deep learning methods have shown success in specific QSM tasks, their generalizability across different acquisition scenarios remains constrained. Recent developments in diffusion models have demonstrated potential for solving 2D medical imaging inverse problems. However, their application to 3D modalities, such as QSM, remains challenging due to high computational demands. In this work, we developed a 3D image patch-based diffusion model, namely QSMDiff, for robust QSM reconstruction across different scan parameters, alongside simultaneous super-resolution and image-denoising tasks. QSMDiff adopts unsupervised 3D image patch training and full-size measurement guidance during inference for controlled image generation. Evaluation on simulated and in-vivo human brains, using gradient-echo and echo-planar imaging sequences across different acquisition parameters, demonstrates superior performance. The method proposed in QSMDiff also holds promise for impacting other 3D medical imaging applications beyond QSM.
Submitted 20 March, 2024;
originally announced March 2024.
-
Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution
Authors:
Haochen Sun,
Yan Yuan,
Lijuan Su,
Haotian Shao
Abstract:
Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a result of correction errors. In this paper, we introduce a novel blind SR approach that focuses on Learning Correction Errors (LCE). Our method employs a lightweight Corrector to obtain a corrected low-resolution (CLR) image. Subsequently, within an SR network, we jointly optimize SR performance by utilizing both the original LR image and the frequency learning of the CLR image. Additionally, we propose a new Frequency-Self Attention block (FSAB) that enhances the global information utilization ability of Transformer. This block integrates both self-attention and frequency spatial attention mechanisms. Extensive ablation and comparison experiments conducted across various settings demonstrate the superiority of our method in terms of visual quality and accuracy. Our approach effectively addresses the challenges associated with degradation estimation and correction errors, paving the way for more accurate blind image SR.
Submitted 12 March, 2024;
originally announced March 2024.
-
Unsupervised Spatio-Temporal State Estimation for Fine-grained Adaptive Anomaly Diagnosis of Industrial Cyber-physical Systems
Authors:
Haili Sun,
Yan Huang,
Lansheng Han,
Cai Fu,
Chunjie Zhou
Abstract:
Accurate detection and diagnosis of abnormal behaviors such as network attacks from multivariate time series (MTS) are crucial for ensuring the stable and effective operation of industrial cyber-physical systems (CPS). However, existing research pays little attention to the logical dependencies among system working states, and has difficulty explaining the evolution mechanisms of abnormal signals. To reveal the spatio-temporal association relationships and evolution mechanisms of the working states of industrial CPS, this paper proposes a fine-grained adaptive anomaly diagnosis method (i.e., MAD-Transformer) to identify and diagnose anomalies in MTS. MAD-Transformer first constructs a temporal state matrix to characterize and estimate the change patterns of the system states in the temporal dimension. Then, to better locate the anomalies, a spatial state matrix is also constructed to capture the inter-sensor state correlation relationships within the system. Subsequently, based on these two types of state matrices, a three-branch structure of series-temporal-spatial attention module is designed to simultaneously capture the series, temporal, and space dependencies among MTS. Afterwards, three associated alignment loss functions and a reconstruction loss are constructed to jointly optimize the model. Finally, anomalies are determined and diagnosed by comparing the residual matrices with the original matrices. We conducted comparative experiments on five publicly available datasets spanning three application domains (service monitoring, spatial and earth exploration, and water treatment), along with a petroleum refining simulation dataset collected by ourselves. The results demonstrate that MAD-Transformer can adaptively detect fine-grained anomalies with short duration, and outperforms the state-of-the-art baselines in terms of noise robustness and localization performance.
Submitted 4 March, 2024;
originally announced March 2024.
-
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Authors:
Haoyu Chen,
Wenbo Li,
**** Gu,
**g**g Ren,
Haoze Sun,
Xueyi Zou,
Zhensong Zhang,
Youliang Yan,
Lei Zhu
Abstract:
For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction. Leveraging unseen LR images for self-supervised learning guides the model to adapt its modeling space to the target domain, facilitating fine-tuning of SR models without requiring paired high-resolution (HR) images. The integration of Discrete Wavelet Transform (DWT) further refines the focus on high-frequency details. Extensive evaluations show that our method significantly improves the generalization and detail restoration capabilities of SR models on unseen real-world datasets, outperforming existing methods. Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications.
Submitted 4 March, 2024;
originally announced March 2024.
-
Integrated Interpolation and Block-term Tensor Decomposition for Spectrum Map Construction
Authors:
Hao Sun,
Junting Chen
Abstract:
This paper addresses the challenge of reconstructing a 3D power spectrum map from sparse, scattered, and incomplete spectrum measurements. It proposes an integrated approach combining interpolation and block-term tensor decomposition (BTD). This approach leverages an interpolation model with the BTD structure to exploit the spatial correlation of power spectrum maps. Additionally, nuclear norm regularization is incorporated to effectively capture the low-rank characteristics. To implement this approach, a novel algorithm that combines alternating regression with singular value thresholding is developed. Analytical justification for the enhancement provided by the BTD structure in interpolating power spectrum maps is provided, yielding several important theoretical insights. The analysis explores the impact of the spectrum on the error in the proposed method and compares it to conventional local polynomial interpolation. Extensive numerical results demonstrate that the proposed method outperforms state-of-the-art methods in terms of signal source separation and power spectrum map construction, and remains stable under off-grid measurements and inhomogeneous measurement topologies.
Submitted 26 February, 2024;
originally announced February 2024.
-
Uncertainty-Aware Transient Stability-Constrained Preventive Redispatch: A Distributional Reinforcement Learning Approach
Authors:
Zhengcheng Wang,
Fei Teng,
Yanzhen Zhou,
Qinglai Guo,
Hongbin Sun
Abstract:
Transient stability-constrained preventive redispatch plays a crucial role in ensuring power system security and stability. Since redispatch strategies need to simultaneously satisfy complex transient constraints and the economic need, model-based formulation and optimization become extremely challenging. In addition, the increasing uncertainty and variability introduced by renewable sources start to drive the system stability consideration from deterministic to probabilistic, which further exaggerates the complexity. In this paper, a Graph neural network guided Distributional Deep Reinforcement Learning (GD2RL) method is proposed, for the first time, to solve the uncertainty-aware transient stability-constrained preventive redispatch problem. First, a graph neural network-based transient simulator is trained by supervised learning to efficiently generate post-contingency rotor angle curves with the steady-state and contingency as inputs, which serves as a feature extractor for operating states and a surrogate time-domain simulator during the environment interaction for reinforcement learning. Distributional deep reinforcement learning with explicit uncertainty distribution of system operational conditions is then applied to generate the redispatch strategy to balance the user-specified probabilistic stability performance and economy preferences. The full distribution of the post-redispatch transient stability index is directly provided as the output. Case studies on the modified New England 39-bus system validate the proposed method.
Submitted 29 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Distribution Locational Marginal Emission for Carbon Alleviation in Distribution Networks: Formulation, Calculation, and Implication
Authors:
Linwei Sang,
Yinliang Xu,
Hongbin Sun,
Qiuwei Wu,
Wenchuan Wu
Abstract:
Regulating the proper carbon-aware intervention policy is one of the keys to emission alleviation in the distribution network, whose basis lies in effectively attributing the emission responsibility using emission factors. This paper establishes the distribution locational marginal emission (DLME) to calculate the marginal change of emission from the marginal change of both active and reactive load demand for incentivizing carbon alleviation. It first formulates the day-ahead distribution network scheduling model based on the second-order cone program (SOCP). The emission propagation and responsibility are analyzed from demand to supply to system emission. Considering the complex and implicit mapping of the SOCP-based scheduling model, the implicit theorem is leveraged to exploit the optimal condition of SOCP. The corresponding SOCP-based implicit derivation approach is proposed to calculate the DLMEs effectively in a model-based way. Comprehensive numerical studies are conducted to verify the superiority of the proposed method by comparing its calculation efficacy to the conventional marginal estimation approach, assessing its effectiveness in carbon alleviation with comparison to the average emission factors, and evaluating its carbon alleviation ability of reactive DLME.
Submitted 11 February, 2024;
originally announced February 2024.
-
Active Support of Inverters for Improving Short-Term Voltage Security in 100% IBRs-Penetrated Power Systems
Authors:
Yinhong Lin,
Bin Wang,
Qinglai Guo,
Haotian Zhao,
Hongbin Sun
Abstract:
Due to the energy crisis and environmental pollution, the installed capacity of inverter-based resources (IBRs) in power grids is rapidly increasing, and grid-following control (GFL) is the most prevalent at present. Meanwhile, grid-forming control-based (GFM) devices have been installed in the grid to provide active support for frequency and voltage. In the future, GFL devices combined with GFM will be promising, especially in power systems with high penetration or 100% IBRs. When a short-circuit fault occurs in the grid, the controlled current source characteristic of the GFL devices leads to insufficient dynamic voltage support (DVS), while the GFM devices usually reduce the internal voltage to limit the current. Thus, deep voltage sags and undesired disconnections of IBRs may occur. Moreover, due to the dispersed locations and the control strategies' diversity of IBRs, the voltage support of different devices may not be fully coordinated, which is not conducive to short-term voltage security (STVS). To address this issue, a control scheme based on the simulation of transient characteristics of synchronous machines (SMs) is proposed. Then, a new fault ride-through strategy (FRT) is proposed based on the characteristic differences between GFL and GFM devices, and an optimization model of multi-device control parameters is formulated to meet the short-term voltage security constraints (SVSCs) and device capacity constraints. Finally, a fast solution method based on analytical modeling is proposed for the model. Test results based on the double-generator-one-load system, the IEEE 14-bus system, and other systems of different sizes show that the proposed method can effectively enhance the active support capability of GFL and GFM to the grid voltage, and avoid the large-scale disconnection of IBRs.
Submitted 2 February, 2024;
originally announced February 2024.
-
ParaTransCNN: Parallelized TransCNN Encoder for Medical Image Segmentation
Authors:
Hongkun Sun,
**g Xu,
Yu** Duan
Abstract:
Convolutional neural network-based methods have become increasingly popular for medical image segmentation due to their outstanding performance. However, they struggle with capturing long-range dependencies, which are essential for accurately modeling global contextual correlations. Thanks to the ability to model long-range dependencies by expanding the receptive field, the transformer-based methods have gained prominence. Inspired by this, we propose an advanced 2D feature extraction method by combining the convolutional neural network and Transformer architectures. More specifically, we introduce a parallelized encoder structure, where one branch uses ResNet to extract local information from images, while the other branch uses Transformer to extract global information. Furthermore, we integrate pyramid structures into the Transformer to extract global information at varying resolutions, especially in intensive prediction tasks. To efficiently utilize the different information in the parallelized encoder at the decoder stage, we use a channel attention module to merge the features of the encoder and propagate them through skip connections and bottlenecks. Intensive numerical experiments are performed on aortic vessel tree, cardiac, and multi-organ datasets. Compared with state-of-the-art medical image segmentation methods, our method is shown to achieve better segmentation accuracy, especially on small organs. The code is publicly available on https://github.com/HongkunSun/ParaTransCNN.
Submitted 27 January, 2024;
originally announced January 2024.
-
Joint User Scheduling and Computing Resource Allocation Optimization in Asynchronous Mobile Edge Computing Networks
Authors:
Yihan Cang,
Ming Chen,
Yi** Pan,
Zhaohui Yang,
Ye Hu,
Haijian Sun,
Mingzhe Chen
Abstract:
In this paper, the problem of joint user scheduling and computing resource allocation in asynchronous mobile edge computing (MEC) networks is studied. In such networks, edge devices will offload their computational tasks to an MEC server, using the energy they harvest from this server. To get their tasks processed on time using the harvested energy, edge devices will strategically schedule their task offloading, and compete for the computational resource at the MEC server. Then, the MEC server will execute these tasks asynchronously based on the arrival of the tasks. This joint user scheduling, time and computation resource allocation problem is posed as an optimization framework whose goal is to find the optimal scheduling and allocation strategy that minimizes the energy consumption of these mobile computing tasks. To solve this mixed-integer non-linear programming problem, the generalized Benders decomposition method is adopted, which decomposes the original problem into a primal problem and a master problem. Specifically, the primal problem is related to computation resource and time slot allocation, of which the optimal closed-form solution is obtained. The master problem regarding discrete user scheduling variables is constructed by adding optimality cuts or feasibility cuts according to whether the primal problem is feasible, which is a standard mixed-integer linear programming problem and can be efficiently solved. By iteratively solving the primal problem and master problem, the optimal scheduling and resource allocation scheme is obtained. Simulation results demonstrate that the proposed asynchronous computing framework reduces energy consumption by 87.17% compared with its conventional synchronous computing counterpart.
Submitted 20 January, 2024;
originally announced January 2024.
-
Attack and Defense Analysis of Learned Image Compression
Authors:
Tianyu Zhu,
Heming Sun,
Xiankui Xiong,
Xuanpeng Zhu,
Yong Gong,
Minge **g,
Yibo Fan
Abstract:
Learned image compression (LIC) has become increasingly popular in recent years owing to its high efficiency and outstanding compression quality. Still, its practicality against inputs modified with specific added noise cannot be ignored. White-box attacks such as FGSM and PGD use only gradients to compute adversarial images that mislead LIC models to output unexpected results. Our experiments compare the effects of different dimensions such as attack methods, models, qualities, and targets, concluding that in the worst case, there is a 61.55% decrease in PSNR or a 19.15 times increase in bpp under the PGD attack. To improve their robustness, we conduct adversarial training by adding adversarial images into the training datasets, which obtains a 95.52% decrease in the R-D cost of the most vulnerable LIC model. We further test the robustness of H.266, whose better performance on reconstruction quality extends its possibility to defend one-step or iterative adversarial attacks.
Submitted 27 March, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
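For readers unfamiliar with the two attacks named in the abstract above, the following is a minimal sketch of the standard FGSM (one-step) and PGD (iterative, projected) updates. It is not the authors' experimental setup: a toy linear model with a squared-error loss stands in for an LIC model, inputs are assumed to lie in [0, 1], and all function names and parameters (`loss_and_grad`, `fgsm`, `pgd`, `eps`, `alpha`) are hypothetical.

```python
def loss_and_grad(x, w, y):
    """Squared error of a toy linear model and its gradient w.r.t. the input x."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return err * err, [2.0 * err * wi for wi in w]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def fgsm(x, w, y, eps):
    """One-step attack: move each input by eps along the sign of the loss gradient."""
    _, grad = loss_and_grad(x, w, y)
    return [clip(xi + eps * sign(g), 0.0, 1.0) for xi, g in zip(x, grad)]


def pgd(x, w, y, eps, alpha, steps):
    """Iterative attack: small signed-gradient steps, projected back into the eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        _, grad = loss_and_grad(x_adv, w, y)
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project onto the L-infinity ball around x, then onto the valid input range.
        x_adv = [clip(clip(xa, xi - eps, xi + eps), 0.0, 1.0)
                 for xa, xi in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    x, w, y = [0.2, 0.5, 0.8], [1.0, -2.0, 0.5], 0.0
    print("clean loss:", loss_and_grad(x, w, y)[0])
    print("FGSM loss: ", loss_and_grad(fgsm(x, w, y, 0.1), w, y)[0])
    print("PGD loss:  ", loss_and_grad(pgd(x, w, y, 0.1, 0.05, 5), w, y)[0])
```

Both attacks need only the gradient of the loss with respect to the input, which is why they are classed as white-box: the attacker must be able to differentiate through the model.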
-
Neural Born Series Operator for Biomedical Ultrasound Computed Tomography
Authors:
Zhijun Zeng,
Yihang Zheng,
Youjia Zheng,
Yubing Li,
Zuoqiang Shi,
He Sun
Abstract:
Ultrasound Computed Tomography (USCT) provides a radiation-free option for high-resolution clinical imaging. Despite its potential, the computationally intensive Full Waveform Inversion (FWI) required for tissue property reconstruction limits its clinical utility. This paper introduces the Neural Born Series Operator (NBSO), a novel technique designed to speed up wave simulations, thereby facilitating a more efficient USCT image reconstruction process through an NBSO-based FWI pipeline. Thoroughly validated on comprehensive brain and breast datasets, simulated under experimental USCT conditions, the NBSO proves to be accurate and efficient in both forward simulation and image reconstruction. This advancement demonstrates the potential of neural operators in facilitating near real-time USCT reconstruction, making the clinical application of USCT increasingly viable and promising.
Submitted 24 December, 2023;
originally announced December 2023.
-
Fine-grained Disentangled Representation Learning for Multimodal Emotion Recognition
Authors:
Haoqin Sun,
Shiwan Zhao,
Xuechen Wang,
Wenjia Zeng,
Yong Chen,
Yong Qin
Abstract:
Multimodal emotion recognition (MMER) is an active research field that aims to accurately recognize human emotions by fusing multiple perceptual modalities. However, inherent heterogeneity across modalities introduces distribution gaps and information redundancy, posing significant challenges for MMER. In this paper, we propose a novel fine-grained disentangled representation learning (FDRL) framework to address these challenges. Specifically, we design modality-shared and modality-private encoders to project each modality into modality-shared and modality-private subspaces, respectively. In the shared subspace, we introduce a fine-grained alignment component to learn modality-shared representations, thus capturing modal consistency. Subsequently, we tailor a fine-grained disparity component to constrain the private subspaces, thereby learning modality-private representations and enhancing their diversity. Lastly, we introduce a fine-grained predictor component to ensure that the labels of the output representations from the encoders remain unchanged. Experimental results on the IEMOCAP dataset show that FDRL outperforms the state-of-the-art methods, achieving 78.34% and 79.44% on WAR and UAR, respectively.
Submitted 20 December, 2023;
originally announced December 2023.
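For readers unfamiliar with the two metrics quoted in the abstract above, the following is a minimal sketch of how WAR and UAR are commonly computed. It assumes WAR is the support-weighted average of per-class recalls (which equals overall accuracy) and UAR is the plain mean of per-class recalls; the function name `war_uar` and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) as fractions in [0, 1].

    Assumed definitions (not confirmed by the abstract):
    WAR = per-class recall weighted by class support,
    UAR = unweighted mean of per-class recalls.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    support = defaultdict(int)   # number of true samples per class
    correct = defaultdict(int)   # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            correct[t] += 1

    recalls = {c: correct[c] / support[c] for c in support}
    war = sum(recalls[c] * support[c] for c in support) / len(y_true)
    uar = sum(recalls.values()) / len(recalls)
    return war, uar


# Made-up, imbalanced toy labels for illustration only.
y_true = ["happy"] * 6 + ["sad"] * 2 + ["angry"] * 2
y_pred = ["happy"] * 6 + ["sad", "happy"] + ["angry", "happy"]
war, uar = war_uar(y_true, y_pred)
print(f"WAR={war:.3f}  UAR={uar:.3f}")
```

On imbalanced data the two numbers diverge: the majority class dominates WAR, while UAR treats every class equally, which is why emotion-recognition work usually reports both.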
-
Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
Authors:
Yang Liu,
Haoqin Sun,
Geng Chen,
Qingyue Wang,
Zhen Zhao,
Xugang Lu,
Longbiao Wang
Abstract:
Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, making it challenging to achieve competitive performance in noisy conditions. To this end, we propose a multi-level knowledge distillation (MLKD) method, which aims to transfer the knowledge from a teacher model trained on clean speech to a simpler student model trained on noisy speech. Specifically, we use clean speech features extracted by the wav2vec-2.0 as the learning goal and train the distil wav2vec-2.0 to approximate the feature extraction ability of the original wav2vec-2.0 under noisy conditions. Furthermore, we leverage the multi-level knowledge of the original wav2vec-2.0 to supervise the single-level output of the distil wav2vec-2.0. We evaluate the effectiveness of our proposed method by conducting extensive experiments using five types of noise-contaminated speech on the IEMOCAP dataset, which show promising results compared to state-of-the-art models.
Submitted 20 December, 2023;
originally announced December 2023.
-
Precise Coil Alignment for Dynamic Wireless Charging of Electric Vehicles with RFID Sensing
Authors:
Haijian Sun,
Xiang Ma,
Rose Qingyang Hu,
Randy Christensen
Abstract:
Electric vehicles (EVs) have emerged as a transformative force for a sustainable and environmentally friendly future. To alleviate range anxiety caused by battery and charging-facility limitations, dynamic wireless power transfer (DWPT) is increasingly recognized as a key enabler for widespread EV adoption, yet it faces significant technical challenges, primarily in precise coil alignment. This article begins by reviewing current alignment methodologies and evaluating their advantages and limitations. We observe that achieving the necessary alignment precision is challenging with these existing methods. To address this, we present an innovative RFID-based DWPT coil alignment system, utilizing coherent phase detection and a maximum likelihood estimation algorithm, capable of achieving sub-10 cm accuracy. This system's efficacy in providing both lateral and vertical misalignment estimates has been verified through laboratory and experimental tests. We also discuss potential challenges in broader system implementation and propose corresponding solutions. This research offers a viable and promising solution for enhancing DWPT efficiency.
Submitted 19 December, 2023;
originally announced December 2023.
-
Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation
Authors:
Tianhao Peng,
Ge Gao,
Heming Sun,
Fan Zhang,
David Bull
Abstract:
In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in terms of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a novel model-agnostic pruning scheme based on gradient decay and adaptive layer-wise distillation. Gradient decay enhances parameter exploration during sparsification whilst preventing runaway sparsity and is superior to the standard Straight-Through Estimation. The adaptive layer-wise distillation regulates the sparse training in various stages based on the distortion of intermediate features. This stage-wise design efficiently updates parameters with minimal computational overhead. The proposed approach has been applied to three popular end-to-end learnt video codecs: FVC, DCVC, and DCVC-HEM. Results confirm that our method yields up to a 65% reduction in MACs and a 2x speed-up with less than a 0.3 dB drop in BD-PSNR. Supporting code and supplementary material can be downloaded from: https://jasminepp.github.io/lightweightdvc/
Submitted 5 December, 2023;
originally announced December 2023.
-
MoEC: Mixture of Experts Implicit Neural Compression
Authors:
Jianchen Zhao,
Cheng-Ching Tseng,
Ming Lu,
Ruichuan An,
Xiaobao Wei,
He Sun,
Shanghang Zhang
Abstract:
Implicit Neural Representation (INR) is an emerging and promising data compression technique that represents data using the parameters of a Deep Neural Network (DNN). Existing methods manually partition a complex scene into local regions and overfit INRs to those regions. However, manually designing the partition scheme for a complex scene is very challenging and fails to jointly learn the partition and the INRs. To solve this problem, we propose MoEC, a novel implicit neural compression method based on the theory of mixture of experts. Specifically, we use a gating network to automatically assign a specific INR to a 3D point in the scene. The gating network is trained jointly with the INRs of different local regions. Compared with block-wise and tree-structured partitions, our learnable partition can adaptively find the optimal partition in an end-to-end manner. We conduct detailed experiments on massive and diverse biomedical data to demonstrate the advantages of MoEC over existing approaches. In most experimental settings, we achieve state-of-the-art results. Notably, at extreme compression ratios such as 6000x, we are able to maintain a PSNR of 48.16.
Submitted 3 December, 2023;
originally announced December 2023.
-
Channel Modeling for Terahertz Communications in Rain
Authors:
Peian Li,
Wenbo Liu,
Jiacheng Liu,
Da Li,
Guohao Liu,
Yuanshuai Lei,
Jiabiao Zhao,
Xiaopeng Wang,
Houjun Sun,
Jianjun Ma,
John F. Federici
Abstract:
Terahertz (THz) communication channels, integral to outdoor applications, are critically influenced by natural factors like rainfall. Our research focused on the nuanced effects of rain on these channels, employing an advanced rainfall emulation system. By analyzing key parameters such as rain rate, altitude based variations in rainfall, and diverse raindrop sizes, we identified the paramount significance of the number of raindrops in the THz channel, particularly in scenarios with constant rain rates but varying drop sizes. Central to our findings is a novel model grounded in Mie scattering theory, which adeptly incorporates the variability of raindrop size distributions at different altitudes. This model has displayed strong congruence with our experimental results. In essence, our study underscores the inadequacy of solely depending on a fixed ground-based rain rate and emphasizes the imperative of calibrating distribution metrics to cater to specific environmental and operational contexts.
Submitted 28 November, 2023;
originally announced November 2023.
-
Networked Multiagent Safe Reinforcement Learning for Low-carbon Demand Management in Distribution Network
Authors:
Jichen Zhang,
Linwei Sang,
Yinliang Xu,
Hongbin Sun
Abstract:
This paper proposes a multiagent based bi-level operation framework for the low-carbon demand management in distribution networks considering the carbon emission allowance on the demand side. In the upper level, the aggregate load agents optimize the control signals for various types of loads to maximize the profits; in the lower level, the distribution network operator makes optimal dispatching decisions to minimize the operational costs and calculates the distribution locational marginal price and carbon intensity. The distributed flexible load agent has only incomplete information of the distribution network and cooperates with other agents using networked communication. Finally, the problem is formulated into a networked multi-agent constrained Markov decision process, which is solved using a safe reinforcement learning algorithm called consensus multi-agent constrained policy optimization considering the carbon emission allowance for each agent. Case studies with the IEEE 33-bus and 123-bus distribution network systems demonstrate the effectiveness of the proposed approach, in terms of satisfying the carbon emission constraint on demand side, ensuring the safe operation of the distribution network and preserving privacy of both sides.
Submitted 27 November, 2023;
originally announced November 2023.
-
Dynamic Operating Envelopes Embedded Peer-to-Peer-to-Grid Energy Trading
Authors:
Zhisen Jiang,
Ye Guo,
Hongbin Sun,
Jianxiao Wang
Abstract:
A novel decentralized peer-to-peer-to-grid (P2P2G) trading mechanism considering distribution network integrity is proposed. In order to direct prosumers' peer-to-peer (P2P) trading behavior to be grid-friendly, the proposed method incorporates Dynamic Operating Envelopes (DOEs) into the existing P2P2G trading. Moreover, DOEs are determined through negotiations between the distribution system operator (DSO) and prosumers alongside the process of P2P trading, avoiding compromising prosumers' privacy and network parameters leakage. To reduce communication costs during P2P trading, a variant of the alternating direction method of multipliers (ADMM), i.e., communication-censored ADMM (COCA) is used to solve the P2P2G trading problem. Finally, the DOE price is shown to be comprised of several economically interpretable components. Simulations validate the effectiveness of the proposed mechanism.
Submitted 23 November, 2023;
originally announced November 2023.
-
Fast Controllable Diffusion Models for Undersampled MRI Reconstruction
Authors:
Wei Jiang,
Zhuang Xiong,
Feng Liu,
Nan Ye,
Hongfu Sun
Abstract:
Supervised deep learning methods have shown promise in undersampled Magnetic Resonance Imaging (MRI) reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to undersampled MRI reconstruction, without paired data or model retraining for different MRI acquisitions. However, diffusion models are generally slow in sampling and state-of-the-art acceleration techniques can lead to sub-optimal results when directly applied to the controllable generation process. This study introduces a new algorithm called Predictor-Projector-Noisor (PPN), which enhances and accelerates controllable generation of diffusion models for undersampled MRI reconstruction. Our results demonstrate that PPN produces high-fidelity MR images that conform to undersampled k-space measurements with significantly shorter reconstruction time than other controllable sampling methods. In addition, the unsupervised PPN accelerated diffusion models are adaptable to different MRI acquisition parameters, making them more practical for clinical use than supervised learning techniques.
Submitted 11 June, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks
Authors:
** Ye,
Junlong Cheng,
Jianpin Chen,
Zhongying Deng,
Tianbin Li,
Haoyu Wang,
Yanzhou Su,
Ziyan Huang,
Jilong Chen,
Lei Jiang,
Hui Sun,
Min Zhu,
Shaoting Zhang,
Junjun He,
Yu Qiao
Abstract:
Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowledge into SAM, we introduce SA-Med2D-20M, a large-scale segmentation dataset of 2D medical images built upon numerous public and private datasets. It consists of 4.6 million 2D medical images and 19.7 million corresponding masks, covering almost the whole body and showing significant diversity. This paper describes all the datasets collected in SA-Med2D-20M and details how to process these datasets. Furthermore, comprehensive statistics of SA-Med2D-20M are presented to facilitate the better use of our dataset, which can help the researchers build medical vision foundation models or apply their models to downstream medical applications. We hope that the large scale and diversity of SA-Med2D-20M can be leveraged to develop medical artificial intelligence for enhancing diagnosis, medical image analysis, knowledge sharing, and education. The data with the redistribution license is publicly available at https://github.com/OpenGVLab/SAM-Med2D.
Submitted 20 November, 2023;
originally announced November 2023.
-
Plug-and-Play Latent Feature Editing for Orientation-Adaptive Quantitative Susceptibility Mapping Neural Networks
Authors:
Yang Gao,
Zhuang Xiong,
Shanshan Shan,
Yin Liu,
Pengfei Rong,
Min Li,
Alan H Wilman,
G. Bruce Pike,
Feng Liu,
Hongfu Sun
Abstract:
Quantitative susceptibility mapping (QSM) is a post-processing technique for deriving tissue magnetic susceptibility distribution from MRI phase measurements. Deep learning (DL) algorithms hold great potential for solving the ill-posed QSM reconstruction problem. However, a significant challenge facing current DL-QSM approaches is their limited adaptability to magnetic dipole field orientation variations during training and testing. In this work, we propose a novel Orientation-Adaptive Latent Feature Editing (OA-LFE) module to learn the encoding of acquisition orientation vectors and seamlessly integrate them into the latent features of deep networks. Importantly, it can be plugged directly (Plug-and-Play, PnP) into various existing DL-QSM architectures, enabling reconstructions of QSM from arbitrary magnetic dipole orientations. Its effectiveness is demonstrated by integrating the OA-LFE module into our previously proposed phase-to-susceptibility single-step instant QSM (iQSM) network, which was initially tailored for pure-axial acquisitions. The proposed OA-LFE-empowered iQSM, which we refer to as iQSM+, is trained in a self-supervised manner on a specially designed simulated brain dataset. Comprehensive experiments are conducted on simulated and in vivo human brain datasets, encompassing subjects ranging from healthy individuals to those with pathological conditions. These experiments involve various MRI platforms (3T and 7T) and aim to compare our proposed iQSM+ against several established QSM reconstruction frameworks, including the original iQSM. The iQSM+ yields QSM images with significantly improved accuracy and fewer artifacts, surpassing other state-of-the-art DL-QSM algorithms.
Submitted 26 March, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Equal Incremental Cost-Based Optimization Method to Enhance Efficiency for IPOP-Type Converters
Authors:
Hanfeng Cai,
Haiyang Liu,
Heyang Sun,
Qiao Wang
Abstract:
Systematic optimization over a wide power range is often achieved through the combination of modules of different power levels. This paper addresses the issue of enhancing the efficiency of a multiple-module system connected in parallel during operation and proposes an algorithm based on equal incremental cost for dynamic load allocation. Initially, a polynomial fitting technique is employed to fit efficiency test points for individual modules. Subsequently, the equal incremental cost-based optimization is utilized to formulate an efficiency optimization and allocation scheme for the multi-module system. A simulated annealing algorithm is applied to determine the optimal power output strategy for each module for a given total power flow requirement. Finally, a dual active bridge (DAB) experimental prototype with two input-parallel-output-parallel (IPOP) configurations is constructed to validate the effectiveness of the proposed strategy. Experimental results demonstrate that under the 800 W operating condition, the approach in this paper achieves an efficiency improvement of over 0.74% compared with equal power sharing between both modules.
Submitted 11 November, 2023;
originally announced November 2023.
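The equal incremental cost principle named in the abstract above can be illustrated with a small sketch. It assumes each module's power loss is a convex quadratic in its output power, loss_i(P) = a_i*P^2 + b_i*P + c_i, and it ignores per-module power limits. Under those assumptions the optimum has a closed form, so this sketch does not use the simulated annealing step the paper describes. All coefficients and the function names are hypothetical and are not taken from the paper.

```python
def allocate_equal_incremental_cost(modules, p_total):
    """Split p_total across modules so that all marginal losses are equal.

    modules: list of (a, b, c) coefficients of a quadratic loss curve,
             loss(P) = a*P**2 + b*P + c, with a > 0 (assumed convex fit).
    Returns (powers, lam), where lam is the common incremental cost
    d(loss)/dP = 2*a*P + b shared by every module at the optimum.
    Power limits of the modules are NOT enforced in this sketch.
    """
    for a, _b, _c in modules:
        if a <= 0:
            raise ValueError("each quadratic coefficient a must be positive")

    inv_curv = [1.0 / (2.0 * a) for a, _b, _c in modules]
    offset = sum(b * k for (_a, b, _c), k in zip(modules, inv_curv))
    lam = (p_total + offset) / sum(inv_curv)
    powers = [(lam - b) * k for (_a, b, _c), k in zip(modules, inv_curv)]
    return powers, lam


def total_loss(modules, powers):
    """Sum of the quadratic losses for a given power allocation."""
    return sum(a * p * p + b * p + c for (a, b, c), p in zip(modules, powers))


# Hypothetical fitted loss curves for two parallel modules (watts in, watts lost).
modules = [(2e-5, 0.010, 3.0), (4e-5, 0.006, 2.0)]
powers, lam = allocate_equal_incremental_cost(modules, 800.0)

print("allocation (W):", [round(p, 3) for p in powers])
print("loss, equal incremental cost (W):", round(total_loss(modules, powers), 3))
print("loss, equal sharing (W):", round(total_loss(modules, [400.0, 400.0]), 3))
```

With non-quadratic fits or active power limits no closed form exists in general, which is presumably where a numerical search such as the simulated annealing mentioned in the abstract comes in.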
-
MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual Variational Generative Adversarial Network
Authors:
Haili Sun,
Yan Huang,
Lansheng Han,
Cai Fu,
Hongle Liu,
Xiang Long
Abstract:
Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of cyber-physical systems (CPSs) without relying on labeled information. Nonetheless, these generative models face challenges in identifying attack behaviors that closely resemble normal data, or that deviate from the normal data distribution but are in close proximity to the manifold of the normal cluster in latent space. To tackle this problem, this article proposes a novel unsupervised dual variational generative adversarial model named MTS-DVGAN to perform anomaly detection in multivariate time series data for CPS security. The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts. Specifically, we propose an augmented module that imposes contrastive constraints on the reconstruction process to obtain a more compact embedding. Then, by exploiting the distribution property and modeling the normal patterns of multivariate time series, a variational autoencoder is introduced to force the generative adversarial network (GAN) to generate diverse samples. Furthermore, two augmented loss functions are designed to extract essential characteristics in a self-supervised manner through mutual guidance between the augmented samples and original samples. Finally, a specific feature center loss is introduced for the generator network to enhance its stability. Empirical experiments are conducted on three public datasets, namely SWAT, WADI and NSL_KDD. Compared with state-of-the-art methods, the evaluation results show that the proposed MTS-DVGAN is more stable and achieves consistent performance improvement.
Submitted 4 November, 2023;
originally announced November 2023.
-
Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy
Authors:
Hui Sun,
Hao Luo,
Feifei Wang,
Qingjiu Chen,
Meng Chen,
Xiaoduo Wang,
Haibo Yu,
Guanglie Zhang,
Lianqing Liu,
Jian** Wang,
Dapeng Wu,
Wen Jung Li
Abstract:
Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the mapping relationship between optical super-resolution (OSR) images and SEM domain images, which enables the transformation of OSR images into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system, which requires neither coating samples with conductive films nor a vacuum environment, is used to acquire the OSR images with features down to ~80 nm. The peak signal-to-noise ratio (PSNR) and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The proposed method provides a high level of detail in the reconstructed results, indicating that it has broad applicability to chip-level defect detection, biological sample analysis, forensics, and various other fields.
Submitted 27 October, 2023;
originally announced October 2023.
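As background for the PSNR figure quoted in the abstract above, here is a minimal sketch of the standard PSNR definition, 10*log10(MAX^2 / MSE), and of a PSNR gain expressed as the difference between two PSNR values against the same reference. That the paper computes its improvement exactly this way is an assumption. The tiny 2x2 "images" and the names `psnr`, `reference`, `coarse`, and `refined` are made up for illustration.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    Images are lists of rows of pixel intensities; max_value is the largest
    possible intensity (255 for 8-bit images, an assumption of this sketch).
    """
    ref_flat = [v for row in reference for v in row]
    test_flat = [v for row in test for v in row]
    if len(ref_flat) != len(test_flat) or not ref_flat:
        raise ValueError("images must be non-empty and of equal size")

    mse = sum((r - t) ** 2 for r, t in zip(ref_flat, test_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up pixel values: 'coarse' is off by 10 everywhere, 'refined' by 2.
reference = [[10, 50], [100, 200]]
coarse = [[20, 40], [110, 190]]
refined = [[12, 48], [102, 198]]

gain_db = psnr(reference, refined) - psnr(reference, coarse)
print(f"PSNR gain of refined over coarse: {gain_db:.2f} dB")
```

Because PSNR is logarithmic, a gain in dB corresponds to a ratio of mean squared errors rather than a difference, so even sub-1 dB gains reflect a measurable reduction in error.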
-
Energy Efficient Robust Beamforming for Vehicular ISAC with Imperfect Channel Estimation
Authors:
Hanwen Zhang,
Haijian Sun,
Tianyi He,
Weiming Xiang,
Rose Qingyang Hu
Abstract:
This paper investigates robust beamforming for system-centric energy efficiency (EE) optimization in the vehicular integrated sensing and communication (ISAC) system, where the mobility of vehicles poses significant challenges to channel estimation. To obtain the optimal beamforming under channel uncertainty, we first formulate an optimization problem for maximizing the system EE under bounded channel estimation errors. Next, fractional programming and semidefinite relaxation (SDR) are utilized to relax the rank-1 constraints. We further use Schur complement and S-Procedure to transform Cramer-Rao bound (CRB) and channel estimation error constraints into convex forms, respectively. Based on the Lagrangian dual function and Karush-Kuhn-Tucker (KKT) conditions, it is proved that the optimal beamforming solution is rank-1. Finally, we present comprehensive simulation results to demonstrate two key findings: 1) the proposed algorithm exhibits a favorable convergence rate, and 2) the approach effectively mitigates the impact of channel estimation errors.
Submitted 26 October, 2023;
originally announced October 2023.
-
Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression
Authors:
Yiming Wang,
Qian Huang,
Bin Tang,
Huashan Sun,
Xing Li
Abstract:
Recently, learned video compression has achieved exciting performance. Following the traditional hybrid prediction coding framework, most learned methods adopt motion estimation and motion compensation (MEMC) to remove inter-frame redundancy. However, inaccurate motion vectors (MVs) usually lead to distortion of the reconstructed frame. In addition, most approaches ignore spatial and channel redundancy. To solve the above problems, we propose a motion-aware and spatial-temporal-channel contextual coding based video compression network (MASTC-VC), which learns the latent representation and uses variational autoencoders (VAEs) to capture the characteristics of intra-frame pixels and inter-frame motion. Specifically, we design a multiscale motion-aware module (MS-MAM) to estimate spatial-temporal-channel consistent motion vectors by utilizing the multiscale motion prediction information in a coarse-to-fine way. On top of it, we further propose a spatial-temporal-channel contextual module (STCCM), which explores the correlation of the latent representation to reduce the bit consumption from the spatial, temporal and channel aspects, respectively. Comprehensive experiments show that our proposed MASTC-VC is superior to previous state-of-the-art (SOTA) methods on three public benchmark datasets. More specifically, our method brings average BD-rate savings of 10.15% against H.265/HEVC (HM-16.20) in the PSNR metric and average BD-rate savings of 23.93% against H.266/VVC (VTM-13.2) in the MS-SSIM metric.
Submitted 19 October, 2023;
originally announced October 2023.
-
Map2Schedule: An End-to-End Link Scheduling Method for Urban V2V Communications
Authors:
Lihao Zhang,
Haijian Sun,
** Sun,
Ramviyas Parasuraman,
Yinghui Ye,
Rose Qingyang Hu
Abstract:
Urban vehicle-to-vehicle (V2V) link scheduling with shared spectrum is a challenging problem. Its main goal is to find the scheduling policy that can maximize system performance (usually the sum capacity of each link or their energy efficiency). Given that each link can experience interference from all other active links, the scheduling becomes a combinatorial integer programming problem and generally does not scale well with the number of V2V pairs. Moreover, link scheduling requires accurate channel state information (CSI), which is very difficult to estimate with good accuracy under high vehicle mobility. In this paper, we propose an end-to-end urban V2V link scheduling method called Map2Schedule, which can directly generate V2V scheduling policy from the city map and vehicle locations. Map2Schedule delivers comparable performance to the physical-model-based methods in urban settings while maintaining low computation complexity. This enhanced performance is achieved by machine learning (ML) technologies. Specifically, we first deploy the convolutional neural network (CNN) model to estimate the CSI from street layout and vehicle locations and then apply the graph embedding model for optimal scheduling policy. The results show that the proposed method can achieve high accuracy with much lower overhead and latency.
Submitted 12 October, 2023;
originally announced October 2023.
-
Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples
Authors:
Hao Sun,
Alihan Hüyük,
Daniel Jarrett,
Mihaela van der Schaar
Abstract:
Learning controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability.
Submitted 27 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.