Search | arXiv e-print repository

Large Language Model-aided Edge Learning in Distribution System State Estimation

Authors: Renyou Xie, Xin Yin, Chaojie Li, Nian Liu, Bo Zhao, Zhaoyang Dong

Abstract: Distribution system state estimation (DSSE) plays a crucial role in the real-time monitoring, control, and operation of distribution networks. Besides intensive computational requirements, conventional DSSE methods need high-quality measurements to obtain accurate states, whereas missing values often occur due to sensor failures or communication delays. To address these challenging issues, a forecast-then-estimate framework of edge learning is proposed for DSSE, leveraging large language models (LLMs) to forecast missing measurements and provide pseudo-measurements. Firstly, natural language-based prompts and measurement sequences are integrated by the proposed LLM to learn patterns from historical data and provide accurate forecasting results. Secondly, a convolutional layer-based neural network model is introduced to improve the robustness of state estimation under missing measurement. Thirdly, to alleviate the overfitting of the deep learning-based DSSE, it is reformulated as a multi-task learning framework containing shared and task-specific layers. The uncertainty weighting algorithm is applied to find the optimal weights to balance different tasks. The numerical simulation on the Simbench case is used to demonstrate the effectiveness of the proposed forecast-then-estimate framework.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2404.01717 [pdf, other]

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

Authors: Rui Xie, Ying Tai, Chen Zhao, Kai Zhang, Zhenyu Zhang, Jun Zhou, Xiaoqian Ye, Qian Wang, Jian Yang

Abstract: Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs. However, their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps. Inspired by the efficient adversarial diffusion distillation (ADD), we design AddSR to address this issue by incorporating the ideas of both distillation and ControlNet. Specifically, we first propose a prediction-based self-refinement strategy to provide high-frequency information in the student model output with marginal additional time cost. Furthermore, we refine the training process by employing HR images, rather than LR images, to regulate the teacher model, providing a more robust constraint for distillation. Second, we introduce a timestep-adaptive ADD to address the perception-distortion imbalance problem introduced by original ADD. Extensive experiments demonstrate our AddSR generates better restoration results, while achieving faster speed than previous SD-based state-of-the-art models (e.g., 7× faster than SeeSR).

Submitted 23 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.19185 [pdf, other]

Deep CSI Compression for Dual-Polarized Massive MIMO Channels with Disentangled Representation Learning

Authors: Suhang Fan, Wei Xu, Renjie Xie, Shi Jin, Derrick Wing Kwan Ng, Naofal Al-Dhahir

Abstract: Channel state information (CSI) feedback is critical for achieving the promised advantages of enhancing spectral and energy efficiencies in massive multiple-input multiple-output (MIMO) wireless communication systems. Deep learning (DL)-based methods have been proven effective in reducing the required signaling overhead for CSI feedback. In practical dual-polarized MIMO scenarios, channels in the vertical and horizontal polarization directions tend to exhibit high polarization correlation. To fully exploit the inherent propagation similarity within dual-polarized channels, we propose a disentangled representation neural network (NN) for CSI feedback, referred to as DiReNet. The proposed DiReNet disentangles dual-polarized CSI into three components: polarization-shared information, vertical polarization-specific information, and horizontal polarization-specific information. This disentanglement of dual-polarized CSI enables the minimization of information redundancy caused by the polarization correlation and improves the performance of CSI compression and recovery. Additionally, flexible quantization and network extension schemes are designed. Consequently, our method provides a pragmatic solution for CSI feedback to harness the physical MIMO polarization as a priori information. Our experimental results show that the performance of our proposed DiReNet surpasses that of existing DL-based networks, while also effectively reducing the number of network parameters by nearly one third.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.19085 [pdf, other]

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

Authors: Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

Abstract: Alignment in artificial intelligence pursues the consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax", a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness). However, existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives. To navigate this challenge, we argue the prominence of grounding LLMs with evident preferences. We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives, thereby guiding the model to generate responses that meet the requirements. Our experimental analysis reveals that the aligned models can provide responses that match various preferences among the "3H" (helpfulness, honesty, harmlessness) desiderata. Furthermore, by introducing diverse data and alignment goals, we surpass baseline methods in aligning with single objectives, hence mitigating the impact of the alignment tax and achieving Pareto improvements in multi-objective alignment.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2401.10411 [pdf, other]

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Authors: Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

Abstract: Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses that have microphone arrays, which fuses multi-channel ASR with serialized output training, for wearer/conversation-partner disambiguation as well as suppression of cross-talk speech from non-target directions and noise. When ASR work is part of a broader system-development process, one may be faced with changes to microphone geometries as system development progresses. This paper aims to make multi-channel ASR insensitive to limited variations of microphone-array geometry. We show that a model trained on multiple similar geometries is largely agnostic and generalizes well to new geometries, as long as they are not too different. Furthermore, training the model this way improves accuracy for seen geometries by 15 to 28% relative. Lastly, we refine the beamforming by a novel Non-Linearly Constrained Minimum Variance criterion.

Submitted 18 January, 2024; originally announced January 2024.

Comments: Accepted to ICASSP 2024

arXiv:2310.14744 [pdf, other]

A Multi-timescale and Chance-Constrained Energy Dispatching Strategy of Integrated Heat-Power Community with Shared Hybrid Energy Storage

Authors: Wenyi Zhang, Yue Chen, Rui Xie, Yunjian Xu

Abstract: The community in the future may develop into an integrated heat-power system, which includes a high proportion of renewable energy, power generator units, heat generator units, and shared hybrid energy storage. In the integrated heat-power system with coupling heat-power generators and demands, the key challenges lie in the interaction between heat and power, the inherent uncertainty of renewable energy and consumers' demands, and the multi-timescale scheduling of heat and power. In this paper, we propose a game theoretic model of the integrated heat-power system. For the welfare-maximizing community operator, its energy dispatch strategy is under chance constraints, where the day-ahead scheduling determines the scheduled energy dispatching strategies, and the real-time dispatch considers the adjustment of generators. For utility-maximizing consumers, their demands are sensitive to the preference parameters. Taking into account the uncertainty in both renewable energy and consumer demand, we prove the existence and uniqueness of the Stackelberg game equilibrium and develop a fixed point algorithm to find the market equilibrium between the community operator and community consumers. Numerical simulations on integrated heat-power system validate the effectiveness of the proposed multi-timescale integrated heat and power model.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2308.05864 [pdf, other]

doi 10.1038/s41592-024-02233-6

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

Authors: Jun Ma, Ronald Xie, Shamini Ayyadhury, Cheng Ge, Anubha Gupta, Ritu Gupta, Song Gu, Yao Zhang, Gihun Lee, Joonkee Kim, Wei Lou, Haofeng Li, Eric Upschulte, Timo Dickscheid, José Guilherme de Almeida, Yixin Wang, Lin Han, Xin Yang, Marco Labagnara, Vojislav Gligorovski, Maxime Scheder, Sahand Jamal Rahi, Carly Kempster, Alice Pollitt, Leon Espinosa, et al. (15 additional authors not shown)

Abstract: Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.

Submitted 1 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: NeurIPS22 Cell Segmentation Challenge: https://neurips22-cellseg.grand-challenge.org/ . Nature Methods (2024)

arXiv:2305.07662 [pdf, other]

Self-information Domain-based Neural CSI Compression with Feature Coupling

Authors: Ziqing Yin, Renjie Xie, Wei Xu, Zhaohui Yang, Xiaohu You

Abstract: Deep learning (DL)-based channel state information (CSI) feedback methods compressed the CSI matrix by exploiting its delay and angle features straightforwardly, while the measure in terms of information contained in the CSI matrix has rarely been considered. Based on this observation, we introduce self-information as an informative CSI representation from the perspective of information theory, which reflects the amount of information of the original CSI matrix in an explicit way. Then, a novel DL-based network is proposed for temporal CSI compression in the self-information domain, namely SD-CsiNet. The proposed SD-CsiNet projects the raw CSI onto a self-information matrix in the newly-defined self-information domain, extracts both temporal and spatial features of the self-information matrix, and then couples these two features for effective compression. Experimental results verify the effectiveness of the proposed SD-CsiNet by exploiting the self-information of CSI. Particularly for compression ratios 1/8 and 1/16, the SD-CsiNet respectively achieves 7.17 dB and 3.68 dB performance gains compared to state-of-the-art methods.

Submitted 30 April, 2023; originally announced May 2023.

arXiv:2303.17200 [pdf, other]

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Authors: Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Abstract: Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on increasingly large amounts of video data, while the publicly available transcribed video datasets are limited in size. In this paper, for the first time, we study the potential of leveraging synthetic visual data for VSR. Our method, termed SynthVSR, substantially improves the performance of VSR systems with synthetic lip movements. The key idea behind SynthVSR is to leverage a speech-driven lip animation model that generates lip movements conditioned on the input speech. The speech-driven lip animation model is trained on an unlabeled audio-visual dataset and could be further optimized towards a pre-trained VSR model when labeled videos are available. As plenty of transcribed acoustic data and face images are available, we are able to generate large-scale synthetic data using the proposed lip animation model for semi-supervised VSR training. We evaluate the performance of our approach on the largest public VSR benchmark - Lip Reading Sentences 3 (LRS3). SynthVSR achieves a WER of 43.3% with only 30 hours of real labeled data, outperforming off-the-shelf approaches using thousands of hours of video. The WER is further reduced to 27.9% when using all 438 hours of labeled data from LRS3, which is on par with the state-of-the-art self-supervised AV-HuBERT method. Furthermore, when combined with large-scale pseudo-labeled audio-visual data SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours). Finally, we perform extensive ablation studies to understand the effect of each component in our proposed method.

Submitted 3 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: IEEE/CVF CVPR 2023

arXiv:2212.14665 [pdf, other]

Sizing Grid-Connected Wind Power Generation and Energy Storage with Wake Effect and Endogenous Uncertainty: A Distributionally Robust Method

Authors: Rui Xie, Wei Wei, Yue Chen

Abstract: Wind power, as a green energy resource, is growing rapidly worldwide, along with energy storage systems (ESSs) to mitigate its volatility. Sizing of wind power generation and ESSs has become an important problem to be addressed. Wake effect in a wind farm can cause wind speed deficits and a drop in downstream wind turbine power generation, which however was rarely considered in the sizing problem in power systems. In this paper, a bi-objective distributionally robust optimization (DRO) model is proposed to determine the capacities of wind power generation and ESSs considering the wake effect. An ambiguity set based on Wasserstein metric is established to characterize the wind power and demand uncertainties. In particular, wind power uncertainty is affected by the wind power generation capacity which is determined in the first stage. Thus, the proposed model is a DRO problem with endogenous uncertainty (or decision-dependent uncertainty). To solve the proposed model, a stochastic programming approximation method based on minimum Lipschitz constants is developed to turn the DRO model into a linear program. Then, an iterative algorithm is built, embedded with methods for evaluating the minimum Lipschitz constants. Case studies demonstrate the necessity of considering wake effect and the effectiveness of the proposed method.

Submitted 11 June, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

Comments: 14 pages, 6 figures

arXiv:2209.12265 [pdf, other]

Cooperative Sensing and Heterogeneous Information Fusion in VCPS: A Multi-agent Deep Reinforcement Learning Approach

Authors: Xincao Xu, Kai Liu, Penglin Dai, Ruitao Xie, Jingjing Cao, Jiangtao Luo

Abstract: Cooperative sensing and heterogeneous information fusion are critical to realize vehicular cyber-physical systems (VCPSs). This paper makes the first attempt to quantitatively measure the quality of VCPS by designing a new metric called Age of View (AoV). Specifically, we first present the system architecture where heterogeneous information can be cooperatively sensed and uploaded via vehicle-to-infrastructure (V2I) communications in vehicular edge computing (VEC). Logical views are constructed by fusing the heterogeneous information at edge nodes. Further, we formulate the problem by deriving a cooperative sensing model based on the multi-class M/G/1 priority queue, and defining the AoV by modeling the timeliness, completeness and consistency of the logical views. On this basis, a multi-agent deep reinforcement learning solution is proposed. In particular, the system state includes vehicle sensed information, edge cached information and view requirements. The vehicle action space consists of the sensing frequencies and uploading priorities of information. A difference-reward-based credit assignment is designed to divide the system reward, which is defined as the VCPS quality, into the difference reward for vehicles. Edge node allocates V2I bandwidth to vehicles based on predicted vehicle trajectories and view requirements. Finally, we build the simulation model and give a comprehensive performance evaluation, which conclusively demonstrates the superiority of the proposed solution.

Submitted 27 January, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

arXiv:2209.06427 [pdf]

Efficient low-thrust trajectory data generation based on generative adversarial network

Authors: Ruida Xie, Andrew G. Dempster

Abstract: Deep learning-based techniques have been introduced into the field of trajectory optimization in recent years. Deep Neural Networks (DNNs) are trained and used as the surrogates of conventional optimization process. They can provide low thrust (LT) transfer cost estimation and enable more complex preliminary mission designs. However, it is a challenge to efficiently obtain the required amount of trajectory data for training. A Generative Adversarial Network (GAN) is adapted to generate the feasible LT trajectory data efficiently. The GAN consists of a generator and a discriminator, both of which are deep networks. The generator generates fake LT transfer features using random noise as input, while the discriminator distinguishes the generator's fake LT transfer features from real LT transfer features. The GAN is trained until the generator generates fake LT transfers that the discriminator cannot identify. This indicates the generator generates low thrust transfer features that have the same distribution as the real transfer features. The generated low thrust transfer data have a high convergence rate, and they can be used to efficiently produce training data for deep learning models. The proposed approach is validated by generating feasible LT transfers in a Near-Earth Asteroid (NEA) mission scenario. The convergence rate of GAN-generated samples is 84.3%.

Submitted 14 September, 2022; originally announced September 2022.

Comments: 10 pages, 8 figures

arXiv:2208.02724 [pdf, other]

Disentangled Representation Learning for RF Fingerprint Extraction under Unknown Channel Statistics

Authors: Renjie Xie, Wei Xu, Jiabao Yu, Aiqun Hu, Derrick Wing Kwan Ng, A. Lee Swindlehurst

Abstract: Deep learning (DL) applied to a device's radio-frequency fingerprint (RFF) has attracted significant attention in physical-layer authentication due to its extraordinary classification performance. Conventional DL-RFF techniques are trained by adopting maximum likelihood estimation (MLE). Although their discriminability has recently been extended to unknown devices in open-set scenarios, they still tend to overfit the channel statistics embedded in the training dataset. This restricts their practical applications as it is challenging to collect sufficient training data capturing the characteristics of all possible wireless channel environments. To address this challenge, we propose a DL framework of disentangled representation (DR) learning that first learns to factor the signals into a device-relevant component and a device-irrelevant component via adversarial learning. Then, it shuffles these two parts within a dataset for implicit data augmentation, which imposes a strong regularization on RFF extractor learning to avoid the possible overfitting of device-irrelevant channel statistics, without collecting additional data from unknown channels. Experiments validate that the proposed approach, referred to as DR-based RFF, outperforms conventional methods in terms of generalizability to unknown devices even under unknown complicated propagation environments, e.g., dispersive multipath fading channels, even though all the training data are collected in a simple environment with dominated direct line-of-sight (LoS) propagation paths.

Submitted 17 October, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

arXiv:2204.11567 [pdf, other]

Deep CSI Compression for Massive MIMO: A Self-information Model-driven Neural Network

Authors: Ziqing Yin, Wei Xu, Renjie Xie, Shaoqing Zhang, Derrick Wing Kwan Ng, Xiaohu You

Abstract: In order to fully exploit the advantages of massive multiple-input multiple-output (mMIMO), it is critical for the transmitter to accurately acquire the channel state information (CSI). Deep learning (DL)-based methods have been proposed for CSI compression and feedback to the transmitter. Although most existing DL-based methods consider the CSI matrix as an image, structural features of the CSI image are rarely exploited in neural network design. As such, we propose a model of self-information that dynamically measures the amount of information contained in each patch of a CSI image from the perspective of structural features. Then, by applying the self-information model, we propose a model-and-data-driven network for CSI compression and feedback, namely IdasNet. The IdasNet includes the design of a module of self-information deletion and selection (IDAS), an encoder of informative feature compression (IFC), and a decoder of informative feature recovery (IFR). In particular, the model-driven module of IDAS pre-compresses the CSI image by removing informative redundancy in terms of the self-information. The encoder of IFC then conducts feature compression to the pre-compressed CSI image and generates a feature codeword which contains two components, i.e., codeword values and position indices of the codeword values. Subsequently, the IFR decoder decouples the codeword values as well as position indices to recover the CSI image. Experimental results verify that the proposed IdasNet noticeably outperforms existing DL-based networks under various compression ratios while it has the number of network parameters reduced by orders-of-magnitude compared with various existing methods.

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2204.10055 [pdf, other]

Generative Compression for Face Video: A Hybrid Scheme

Authors: Anni Tang, Yan Huang, Jun Ling, Zhiyu Zhang, Yiwei Zhang, Rong Xie, Li Song

Abstract: As the latest video coding standard, versatile video coding (VVC) has shown its ability in retaining pixel quality. To excavate more compression potential for video conference scenarios under ultra-low bitrate, this paper proposes a bitrate adjustable hybrid compression scheme for face video. This hybrid scheme combines the pixel-level precise recovery capability of traditional coding with the generation capability of deep learning based on abridged information, where Pixel wise Bi-Prediction, Low-Bitrate-FOM and Lossless Keypoint Encoder collaborate to achieve PSNR up to 36.23 dB at a low bitrate of 1.47 KB/s. Without introducing any additional bitrate, our method has a clear advantage over VVC under a completely fair comparative experiment, which proves the effectiveness of our proposed scheme. Moreover, our scheme can adapt to any existing encoder / configuration to deal with different encoding requirements, and the bitrate can be dynamically adjusted according to the network condition.

Submitted 20 March, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

arXiv:2003.11397 [pdf]

A Novel Wide-Area Control Strategy for Damping of Critical Frequency Oscillations via Modulation of Active Power Injections

Authors: Ruichao Xie, Innocent Kamwa, C. Y. Chung

Abstract: This paper proposes a novel wide-area control strategy for modulating the active power injections to damp the critical frequency oscillations in power systems, including the inter-area oscillations and the transient frequency swing. The proposed method pursues an efficient utilization of the limited power reserve of existing distributed energy resources (DERs) to mitigate these oscillations. This is accomplished by decoupling the damping control actions at different sites using the oscillation signals of the concerned mode as the power commands. A theoretical basis for this decoupled modulating control is provided. Technically, the desired sole modal oscillation signals are filtered out by linearly combining the system-wide frequencies, which is determined by the linear quadratic regulator based sparsity-promoting (LQRSP) technique. With the proposed strategy, the modulation of each active power injection can be effectively engineered considering the response limit and steady-state output capability of the supporting device. The method is validated based on a two-area test system and is further demonstrated based on the New England 39-bus test system.

Submitted 25 March, 2020; originally announced March 2020.

arXiv:1908.00410 [pdf, other]

Pathological Myopic Image Analysis with Transfer Learning

Authors: Ruitao Xie, Libo Liu, Jingxin Liu, Connor S Qiu

Abstract: We present a summary of transfer learning based methods for several challenging myopic fundus image analysis tasks including classification of pathological and non-pathological myopia, localisation of fovea, and segmentation of optic disc. By adapting existing popular deep learning architectures, our proposed methods have achieved 1st and 2nd place in several tasks at the Pathologic Myopia Challenge held at ISBI 2019.

Submitted 31 July, 2019; originally announced August 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/BkeLp6mTFE

arXiv:1804.07677 [pdf, other]

Learning an Inverse Tone Mapping Network with a Generative Adversarial Regularizer

Authors: Shiyu Ning, Hongteng Xu, Li Song, Rong Xie, Wenjun Zhang

Abstract: Transferring a low-dynamic-range (LDR) image to a high-dynamic-range (HDR) image, which is the so-called inverse tone mapping (iTM), is an important imaging technique to improve visual effects of imaging devices. In this paper, we propose a novel deep learning-based iTM method, which learns an inverse tone mapping network with a generative adversarial regularizer. In the framework of alternating optimization, we learn a U-Net-based HDR image generator to transfer input LDR images to HDR ones, and a simple CNN-based discriminator to classify the real HDR images and the generated ones. Specifically, when learning the generator we consider the content-related loss and the generative adversarial regularizer jointly to improve the stability and the robustness of the generated HDR images. Using the learned generator as the proposed inverse tone mapping network, we achieve superior iTM results to the state-of-the-art methods consistently.

Submitted 20 April, 2018; originally announced April 2018.

Showing 1–18 of 18 results for author: Xie, R