Physics-Informed AI Inverter

Authors: Qing Shen, Yifan Zhou, Peng Zhang, Yacov A. Shamash, Xiaochuan Luo, Bin Wang, Huanfeng Zhao, Roshan Sharma, Bo Chen

Abstract: This letter devises an AI-Inverter that pilots the use of a physics-informed neural network (PINN) to enable AI-based electromagnetic transient simulations (EMT) of grid-forming inverters. The contributions are threefold: (1) a PINN-enabled AI-Inverter is formulated; (2) an enhanced learning strategy, balanced-adaptive PINN, is devised; (3) extensive validations and comparative analysis of the accuracy and efficiency of AI-Inverter are made to show its superiority over the classical electromagnetic transient programs (EMTP).

Submitted 1 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.09470 [pdf, other]

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while keeping sound naturalness according to our user study.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

arXiv:2401.05431 [pdf, other]

TRLS: A Time Series Representation Learning Framework via Spectrogram for Medical Signal Processing

Authors: Luyuan Xie, Cong Li, Xin Zhang, Shengfang Zhai, Yuejian Fang, Qingni Shen, Zhonghai Wu

Abstract: Representation learning frameworks in unlabeled time series have been proposed for medical signal processing. Despite the considerable progress made in previous works, we observe that the representation extracted for the time series still does not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to get more informative representations. We transform the input time-domain medical signals into spectrograms and design a time-frequency encoder named Time Frequency RNN (TFRNN) to capture more robust multi-scale representations from the augmented spectrograms. Our TRLS takes spectrograms as input with two different types of data augmentation and maximizes the similarity between positive ones, which effectively circumvents the problem of designing negative samples. Our evaluation on four real-world medical signal datasets focusing on medical signal classification shows that TRLS is superior to the existing frameworks.

Submitted 5 January, 2024; originally announced January 2024.

Comments: This paper is accepted by ICASSP 2024. This is a more detailed version.

arXiv:2309.16950 [pdf]

doi 10.1109/ACCESS.2024.3415478

Scalable Neural Dynamic Equivalence for Power Systems

Authors: Qing Shen, Yifan Zhou, Huanfeng Zhao, Peng Zhang, Qiang Zhang, Slava Maslennikov, Xiaochuan Luo

Abstract: Traditional grid analytics are model-based, relying strongly on accurate models of power systems, especially the dynamic models of generators, controllers, loads and other dynamic components. However, acquiring thorough power system models can be impractical in real operation due to inaccessible system parameters and privacy of consumers, which necessitate data-driven dynamic equivalencing of unknown subsystems. Learning reliable dynamic equivalent models for the external systems from SCADA and PMU data, however, is a long-standing intractable problem in power system analysis due to complicated nonlinearity and unforeseeable dynamic modes of power systems. This paper advances a practical application of neural dynamic equivalence (NeuDyE) called Driving Port NeuDyE (DP-NeuDyE), which exploits physics-informed machine learning and neural-ordinary-differential-equations (ODE-NET) to discover a dynamic equivalence of external power grids while preserving its dynamic behaviors after disturbances. The new contributions are threefold: (1) a NeuDyE formulation to enable a continuous-time, data-driven dynamic equivalence of power systems, saving the effort and expense of acquiring inaccessible system; (2) an introduction of a Physics-Informed NeuDyE learning (PI-NeuDyE) to actively control the closed-loop accuracy of NeuDyE; and (3) a DP-NeuDyE to reduce the number of inputs required for the training.
We conduct extensive case studies on the NPCC system to validate the generalizability and accuracy of both PI-NeuDyE and DP-NeuDyE, which span a multitude of scenarios, differing in the time required for fault clearance, the specific fault locations, and the limitations of data. Test results have demonstrated the scalability and practicality of NeuDyE, showing its potential to be used in ISO and utility control centers for online transient stability analysis and for planning purposes.

Submitted 21 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Journal ref: IEEE Access, vol. 12, pp. 86513-86522, 2024

arXiv:2309.16943 [pdf, other]

Physics-Informed Induction Machine Modelling

Authors: Qing Shen, Yifan Zhou, Peng Zhang

Abstract: This rapid communication devises a Neural Induction Machine (NeuIM) model, which pilots the use of physics-informed machine learning to enable AI-based electromagnetic transient simulations. The contributions are threefold: (1) a formulation of NeuIM to represent the induction machine in phase domain; (2) a physics-informed neural network capable of capturing fast and slow IM dynamics even in the absence of data; and (3) a data-physics-integrated hybrid NeuIM approach which is adaptive to various levels of data availability. Extensive case studies validate the efficacy of NeuIM and, in particular, its advantage over purely data-driven approaches.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.16934 [pdf, other]

Physics-Aware Neural Dynamic Equivalence of Power Systems

Authors: Qing Shen, Yifan Zhou, Qiang Zhang, Slava Maslennikov, Xiaochuan Luo, Peng Zhang

Abstract: This letter devises Neural Dynamic Equivalence (NeuDyE), which explores physics-aware machine learning and neural-ordinary-differential-equations (ODE-Net) to discover a dynamic equivalence of external power grids while preserving its dynamic behaviors after disturbances. The contributions are threefold: (1) an ODE-Net-enabled NeuDyE formulation to enable a continuous-time, data-driven dynamic equivalence of power systems; (2) a physics-informed NeuDyE learning method (PI-NeuDyE) to actively control the closed-loop accuracy of NeuDyE without an additional verification module; (3) a physics-guided NeuDyE (PG-NeuDyE) to enhance the method's applicability even in the absence of analytical physics models. Extensive case studies in the NPCC system validate the efficacy of NeuDyE, and, in particular, its capability under various contingencies.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2306.14119 [pdf, other]

SHISRCNet: Super-resolution And Classification Network For Low-resolution Breast Cancer Histopathology Image

Authors: Luyuan Xie, Cong Li, Zirui Wang, Xin Zhang, Boyan Chen, Qingni Shen, Zhonghai Wu

Abstract: The rapid identification and accurate diagnosis of breast cancer, known as the killer of women, have become greatly significant for those patients. Numerous breast cancer histopathological image classification methods have been proposed, but they still suffer from two problems. (1) These methods can only handle high-resolution (HR) images. However, low-resolution (LR) images are often collected by digital slide scanners with limited hardware conditions. Compared with HR images, LR images often lose some key features like texture, which deeply affects the accuracy of diagnosis. (2) The existing methods have fixed receptive fields, so they cannot extract and fuse multi-scale features well for images with different magnification factors. To fill these gaps, we present a Single Histopathological Image Super-Resolution Classification network (SHISRCNet), which consists of two modules: Super-Resolution (SR) and Classification (CF) modules. The SR module reconstructs LR images into SR ones. The CF module extracts and fuses the multi-scale features of SR images for classification. In the training stage, we introduce HR images into the CF module to enhance SHISRCNet's performance. Finally, through the joint training of these two modules, super-resolution and classification of LR images are integrated into our model. The experimental results demonstrate that the effects of our method are close to those of the SOTA methods that take HR images as inputs.

Submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted by MICCAI 2023

arXiv:2303.00369 [pdf, other]

Indescribable Multi-modal Spatial Evaluator

Authors: Lingke Kong, X. Sharon Qi, Qijin Shen, Jiacheng Wang, Jingyi Zhang, Yanle Hu, Qichao Zhou

Abstract: Multi-modal image registration spatially aligns two images with different distributions. One of its major challenges is that images acquired from different imaging machines have different imaging distributions, making it difficult to focus only on the spatial aspect of the images and ignore differences in distributions. In this study, we developed a self-supervised approach, Indescribable Multi-modal Spatial Evaluator (IMSE), to address multi-modal image registration. IMSE creates an accurate multi-modal spatial evaluator to measure spatial differences between two images, and then optimizes registration by minimizing the error predicted by the evaluator. To optimize IMSE performance, we also proposed a new style enhancement method called Shuffle Remap, which randomizes the image distribution into multiple segments, and then randomly disorders and remaps these segments, so that the distribution of the original image is changed. Shuffle Remap can help IMSE to predict the difference in spatial location from unseen target distributions. Our results show that IMSE outperformed the existing methods for registration using T1-T2 and CT-MRI datasets. IMSE can also be easily integrated into the traditional registration process, and can provide a convenient way to evaluate and visualize registration results. IMSE also has the potential to be used as a new paradigm for image-to-image translation. Our code is available at https://github.com/Kid-Liet/IMSE.

Submitted 1 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR2023
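
Illustrative note (not from the paper): the abstract describes Shuffle Remap as splitting the image intensity distribution into segments, shuffling them, and remapping. A minimal sketch of that idea follows, assuming intensities normalized to [0, 1] and assuming each segment keeps its width and is simply moved to a new position. The function name `shuffle_remap`, the parameter `n_segments`, and the width-preserving remap are hypothetical choices made for illustration; the authors' repository should be consulted for the actual method.

```python
import random


def shuffle_remap(pixels, n_segments=4, rng=None):
    """Remap intensities in [0, 1] by reordering random intensity segments.

    Hypothetical sketch: segments keep their width and are laid end to end
    in a shuffled order, so the output still lies in [0, 1].
    """
    rng = rng or random.Random()
    # Random cut points split [0, 1] into n_segments intensity ranges.
    cuts = sorted(rng.random() for _ in range(n_segments - 1))
    edges = [0.0] + cuts + [1.0]
    # Shuffle the order in which the segments are laid out.
    order = list(range(n_segments))
    rng.shuffle(order)
    # Compute the new starting intensity of every source segment.
    new_start = {}
    position = 0.0
    for seg in order:
        new_start[seg] = position
        position += edges[seg + 1] - edges[seg]
    remapped = []
    for value in pixels:
        # Locate the source segment that contains this intensity.
        seg = n_segments - 1
        for i in range(n_segments):
            if value < edges[i + 1]:
                seg = i
                break
        # Shift the intensity by the offset of its segment.
        remapped.append(new_start[seg] + (value - edges[seg]))
    return remapped


image = [i / 9 for i in range(10)]  # a toy "image" as a flat intensity list
augmented = shuffle_remap(image, n_segments=4, rng=random.Random(0))
```

Spatial structure is untouched because only intensity values change, which is the property a style augmentation for registration needs.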

arXiv:2208.03231 [pdf]

doi 10.5121/csit.2022.121304

Phase Difference based Doppler Disambiguation Method for TDM-MIMO FMCW Radars

Authors: Qingshan Shen, Qingbo Wang

Abstract: State-of-the-art automotive radar sensors use a Multiple-Input Multiple-Output (MIMO) approach to obtain a better angular resolution. A Time-Division Multiplexing (TDM) scheme is commonly applied to realize the orthogonality in time at the transmitter. Apart from its simplicity in implementation, the TDM scheme has the drawback of a maximum unambiguous Doppler that is reduced in proportion to the number of transmitters. In this paper, a phase difference based Doppler disambiguation method is proposed to regain the maximum unambiguous Doppler, which is equivalent to that of only one transmitter. This method works well when the number of transmitters is large. The proposed method is demonstrated with simulation and measurement data.

Submitted 5 August, 2022; originally announced August 2022.

Comments: 9 pages, 10 figures, conference
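
Illustrative note (not from the paper): the Doppler penalty of TDM mentioned in the abstract can be made concrete with the standard FMCW relation v_max = λ / (4 · N_tx · T_c), where each transmitter fires only every N_tx-th chirp. The sketch below is a worked example under assumed values (77 GHz carrier, 50 µs chirp period, 4 transmitters); none of these numbers come from the paper, and the function name is hypothetical.

```python
def max_unambiguous_velocity(wavelength_m, chirp_period_s, n_tx=1):
    """Maximum unambiguous radial velocity (m/s) for a TDM-MIMO FMCW radar."""
    if n_tx < 1:
        raise ValueError("n_tx must be at least 1")
    # Each transmitter repeats only every n_tx chirps, so the Doppler
    # sampling interval per transmitter is n_tx * chirp_period_s.
    return wavelength_m / (4.0 * n_tx * chirp_period_s)


wavelength = 3.0e8 / 77.0e9  # assumed 77 GHz carrier, about 3.9 mm
chirp_period = 50e-6         # assumed 50 microsecond chirp repetition
v_single = max_unambiguous_velocity(wavelength, chirp_period, n_tx=1)
v_tdm = max_unambiguous_velocity(wavelength, chirp_period, n_tx=4)
print(f"single TX: {v_single:.2f} m/s, 4-TX TDM: {v_tdm:.2f} m/s")
```

With these assumed values the four-transmitter TDM limit is one quarter of the single-transmitter limit, which is the gap a disambiguation method aims to recover.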

arXiv:2205.12933 [pdf, other]

Boosting Tail Neural Network for Realtime Custom Keyword Spotting

Authors: Sihao Xue, Qianyao Shen, Guoqing Li

Abstract: In this paper, we propose a Boosting Tail Neural Network (BTNN) for improving the performance of Realtime Custom Keyword Spotting (RCKS), which is still an industrial challenge because it demands powerful classification ability with limited computation resources. The approach is inspired by the observation from brain science that a brain is only partly activated by a nerve stimulation, and by the numerous machine learning algorithms that use a batch of weak classifiers to resolve arduous problems, which have often proved to be effective. We show that this method is helpful for the RCKS problem. The proposed approach achieves better performance in terms of wake-up rate and false alarms. In our experiments, it obtains an 18% relative improvement compared with traditional algorithms that use only one strong classifier. We also point out that this approach may be promising in future ASR exploration.

Submitted 7 June, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 4 pages, 8 figures, 2 tables

arXiv:2201.01034 [pdf, other]

doi 10.1109/TPAMI.2024.3378704

Uncovering the Over-smoothing Challenge in Image Super-Resolution: Entropy-based Quantification and Contrastive Optimization

Authors: Tianshuo Xu, Lijiang Li, Peng Mi, Xiawu Zheng, Fei Chao, Rongrong Ji, Yonghong Tian, Qiang Shen

Abstract: PSNR-oriented models are a critical class of super-resolution models with applications across various fields. However, these models tend to generate over-smoothed images, a problem that has been analyzed previously from the perspectives of models or loss functions, but without taking into account the impact of data properties. In this paper, we present a novel phenomenon that we term the center-oriented optimization (COO) problem, where a model's output converges towards the center point of similar high-resolution images, rather than towards the ground truth. We demonstrate that the strength of this problem is related to the uncertainty of data, which we quantify using entropy. We prove that as the entropy of high-resolution images increases, their center point will move further away from the clean image distribution, and the model will generate over-smoothed images. Perceptual-driven approaches such as perceptual loss, model structure optimization, or GAN-based methods can be viewed as implicitly optimizing the COO problem. We propose an explicit solution to the COO problem, called Detail Enhanced Contrastive Loss (DECLoss). DECLoss utilizes the clustering property of contrastive learning to directly reduce the variance of the potential high-resolution distribution and thereby decrease the entropy. We evaluate DECLoss on multiple super-resolution benchmarks and demonstrate that it improves the perceptual quality of PSNR-oriented models.
Moreover, when applied to GAN-based methods, such as RaGAN, DECLoss helps to achieve state-of-the-art performance, such as 0.093 LPIPS with 24.51 PSNR on 4x downsampled Urban100, validating the effectiveness and generalization of our approach.

Submitted 15 March, 2024; v1 submitted 4 January, 2022; originally announced January 2022.

Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2107.03065 [pdf, other]

Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information

Authors: Qinghua Wu, Quanbo Shen, Jian Luan, YuJun Wang

Abstract: In multi-speaker speech synthesis, data from a number of speakers usually tend to have great diversity because the speakers may differ greatly in age, speaking style, emotion, and so on. It is important but challenging to improve the modeling capabilities for multi-speaker speech synthesis. To address the issue, this paper proposes a high-capability speech synthesis system, called Msdtron, in which 1) a representation of the harmonic structure of speech, called excitation spectrogram, is designed to directly guide the learning of harmonics in the mel-spectrogram; and 2) a conditional gated LSTM (CGLSTM) is proposed to control the flow of text content information through the network by re-weighting the gates of the LSTM using speaker information. The experiments show a significant reduction in reconstruction error of the mel-spectrogram in the training of the multi-speaker model, and a great improvement is observed in the subjective evaluation of the speaker-adapted model.

Submitted 11 February, 2022; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: Accepted by ICASSP-2022

arXiv:2101.03389 [pdf, ps, other]

Equalized Recovery State Estimators for Linear Systems with Delayed and Missing Observations

Authors: Syed M. Hassaan, Qiang Shen, Sze Zheng Yong

Abstract: This paper presents a dynamic state observer design for discrete-time linear time-varying systems that robustly achieves equalized recovery despite delayed or missing observations, where the set of all temporal patterns for the missing or delayed data is modeled by a finite-length language. By introducing a mapping of the language onto a reduced event-based language, we design a state estimator that adapts based on the history of available data at each step, and satisfies equalized recovery for all patterns in the reduced language. In contrast to existing equalized recovery estimators, the proposed design considers the equalized recovery level as a decision variable, which enables us to directly obtain the global minimum for the intermediate recovery level, resulting in improved estimation performance. Finally, we demonstrate the effectiveness of the proposed observer when compared to existing approaches using several illustrative examples.

Submitted 9 January, 2021; originally announced January 2021.

Comments: Submitted to L-CSS 2021 with presentation in ACC2021 as an option

arXiv:2004.06664 [pdf, ps, other]

Direction Finding of Electromagnetic Sources on a Sparse Cross-Dipole Array Using One-Bit Measurements

Authors: Zhiyong Cheng, Shengyao Chen, Qibin Shen, Jin He, Zhong Liu

Abstract: Sparse array arrangement has been widely used in vector-sensor arrays because of the increased degrees of freedom for identifying more sources than sensors. For large-size sparse vector-sensor arrays, one-bit measurements can further reduce the receiver system complexity by using low-resolution ADCs. In this paper, we present a sparse cross-dipole array with one-bit measurements to estimate the directions of arrival (DOAs) of electromagnetic sources. Based on the independence assumption of sources, we establish the relation between the covariance matrix of one-bit measurements and that of unquantized measurements via the Bussgang theorem. Then we develop a Spatial-Smooth MUSIC (SS-MUSIC) based method, One-Bit MUSIC (OB-MUSIC), to estimate the DOAs. By jointly utilizing the covariance matrices of two dipole arrays, we find that OB-MUSIC is robust against polarization states. We also derive the Cramér-Rao bound (CRB) of DOA estimation for the proposed scheme. Furthermore, we theoretically analyze the applicability of the independence assumption of sources, which is fundamental to the proposed and other typical methods, and verify the assumption in typical communication applications. Numerical results show that, with the same number of sensors, one-bit sparse cross-dipole arrays have comparable performance with unquantized uniform linear arrays and thus provide a compromise between the DOA estimation performance and the system complexity.

Submitted 14 April, 2020; originally announced April 2020.
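
Illustrative note (not from the paper): a classical relation between one-bit and unquantized second-order statistics is the arcsine law, which for two real, zero-mean, jointly Gaussian variables with correlation ρ gives E[sign(x) sign(y)] = (2/π) arcsin(ρ). Whether the paper uses exactly this form for its complex array data is an assumption; the sketch below only checks the real-valued law by Monte Carlo simulation. The function names and sample count are hypothetical.

```python
import math
import random


def one_bit_correlation(rho, n_samples=100_000, seed=0):
    """Empirical correlation of sign-quantized Gaussian pairs with correlation rho."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        # y is unit-variance Gaussian with correlation rho to x.
        y = rho * x + math.sqrt(1.0 - rho * rho) * z
        sign_x = 1 if x >= 0 else -1
        sign_y = 1 if y >= 0 else -1
        total += sign_x * sign_y
    return total / n_samples


def arcsine_law(rho):
    """Theoretical correlation of the one-bit outputs for Gaussian inputs."""
    return (2.0 / math.pi) * math.asin(rho)


for rho in (0.0, 0.5, 0.9):
    print(rho, round(one_bit_correlation(rho), 3), round(arcsine_law(rho), 3))
```

Inverting the law with a sine recovers the normalized unquantized correlation from one-bit data, which is why subspace methods such as MUSIC remain usable after quantization.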

arXiv:2004.01858 [pdf, other]

Tractable Compositions of Discrete-Time Control Barrier Functions with Application to Lane Keeping and Obstacle Avoidance

Authors: Matthew Cavorsi, Mohammad Khajenejad, Ruochen Niu, Qiang Shen, Sze Zheng Yong

Abstract: This paper introduces control barrier functions for discrete-time systems, which can be shown to be necessary and sufficient for controlled invariance of a given set. Moreover, we propose nonlinear discrete-time control barrier functions for partially control affine systems that lead to controlled invariance conditions that are affine in the control input, leading to a tractable formulation that enables us to handle the safety optimal control problem for a broader range of applications with more complicated safety conditions than existing approaches. In addition, we develop mixed-integer formulations for basic and secondary Boolean compositions of multiple control barrier functions and further provide mixed-integer constraints for piecewise control barrier functions. Finally, we apply these discrete-time control barrier function tools to automotive safety problems of lane keeping and obstacle avoidance, which are shown to be effective in simulation.

Submitted 4 April, 2020; originally announced April 2020.

Comments: submitted to CDC2020
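
Illustrative note (not from the paper): a commonly used discrete-time control barrier condition is h(x_{k+1}) ≥ (1 − γ) h(x_k) with 0 < γ ≤ 1, which keeps the set {x : h(x) ≥ 0} invariant. The sketch below applies it to a one-dimensional integrator x_{k+1} = x_k + u·dt with h(x) = x_max − x, where the condition reduces to a bound that is affine in u. The system, the barrier, and all parameter values are assumptions chosen for illustration, and the paper's own formulation may differ.

```python
def cbf_filter(x, u_nominal, x_max, gamma, dt):
    """Return the input closest to u_nominal that satisfies the barrier condition.

    For x_next = x + u * dt and h(x) = x_max - x, the condition
    h(x_next) >= (1 - gamma) * h(x) simplifies to u <= gamma * h(x) / dt.
    """
    h = x_max - x
    u_bound = gamma * h / dt
    return min(u_nominal, u_bound)


def simulate(x0=0.0, x_max=1.0, u_nominal=1.0, gamma=0.5, dt=0.1, steps=40):
    """Roll out the filtered integrator and return the state trajectory."""
    xs = [x0]
    for _ in range(steps):
        u = cbf_filter(xs[-1], u_nominal, x_max, gamma, dt)
        xs.append(xs[-1] + u * dt)
    return xs


trajectory = simulate()
print(f"final state: {trajectory[-1]:.6f}")
```

The nominal input drives the state towards the boundary at full speed, and the filter only intervenes once the affine bound becomes active, after which the distance to the boundary shrinks geometrically without crossing it.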

arXiv:2004.01408 [pdf, ps, other]

Incremental Affine Abstraction of Nonlinear Systems

Authors: Syed M. Hassaan, Mohammad Khajenejad, Spencer Jensen, Qiang Shen, Sze Zheng Yong

Abstract: In this paper, we propose an incremental abstraction method for dynamically over-approximating nonlinear systems in a bounded domain by solving a sequence of linear programs, resulting in a sequence of affine upper and lower hyperplanes with expanding operating regions. Although the affine abstraction problem can be solved offline using a single linear program, existing approaches suffer from a computational space complexity that grows exponentially with the state dimension. Hence, the motivation for incremental abstraction is to reduce the space complexity for high-dimensional systems, but at the cost of yielding potentially worse abstractions/over-approximations. Specifically, we start with an operating region that is a subregion of the state space and compute two affine hyperplanes that bracket the nonlinear function locally. Then, by incrementally expanding the operating region, we dynamically update the two affine hyperplanes such that we eventually yield hyperplanes that are guaranteed to over-approximate the nonlinear system over the entire domain. Finally, the effectiveness of the proposed approach is demonstrated using numerical examples of high-dimensional nonlinear systems.

Submitted 3 April, 2020; originally announced April 2020.

Comments: Submitted to L-CSS 2020 with presentation in CDC2020 as an option

arXiv:1910.06244 [pdf, other]

doi 10.1109/TIP.2021.3058615

Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling

Authors: Tong Chen, Haojie Liu, Zhan Ma, Qiu Shen, Xun Cao, Yao Wang

Abstract: This paper proposes a novel Non-Local Attention optimization and Improved Context modeling-based image compression (NLAIC) algorithm, which is built on top of the deep neural network (DNN)-based variational auto-encoder (VAE) structure. Our NLAIC 1) embeds non-local network operations as non-linear transforms in the encoders and decoders for both the image and the latent representation probability information (known as hyperprior) to capture both local and global correlations, 2) applies an attention mechanism to generate masks that are used to weigh the features, which implicitly adapt bit allocation for feature elements based on their importance, and 3) implements the improved conditional entropy modeling of latent features using joint 3D convolutional neural network (CNN)-based autoregressive contexts and hyperpriors. Towards practical application, additional enhancements are also introduced to speed up processing (e.g., parallel 3D CNN-based context prediction), reduce memory consumption (e.g., sparse non-local processing) and alleviate the implementation complexity (e.g., unified model for variable rates without re-training). The proposed model outperforms existing methods on Kodak and CLIC datasets with the state-of-the-art compression efficiency reported, including learned and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods, for both PSNR and MS-SSIM distortion metrics.

Submitted 11 October, 2019; originally announced October 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.09757

Journal ref: IEEE Transactions on Image Processing, vol. 30, pp. 3179-3191, 2021

arXiv:1909.12037 [pdf, other]

doi 10.1109/TCSVT.2021.3051377

Learned Point Cloud Geometry Compression

Authors: Jianqiang Wang, Hao Zhu, Zhan Ma, Tong Chen, Haojie Liu, Qiu Shen

Abstract: This paper presents a novel end-to-end Learned Point Cloud Geometry Compression (a.k.a., Learned-PCGC) framework, to efficiently compress the point cloud geometry (PCG) using deep neural network (DNN)-based variational autoencoders (VAE). In our approach, PCG is first voxelized, scaled and partitioned into non-overlapped 3D cubes, which are then fed into stacked 3D convolutions for compact latent feature and hyperprior generation. Hyperpriors are used to improve the conditional probability modeling of latent features. A weighted binary cross-entropy (WBCE) loss is applied in training while adaptive thresholding is used in inference to remove unnecessary voxels and reduce the distortion. Objectively, our method exceeds the geometry-based point cloud compression (G-PCC) algorithm standardized by the Moving Picture Experts Group (MPEG) with a significant performance margin, e.g., at least 60% BD-Rate (Bjontegaard Delta Rate) gains, using common test datasets. Subjectively, our method has presented better visual quality with smoother surface reconstruction and appealing details, in comparison to all existing MPEG standard compliant PCC methods. Our method requires about 2.5MB parameters in total, which is a fairly small size for practical implementation, even on embedded platforms. Additional ablation studies analyze a variety of aspects (e.g., cube size, kernels, etc.) to explore the application potentials of our Learned-PCGC.

Submitted 26 September, 2019; originally announced September 2019.

Comments: 13 pages

arXiv:1907.06751 [pdf, other]

Development of a General Momentum Exchange Devices Fault Model for Spacecraft Fault-Tolerant Control System Design

Authors: Chengfei Yue, Qiang Shen, Xibin Cao, Feng Wang, Cher Hiang Goh, Tong Heng Lee

Abstract: This paper investigates the mechanism of various faults of momentum exchange devices. These devices are modeled as a cascade electric motor (EM) - variable speed drive (VSD) system. Considering the mechanical part of the EM and the VSD system, the potential faults are reviewed and summarized. With a clear understanding of these potential faults, a general fault model in a cascade multiplicative structure is established for momentum exchange devices. Based on this general model, various fault scenarios can be simulated, and the possible output can be appropriately visualized. In this paper, six types of working conditions are identified and the corresponding fault models are constructed. Using this fault model, the control responses using reaction wheels and single gimbal control moment gyros under various fault conditions are demonstrated. The simulation results show the severities of the faults and demonstrate that the additive fault is more serious than the multiplicative fault from the viewpoint of control accuracy. Finally, existing fault-tolerant control strategies are briefly summarized, and potential approaches, both passive and active, to accommodate the gimbal fault of a single gimbal control moment gyro are demonstrated.

Submitted 27 July, 2019; v1 submitted 15 July, 2019; originally announced July 2019.

arXiv:1905.04711 [pdf]

doi 10.1038/s41524-020-00392-6

Data augmentation in microscopic images for material data mining

Authors: Boyuan Ma, Xiaoyan Wei, Chuni Liu, Xiaojuan Ban, Haiyou Huang, Hao Wang, Weihua Xue, Stephen Wu, Mingfei Gao, Qing Shen, Adnan Omer Abuassba, Haokai Shen, Yan**g Su

Abstract: Recent progress in material data mining has been driven by high-capacity models trained on large datasets. However, collecting experimental data (real data) has been extremely costly owing to the amount of human effort and expertise required. Here, we develop a novel transfer learning strategy to address the small or insufficient data problem. This strategy realizes the fusion of real and simulated data, and the augmentation of training data in the data mining procedure. For a specific task of image segmentation, this strategy can generate synthetic images by fusing the physical mechanism of simulated images and the "image style" of real images. The result shows that the model trained with the acquired synthetic images and 35% of the real images outperforms the model trained on all real images. As the time required to generate synthetic data is almost negligible, this strategy is able to reduce the time cost of real data preparation by roughly 65%.

Submitted 28 October, 2019; v1 submitted 12 May, 2019; originally announced May 2019.

Comments: 17 pages, technical report

Journal ref: npj computational materials 2020

arXiv:1905.01025 [pdf, other]

Learned Quality Enhancement via Multi-Frame Priors for HEVC Compliant Low-Delay Applications

Authors: Ming Lu, Ming Cheng, Yiling Xu, Shiliang Pu, Qiu Shen, Zhan Ma

Abstract: Networked video applications, e.g., video conferencing, often suffer from poor visual quality due to unexpected network fluctuation and limited bandwidth. In this paper, we have developed a Quality Enhancement Network (QENet) to reduce the video compression artifacts, leveraging the spatial and temporal priors generated by respective multi-scale convolutions spatially and warped temporal predictions in a recurrent fashion temporally. We have integrated this QENet as a standalone post-processing subsystem to the High Efficiency Video Coding (HEVC) compliant decoder. Experimental results show that our QENet achieves state-of-the-art performance against default in-loop filters in HEVC and other deep learning based methods with noticeable objective gains in Peak Signal-to-Noise Ratio (PSNR) and subjective gains visually.

Submitted 2 May, 2019; originally announced May 2019.

arXiv:1904.09757 [pdf, other]

doi 10.1109/TIP.2021.3058615

Non-local Attention Optimized Deep Image Compression

Authors: Haojie Liu, Tong Chen, Peiyao Guo, Qiu Shen, Xun Cao, Yao Wang, Zhan Ma

Abstract: This paper proposes a novel Non-Local Attention Optimized Deep Image Compression (NLAIC) framework, which is built on top of the popular variational auto-encoder (VAE) structure. Our NLAIC framework embeds non-local operations in the encoders and decoders for both image and latent feature probability information (known as hyperprior) to capture both local and global correlations, and applies an attention mechanism to generate masks that are used to weigh the features for the image and hyperprior, which implicitly adapts bit allocation for different features based on their importance. Furthermore, both hyperpriors and spatial-channel neighbors of the latent features are used to improve entropy coding. The proposed model outperforms the existing methods on the Kodak dataset, including learned (e.g., Balle2019, Balle2018) and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods, for both PSNR and MS-SSIM distortion metrics.

Submitted 22 April, 2019; originally announced April 2019.

Journal ref: IEEE Transactions on Image Processing, vol. 30, pp. 3179-3191, 2021

arXiv:1904.03851 [pdf, other]

Extreme Image Coding via Multiscale Autoencoders With Generative Adversarial Optimization

Authors: Chao Huang, Haojie Liu, Tong Chen, Qiu Shen, Zhan Ma

Abstract: We propose a MultiScale AutoEncoder (MSAE) based extreme image compression framework to offer visually pleasing reconstruction at a very low bitrate. Our method leverages the "priors" at different resolution scales to improve the compression efficiency, and also employs a generative adversarial network (GAN) with multiscale discriminators to perform end-to-end trainable rate-distortion optimization. We compare the perceptual quality of our reconstructions with traditional compression algorithms using High-Efficiency Video Coding (HEVC) based Intra Profile and JPEG2000 on the public Cityscapes and ADE20K datasets, demonstrating a significant subjective quality improvement.

Submitted 3 January, 2020; v1 submitted 8 April, 2019; originally announced April 2019.

Comments: Accepted to IEEE VCIP 2019 as an oral presentation

arXiv:1902.10480 [pdf, other]

Gated Context Model with Embedded Priors for Deep Image Compression

Authors: Haojie Liu, Tong Chen, Peiyao Guo, Qiu Shen, Zhan Ma

Abstract: A deep image compression scheme is proposed in this paper, offering state-of-the-art compression efficiency against the traditional JPEG, JPEG2000, BPG and popular learning-based methodologies. This is achieved by a novel conditional probability model with embedded priors which can accurately approximate the entropy rate for rate-distortion optimization. It utilizes three separable stacks to eliminate the blind spots in the receptive field for better probability prediction and computation reduction. Those embedded priors can be further used to help the image reconstruction when fused with latent features, after passing through the proposed information compensation network (ICN). Residual learning with generalized divisive normalization (GDN) based activation is used in our encoder and decoder with fast convergence rate and efficient performance. We have evaluated our model and other methods using rate-distortion criteria, where distortion is measured by multi-scale structural similarity (MS-SSIM). We have also discussed the impacts of various distortion metrics on the reconstructed image quality. In addition, a field study on perceptual quality is given via a dedicated subjective assessment, to compare the efficiency of our proposed methods and other conventional image compression methods.

Submitted 27 February, 2019; originally announced February 2019.

arXiv:1902.07383 [pdf, other]

Neural Video Compression using Spatio-Temporal Priors

Authors: Haojie Liu, Tong Chen, Ming Lu, Qiu Shen, Zhan Ma

Abstract: The pursuit of higher compression efficiency continuously drives the advances of video coding technologies. Fundamentally, we wish to find better "predictions" or "priors" that are reconstructed previously to remove the signal dependency efficiently and to accurately model the signal distribution for entropy coding. In this work, we propose a neural video compression framework, leveraging the spatial and temporal priors, independently and jointly, to exploit the correlations in intra texture, optical flow based temporal motion and residuals. Spatial priors are generated using downscaled low-resolution features, while temporal priors (from previous reference frames and residuals) are captured using a convolutional neural network based long-short term memory (ConvLSTM) structure in a temporal recurrent fashion. All of these parts are connected and trained jointly towards the optimal rate-distortion performance. Compared with the High-Efficiency Video Coding (HEVC) Main Profile (MP), our method has demonstrated an average 38% Bjontegaard-Delta Rate (BD-Rate) improvement using standard common test sequences, where the distortion is measured by multi-scale structural similarity (MS-SSIM).

Submitted 20 February, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1806.03400 [pdf, ps, other]

A 3.8 ps RMS time synchronization implemented in a 20 nm FPGA

Authors: Hong-Bo Xie, Yang Li, Qi Shen, Sheng-Kai Liao, Cheng-Zhi Peng

Abstract: A 3.8 ps root mean square (RMS) time synchronization implemented in a 20 nm fabrication process Kintex UltraScale Field Programmable Gate Array (FPGA) is presented. The multichannel high-speed serial transceivers (e.g. GTH) play a key role in a wide range of applications, such as the optical source for quantum key distribution systems. However, owing to the independent clock dividers in each transceiver, a random skew appears among the multiple channels every time the system powers up or resets. A self-phase alignment method provided by Xilinx Corporation could reach a precision of 22 ps RMS and 100 ps maximum variation, which is far from meeting the demand of applications with rates up to 2.5 Gbps. To implement a high-precision intrachannel time synchronization, a protocol combining a high-precision time-to-digital converter (TDC) and a tunable phase interpolator (PI) is presented. The TDC based on the carry8 primitive is applied to measure the intrachannel skew with a 40.7 ps bin size. The embedded tunable PI in each GTH channel has a theoretical step size of 3.125 ps. By tuning the PI in the minimal step size, the final intrachannel time synchronization reaches a 3.8 ps RMS precision and a maximal variation of 20 ps, much better than the self-phase alignment method. In addition, a desired time offset for every channel can be implemented with closed-loop control.

Submitted 8 June, 2018; originally announced June 2018.

Comments: 4 pages, 5 figures. 21st IEEE Real Time Conference

arXiv:1806.01496 [pdf, other]

Deep Image Compression via End-to-End Learning

Authors: Haojie Liu, Tong Chen, Qiu Shen, Tao Yue, Zhan Ma

Abstract: We present a lossy image compression method based on deep convolutional neural networks (CNNs), which outperforms the existing BPG, WebP, JPEG2000 and JPEG as measured via multi-scale structural similarity (MS-SSIM), at the same bit rate. Currently, most CNN-based approaches train the network using an L2 loss between the reconstructions and the ground truths in the pixel domain, which leads to over-smoothed results and visual quality degradation, especially at a very low bit rate. Therefore, we improve the subjective quality by additionally combining a perception loss and an adversarial loss. To achieve better rate-distortion optimization (RDO), we also introduce an easy-to-hard transfer learning when adding quantization error and rate constraint. Finally, we evaluate our method on the public Kodak dataset and the Test Dataset P/M released by the Computer Vision Lab of ETH Zurich, resulting in averaged 7.81% and 19.1% BD-rate reduction over BPG, respectively.

Submitted 5 June, 2018; originally announced June 2018.

Showing 1–27 of 27 results for author: Shen, Q