-
Physics-Informed AI Inverter
Authors:
Qing Shen,
Yifan Zhou,
Peng Zhang,
Yacov A. Shamash,
Xiaochuan Luo,
Bin Wang,
Huanfeng Zhao,
Roshan Sharma,
Bo Chen
Abstract:
This letter devises an AI-Inverter that pilots the use of a physics-informed neural network (PINN) to enable AI-based electromagnetic transient (EMT) simulations of grid-forming inverters. The contributions are threefold: (1) a PINN-enabled AI-Inverter is formulated; (2) an enhanced learning strategy, balanced-adaptive PINN, is devised; (3) extensive validations and comparative analyses of the accuracy and efficiency of the AI-Inverter are made to show its superiority over classical electromagnetic transient programs (EMTP).
Submitted 1 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
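To make the physics-informed idea concrete, the following is a minimal sketch of a PINN-style composite loss, not the AI-Inverter itself. Everything here is a hypothetical stand-in: a first-order RL circuit (L di/dt + R i = V) replaces the inverter dynamics, a closed-form candidate trajectory replaces the network output, and finite differences replace the automatic differentiation a real PINN would use. The balanced-adaptive weighting from the abstract is not reproduced; the weights `w_data` and `w_phys` are simply fixed.

```python
import numpy as np

# Hypothetical stand-in dynamics: first-order RL circuit, L di/dt + R i = V.
R, L_IND, V = 2.0, 0.5, 10.0


def candidate_current(t, tau):
    """Closed-form candidate i(t) = (V/R)(1 - exp(-t/tau)), standing in for a network output."""
    return (V / R) * (1.0 - np.exp(-t / tau))


def physics_residual(t, i):
    """Residual of L di/dt + R i - V; di/dt uses finite differences
    (a real PINN would differentiate the network with autodiff)."""
    di_dt = np.gradient(i, t)
    return L_IND * di_dt + R * i - V


def pinn_loss(t, i_pred, t_data, i_data, w_data=1.0, w_phys=1.0):
    """Fixed-weight sum of a data-fit term and a physics-residual term."""
    i_at_data = np.interp(t_data, t, i_pred)
    data_term = np.mean((i_at_data - i_data) ** 2)
    phys_term = np.mean(physics_residual(t, i_pred) ** 2)
    return w_data * data_term + w_phys * phys_term


t = np.linspace(0.0, 2.0, 2001)                    # collocation points
t_data = np.array([0.1, 0.5, 1.0])                 # sparse "measurements"
i_data = candidate_current(t_data, 0.25)           # generated with the true tau = L/R
print(pinn_loss(t, candidate_current(t, 0.25), t_data, i_data))  # close to zero
print(pinn_loss(t, candidate_current(t, 0.5), t_data, i_data))   # much larger
```

A candidate with the correct time constant L/R = 0.25 s drives both terms toward zero, while a wrong time constant leaves a large physics residual even at times where no data point constrains it.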
-
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
Authors:
Weifei **,
Yuxin Cao,
Junjie Su,
Qi Shen,
Kai Ye,
Derui Wang,
Jie Hao,
Ziyao Liu
Abstract:
In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have shown that surreptitiously crafted adversarial perturbations enable the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modification. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of a Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieves an 82% attack success rate while keeping sound naturalness, according to our user study.
Submitted 15 May, 2024;
originally announced May 2024.
-
TRLS: A Time Series Representation Learning Framework via Spectrogram for Medical Signal Processing
Authors:
Luyuan Xie,
Cong Li,
Xin Zhang,
Shengfang Zhai,
Yuejian Fang,
Qingni Shen,
Zhonghai Wu
Abstract:
Representation learning frameworks for unlabeled time series have been proposed for medical signal processing. Despite the considerable progress made in previous works, we observe that the representations extracted for time series still do not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to obtain more informative representations. We transform the input time-domain medical signals into spectrograms and design a time-frequency encoder named Time Frequency RNN (TFRNN) to capture more robust multi-scale representations from the augmented spectrograms. TRLS takes spectrograms with two different types of data augmentation as input and maximizes the similarity between positive pairs, which circumvents the problem of designing negative samples. Our evaluation on four real-world medical signal datasets, focusing on medical signal classification, shows that TRLS is superior to the existing frameworks.
Submitted 5 January, 2024;
originally announced January 2024.
-
Scalable Neural Dynamic Equivalence for Power Systems
Authors:
Qing Shen,
Yifan Zhou,
Huanfeng Zhao,
Peng Zhang,
Qiang Zhang,
Slava Maslennikov,
Xiaochuan Luo
Abstract:
Traditional grid analytics are model-based, relying strongly on accurate models of power systems, especially the dynamic models of generators, controllers, loads, and other dynamic components. However, acquiring thorough power system models can be impractical in real operation due to inaccessible system parameters and consumer privacy, which necessitates data-driven dynamic equivalencing of unknown subsystems. Learning reliable dynamic equivalent models for external systems from SCADA and PMU data, however, is a long-standing, intractable problem in power system analysis due to the complicated nonlinearity and unforeseeable dynamic modes of power systems. This paper advances a practical application of neural dynamic equivalence (NeuDyE), called Driving Port NeuDyE (DP-NeuDyE), which exploits physics-informed machine learning and neural ordinary differential equations (ODE-NET) to discover a dynamic equivalence of external power grids while preserving their dynamic behaviors after disturbances. The new contributions are threefold: (1) a NeuDyE formulation that enables a continuous-time, data-driven dynamic equivalence of power systems, saving the effort and expense of acquiring inaccessible system models; (2) a Physics-Informed NeuDyE learning method (PI-NeuDyE) to actively control the closed-loop accuracy of NeuDyE; and (3) a DP-NeuDyE to reduce the number of inputs required for training. We conduct extensive case studies on the NPCC system to validate the generalizability and accuracy of both PI-NeuDyE and DP-NeuDyE across a multitude of scenarios that differ in fault-clearing time, fault location, and data limitations. Test results demonstrate the scalability and practicality of NeuDyE, showing its potential for use in ISO and utility control centers for online transient stability analysis and for planning purposes.
Submitted 21 March, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Physics-Informed Induction Machine Modelling
Authors:
Qing Shen,
Yifan Zhou,
Peng Zhang
Abstract:
This rapid communication devises a Neural Induction Machine (NeuIM) model, which pilots the use of physics-informed machine learning to enable AI-based electromagnetic transient simulations. The contributions are threefold: (1) a formulation of NeuIM to represent the induction machine (IM) in the phase domain; (2) a physics-informed neural network capable of capturing fast and slow IM dynamics even in the absence of data; and (3) a data-physics-integrated hybrid NeuIM approach that adapts to various levels of data availability. Extensive case studies validate the efficacy of NeuIM and, in particular, its advantage over purely data-driven approaches.
Submitted 28 September, 2023;
originally announced September 2023.
-
Physics-Aware Neural Dynamic Equivalence of Power Systems
Authors:
Qing Shen,
Yifan Zhou,
Qiang Zhang,
Slava Maslennikov,
Xiaochuan Luo,
Peng Zhang
Abstract:
This letter devises Neural Dynamic Equivalence (NeuDyE), which explores physics-aware machine learning and neural ordinary differential equations (ODE-Net) to discover a dynamic equivalence of external power grids while preserving their dynamic behaviors after disturbances. The contributions are threefold: (1) an ODE-Net-enabled NeuDyE formulation that enables a continuous-time, data-driven dynamic equivalence of power systems; (2) a physics-informed NeuDyE learning method (PI-NeuDyE) to actively control the closed-loop accuracy of NeuDyE without an additional verification module; (3) a physics-guided NeuDyE (PG-NeuDyE) to enhance the method's applicability even in the absence of analytical physics models. Extensive case studies in the NPCC system validate the efficacy of NeuDyE and, in particular, its capability under various contingencies.
Submitted 28 September, 2023;
originally announced September 2023.
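The continuous-time idea behind ODE-Net-based equivalents can be illustrated with a short sketch: a small neural vector field dx/dt = f_theta(x) is integrated with a fixed-step fourth-order Runge-Kutta scheme to roll out a trajectory. This is an assumption-laden illustration, not NeuDyE: the two-layer tanh network, its random weights, and the fixed step size are hypothetical, and the boundary inputs and training loop of the actual method are omitted.

```python
import numpy as np


def mlp_vector_field(x, params):
    """Hypothetical ODE-Net vector field: dx/dt = W2 tanh(W1 x + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ x + b1) + b2


def rk4_rollout(x0, params, dt, steps):
    """Integrate dx/dt = f(x) with classical fixed-step RK4; returns (steps+1, n) states."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = xs[-1]
        k1 = mlp_vector_field(x, params)
        k2 = mlp_vector_field(x + 0.5 * dt * k1, params)
        k3 = mlp_vector_field(x + 0.5 * dt * k2, params)
        k4 = mlp_vector_field(x + dt * k3, params)
        xs.append(x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4))
    return np.stack(xs)


rng = np.random.default_rng(0)
params = (rng.normal(size=(8, 2)), np.zeros(8), -0.1 * rng.normal(size=(2, 8)), np.zeros(2))
trajectory = rk4_rollout([1.0, 0.0], params, dt=0.01, steps=500)
print(trajectory.shape)  # (501, 2)
```

In a trained equivalent, the weights would be fitted so that rolled-out trajectories match measured post-disturbance responses; the random weights here only demonstrate the mechanics.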
-
SHISRCNet: Super-resolution And Classification Network For Low-resolution Breast Cancer Histopathology Image
Authors:
Luyuan Xie,
Cong Li,
Zirui Wang,
Xin Zhang,
Boyan Chen,
Qingni Shen,
Zhonghai Wu
Abstract:
The rapid identification and accurate diagnosis of breast cancer, known as the killer of women, are of great significance for patients. Numerous breast cancer histopathological image classification methods have been proposed, but they still suffer from two problems. (1) These methods can only handle high-resolution (HR) images. However, low-resolution (LR) images are often collected by digital slide scanners with limited hardware. Compared with HR images, LR images often lose key features such as texture, which greatly affects diagnostic accuracy. (2) The existing methods have fixed receptive fields, so they cannot extract and fuse multi-scale features well for images with different magnification factors. To fill these gaps, we present a Single Histopathological Image Super-Resolution Classification network (SHISRCNet), which consists of two modules: a Super-Resolution (SR) module and a Classification (CF) module. The SR module reconstructs LR images into SR images. The CF module extracts and fuses the multi-scale features of SR images for classification. In the training stage, we introduce HR images into the CF module to enhance SHISRCNet's performance. Finally, through the joint training of these two modules, super-resolution and classification of LR images are integrated into our model. The experimental results demonstrate that our method performs close to SOTA methods that take HR images as inputs.
Submitted 25 June, 2023;
originally announced June 2023.
-
Indescribable Multi-modal Spatial Evaluator
Authors:
Lingke Kong,
X. Sharon Qi,
Qi** Shen,
Jiacheng Wang,
**gyi Zhang,
Yanle Hu,
Qichao Zhou
Abstract:
Multi-modal image registration spatially aligns two images with different distributions. One of its major challenges is that images acquired from different imaging machines have different imaging distributions, making it difficult to focus only on the spatial aspect of the images while ignoring differences in distributions. In this study, we developed a self-supervised approach, Indescribable Multi-modal Spatial Evaluator (IMSE), to address multi-modal image registration. IMSE creates an accurate multi-modal spatial evaluator to measure spatial differences between two images, and then optimizes registration by minimizing the error predicted by the evaluator. To optimize IMSE performance, we also propose a new style enhancement method called Shuffle Remap, which randomizes the image distribution into multiple segments and then randomly disorders and remaps these segments, so that the distribution of the original image is changed. Shuffle Remap helps IMSE predict differences in spatial location under unseen target distributions. Our results show that IMSE outperformed existing methods for registration on T1-T2 and CT-MRI datasets. IMSE can also be easily integrated into the traditional registration process, and it provides a convenient way to evaluate and visualize registration results. IMSE also has the potential to be used as a new paradigm for image-to-image translation. Our code is available at https://github.com/Kid-Liet/IMSE.
Submitted 1 March, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
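The Shuffle Remap description (segment the intensity distribution, then randomly reorder and remap the segments) can be read in more than one way. One plausible reading is sketched below as a hypothetical illustration, not the authors' code (which is at the linked repository): it splits the normalized intensity range into equal-width segments, permutes the segment order, and keeps each pixel's offset within its segment. The equal-width segmentation and the name `shuffle_remap` are assumptions.

```python
import numpy as np


def shuffle_remap(image, num_segments=8, rng=None):
    """Hypothetical Shuffle-Remap-style augmentation (one reading of the abstract).

    Normalizes intensities to [0, 1], splits that range into equal-width
    segments, randomly permutes the segment order, and places each pixel in
    its segment's new slot while keeping its offset inside the segment.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=np.float64)
    lo, hi = img.min(), img.max()
    norm = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    width = 1.0 / num_segments
    # Segment index per pixel; the maximum value is assigned to the last segment.
    seg = np.minimum((norm / width).astype(int), num_segments - 1)
    offset = norm - seg * width
    perm = rng.permutation(num_segments)  # perm[s] = new slot for segment s
    return perm[seg] * width + offset


ct_like = np.random.default_rng(1).integers(0, 4096, size=(64, 64))
augmented = shuffle_remap(ct_like, num_segments=8, rng=np.random.default_rng(2))
```

Because pixels keep their ordering inside each segment, spatial structure is preserved while the global intensity mapping changes, which is consistent with the stated goal of altering the distribution without altering spatial content.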
-
Phase Difference based Doppler Disambiguation Method for TDM-MIMO FMCW Radars
Authors:
Qingshan Shen,
Qingbo Wang
Abstract:
State-of-the-art automotive radar sensors use a Multiple-Input Multiple-Output (MIMO) approach to obtain better angular resolution. A Time-Division Multiplexing (TDM) scheme is commonly applied to realize orthogonality in time at the transmitter. Despite its simplicity of implementation, the TDM scheme has the drawback that the maximum unambiguous Doppler is reduced by a factor equal to the number of transmitters. In this paper, a phase-difference-based Doppler disambiguation method is proposed to regain the maximum unambiguous Doppler of a single-transmitter configuration. This method works well when the number of transmitters is large. The proposed method is demonstrated with simulation and measurement data.
Submitted 5 August, 2022;
originally announced August 2022.
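The drawback described above follows from standard chirp-sequence FMCW arithmetic: with N_tx transmitters multiplexed in time, consecutive chirps from the same transmitter are N_tx chirp periods apart, so the maximum unambiguous radial velocity lambda/(4 T_r) shrinks by a factor of N_tx. The sketch below only computes this limit; the 77 GHz carrier and 50 microsecond chirp period are illustrative assumptions, and the paper's phase-difference disambiguation itself is not reproduced.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_unambiguous_velocity(carrier_hz, chirp_period_s, num_tx=1):
    """Maximum unambiguous radial velocity of a chirp-sequence FMCW radar.

    With TDM-MIMO, each transmitter fires once every num_tx chirps, so the
    effective repetition interval per transmitter is num_tx * chirp_period_s.
    """
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return wavelength / (4.0 * num_tx * chirp_period_s)


for n_tx in (1, 2, 3, 4):
    v = max_unambiguous_velocity(77e9, 50e-6, n_tx)
    print(f"{n_tx} TX: +/-{v:.2f} m/s")
```

For a 77 GHz carrier and a 50 microsecond chirp period, a single transmitter gives roughly +/-19.5 m/s, while three TDM transmitters give roughly +/-6.5 m/s.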
-
Boosting Tail Neural Network for Realtime Custom Keyword Spotting
Authors:
Sihao Xue,
Qianyao Shen,
Guoqing Li
Abstract:
In this paper, we propose a Boosting Tail Neural Network (BTNN) to improve the performance of Realtime Custom Keyword Spotting (RCKS), which remains an industrial challenge because it demands powerful classification ability with limited computation resources. The approach is inspired by brain science, which holds that a brain is only partly activated for a nerve stimulation, and by the numerous machine learning algorithms that use a batch of weak classifiers to resolve arduous problems, which have often proved effective. We show that this method is helpful for the RCKS problem. The proposed approach achieves better performance in terms of wake-up rate and false alarms. In our experiments, compared with traditional algorithms that use only one strong classifier, it gives an 18% relative improvement. We also point out that this approach may be promising for future ASR exploration.
Submitted 7 June, 2023; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Uncovering the Over-smoothing Challenge in Image Super-Resolution: Entropy-based Quantification and Contrastive Optimization
Authors:
Tianshuo Xu,
Lijiang Li,
Peng Mi,
Xiawu Zheng,
Fei Chao,
Rongrong Ji,
Yonghong Tian,
Qiang Shen
Abstract:
PSNR-oriented models are a critical class of super-resolution models with applications across various fields. However, these models tend to generate over-smoothed images, a problem that has previously been analyzed from the perspectives of models or loss functions, but without taking into account the impact of data properties. In this paper, we present a novel phenomenon that we term the center-oriented optimization (COO) problem, where a model's output converges towards the center point of similar high-resolution images rather than towards the ground truth. We demonstrate that the strength of this problem is related to the uncertainty of data, which we quantify using entropy. We prove that as the entropy of high-resolution images increases, their center point moves further away from the clean image distribution, and the model generates over-smoothed images. Perceptual-driven approaches such as perceptual loss, model structure optimization, or GAN-based methods can be viewed as implicitly optimizing the COO problem. We propose an explicit solution to the COO problem, called Detail Enhanced Contrastive Loss (DECLoss). DECLoss utilizes the clustering property of contrastive learning to directly reduce the variance of the potential high-resolution distribution and thereby decrease the entropy. We evaluate DECLoss on multiple super-resolution benchmarks and demonstrate that it improves the perceptual quality of PSNR-oriented models. Moreover, when applied to GAN-based methods such as RaGAN, DECLoss helps achieve state-of-the-art performance, such as 0.093 LPIPS with 24.51 PSNR on 4x downsampled Urban100, validating the effectiveness and generalization of our approach.
Submitted 15 March, 2024; v1 submitted 4 January, 2022;
originally announced January 2022.
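The center-oriented optimization effect has a simple numerical core: under a pixel-wise MSE (PSNR-oriented) loss, the single prediction that minimizes the expected error over several plausible high-resolution targets is their mean, which is blurrier than any individual target. The toy patches below are hypothetical 1-D edges, not data from the paper, and the sketch does not implement DECLoss.

```python
import numpy as np


def mse(pred, targets):
    """Mean squared error of one prediction against every plausible target."""
    return np.mean((targets - pred) ** 2)


# Two equally plausible sharp HR patches for the same LR input:
# an edge at two slightly different positions (hypothetical toy data).
hr_a = np.array([0.0, 0.0, 1.0, 1.0])
hr_b = np.array([0.0, 1.0, 1.0, 1.0])
targets = np.stack([hr_a, hr_b])

center = targets.mean(axis=0)  # the MSE-optimal "center point": a blurred edge
print(center, mse(center, targets), mse(hr_a, targets))
```

The averaged edge [0, 0.5, 1, 1] has half the expected error of either sharp candidate yet matches neither, which is the over-smoothing behavior the abstract attributes to high-entropy data.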
-
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information
Authors:
Qinghua Wu,
Quanbo Shen,
Jian Luan,
YuJun Wang
Abstract:
In multi-speaker speech synthesis, data from different speakers tend to be highly diverse because the speakers may differ greatly in age, speaking style, emotion, and so on. Improving the modeling capability for multi-speaker speech synthesis is important but challenging. To address this issue, this paper proposes a high-capability speech synthesis system, called Msdtron, in which 1) a representation of the harmonic structure of speech, called the excitation spectrogram, is designed to directly guide the learning of harmonics in the mel-spectrogram; and 2) a conditional gated LSTM (CGLSTM) is proposed to control the flow of text content information through the network by re-weighting the LSTM gates using speaker information. The experiments show a significant reduction in the mel-spectrogram reconstruction error when training the multi-speaker model, and a great improvement is observed in the subjective evaluation of the speaker-adapted model.
Submitted 11 February, 2022; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Equalized Recovery State Estimators for Linear Systems with Delayed and Missing Observations
Authors:
Syed M. Hassaan,
Qiang Shen,
Sze Zheng Yong
Abstract:
This paper presents a dynamic state observer design for discrete-time linear time-varying systems that robustly achieves equalized recovery despite delayed or missing observations, where the set of all temporal patterns for the missing or delayed data is modeled by a finite-length language. By introducing a mapping of the language onto a reduced event-based language, we design a state estimator that adapts based on the history of available data at each step and satisfies equalized recovery for all patterns in the reduced language. In contrast to existing equalized recovery estimators, the proposed design treats the equalized recovery level as a decision variable, which enables us to directly obtain the global minimum for the intermediate recovery level, resulting in improved estimation performance. Finally, we demonstrate the effectiveness of the proposed observer compared with existing approaches using several illustrative examples.
Submitted 9 January, 2021;
originally announced January 2021.
-
Direction Finding of Electromagnetic Sources on a Sparse Cross-Dipole Array Using One-Bit Measurements
Authors:
Zhiyong Cheng,
Shengyao Chen,
Qibin Shen,
** He,
Zhong Liu
Abstract:
Sparse array arrangements have been widely used in vector-sensor arrays because they increase the degrees of freedom available for identifying more sources than sensors. For large sparse vector-sensor arrays, one-bit measurements can further reduce receiver system complexity by allowing low-resolution ADCs. In this paper, we present a sparse cross-dipole array with one-bit measurements to estimate the Directions of Arrival (DOAs) of electromagnetic sources. Based on the assumption that the sources are independent, we establish the relation between the covariance matrix of one-bit measurements and that of unquantized measurements via the Bussgang theorem. We then develop a Spatial-Smooth MUSIC (SS-MUSIC) based method, One-Bit MUSIC (OB-MUSIC), to estimate the DOAs. By jointly utilizing the covariance matrices of the two dipole arrays, we find that OB-MUSIC is robust against polarization states. We also derive the Cramer-Rao bound (CRB) of DOA estimation for the proposed scheme. Furthermore, we theoretically analyze the applicability of the source-independence assumption, which underlies the proposed method and other typical methods, and verify the assumption in typical communication applications. Numerical results show that, with the same number of sensors, one-bit sparse cross-dipole arrays achieve performance comparable to unquantized uniform linear arrays and thus provide a compromise between DOA estimation performance and system complexity.
Submitted 14 April, 2020;
originally announced April 2020.
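For real-valued, zero-mean Gaussian signals, the standard result connecting one-bit and unquantized second-order statistics is the arcsine (Van Vleck) law: if two unit-variance entries have correlation rho, their sign-quantized versions have correlation (2/pi) arcsin(rho). In the circular complex case the same map is applied separately to the real and imaginary parts of the normalized covariance. The Monte Carlo sketch below checks the real-valued case only; it illustrates this classical relation, and the paper's exact derivation and normalization may differ.

```python
import numpy as np


def arcsine_law(rho):
    """Correlation of sign(x1), sign(x2) for unit-variance Gaussians with correlation rho."""
    return (2.0 / np.pi) * np.arcsin(rho)


def one_bit_correlation_mc(rho, n=200_000, seed=0):
    """Empirical correlation of one-bit (sign) measurements of a correlated Gaussian pair."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = np.sign(x)  # one-bit quantization
    return np.mean(y[:, 0] * y[:, 1])


for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, one_bit_correlation_mc(rho), arcsine_law(rho))
```

Inverting the map, rho = sin((pi/2) r) for a measured one-bit correlation r, is what allows subspace methods such as MUSIC to work on the recovered (normalized) unquantized covariance.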
-
Tractable Compositions of Discrete-Time Control Barrier Functions with Application to Lane Keeping and Obstacle Avoidance
Authors:
Matthew Cavorsi,
Mohammad Khajenejad,
Ruochen Niu,
Qiang Shen,
Sze Zheng Yong
Abstract:
This paper introduces control barrier functions for discrete-time systems, which can be shown to be necessary and sufficient for controlled invariance of a given set. Moreover, we propose nonlinear discrete-time control barrier functions for partially control-affine systems that yield controlled invariance conditions affine in the control input. This leads to a tractable formulation that enables us to handle the safety optimal control problem for a broader range of applications, with more complicated safety conditions than existing approaches. In addition, we develop mixed-integer formulations for basic and secondary Boolean compositions of multiple control barrier functions and further provide mixed-integer constraints for piecewise control barrier functions. Finally, we apply these discrete-time control barrier function tools to the automotive safety problems of lane keeping and obstacle avoidance, which are shown to be effective in simulation.
Submitted 4 April, 2020;
originally announced April 2020.
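A common discrete-time control barrier function condition requires h(x_{k+1}) >= (1 - gamma) h(x_k) with 0 < gamma <= 1, which keeps the safe set {x : h(x) >= 0} forward invariant. For a toy lane-keeping model with lateral offset x_{k+1} = x_k + dt * u_k, lane boundary x_max, and h(x) = x_max - x, the condition reduces to u_k <= gamma (x_max - x_k) / dt, which is affine in u_k, so the safety filter only needs to clip the nominal command. The model, parameter values, and this particular CBF form are illustrative assumptions, not the paper's formulation or its mixed-integer compositions.

```python
def cbf_filter(x, u_nom, x_max=1.75, dt=0.1, gamma=0.3):
    """Clip the nominal lateral-velocity command so that h(x_{k+1}) >= (1 - gamma) h(x_k)."""
    h = x_max - x                  # barrier value: distance to the lane boundary
    u_bound = gamma * h / dt       # affine-in-u condition solved for u
    return min(u_nom, u_bound)


def simulate(x0=0.0, u_nom=2.0, steps=100, dt=0.1):
    """Roll out the toy lateral dynamics x_{k+1} = x_k + dt * u_k under the filter."""
    x = x0
    traj = [x]
    for _ in range(steps):
        u = cbf_filter(x, u_nom, dt=dt)
        x = x + dt * u
        traj.append(x)
    return traj


trajectory = simulate()
print(max(trajectory))  # approaches but does not cross the 1.75 m boundary
```

With gamma = 0.3 the filter leaves the nominal command untouched far from the boundary and then shrinks the remaining margin geometrically, h_{k+1} = 0.7 h_k, so the boundary is approached but never crossed.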
-
Incremental Affine Abstraction of Nonlinear Systems
Authors:
Syed M. Hassaan,
Mohammad Khajenejad,
Spencer Jensen,
Qiang Shen,
Sze Zheng Yong
Abstract:
In this paper, we propose an incremental abstraction method for dynamically over-approximating nonlinear systems in a bounded domain by solving a sequence of linear programs, resulting in a sequence of affine upper and lower hyperplanes with expanding operating regions. Although the affine abstraction problem can be solved offline using a single linear program, existing approaches suffer from a computation space complexity that grows exponentially with the state dimension. Hence, the motivation for incremental abstraction is to reduce the space complexity for high-dimensional systems, but at the cost of yielding potentially worse abstractions/overapproximations. Specifically, we start with an operating region that is a subregion of the state space and compute two affine hyperplanes that bracket the nonlinear function locally. Then, by incrementally expanding the operating region, we dynamically update the two affine hyperplanes such that we eventually yield hyperplanes that are guaranteed to over-approximate the nonlinear system over the entire domain. Finally, the effectiveness of the proposed approach is demonstrated using numerical examples of high-dimensional nonlinear systems.
Submitted 3 April, 2020;
originally announced April 2020.
-
Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling
Authors:
Tong Chen,
Haojie Liu,
Zhan Ma,
Qiu Shen,
Xun Cao,
Yao Wang
Abstract:
This paper proposes a novel Non-Local Attention optimization and Improved Context modeling-based image compression (NLAIC) algorithm, which is built on top of the deep neural network (DNN)-based variational auto-encoder (VAE) structure. Our NLAIC 1) embeds non-local network operations as non-linear transforms in the encoders and decoders for both the image and the latent representation probability information (known as hyperprior) to capture both local and global correlations, 2) applies an attention mechanism to generate masks that are used to weigh the features, which implicitly adapts bit allocation for feature elements based on their importance, and 3) implements improved conditional entropy modeling of latent features using joint 3D convolutional neural network (CNN)-based autoregressive contexts and hyperpriors. Toward practical application, additional enhancements are also introduced to speed up processing (e.g., parallel 3D CNN-based context prediction), reduce memory consumption (e.g., sparse non-local processing), and alleviate implementation complexity (e.g., a unified model for variable rates without re-training). The proposed model outperforms existing learned and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods on the Kodak and CLIC datasets, reporting state-of-the-art compression efficiency for both PSNR and MS-SSIM distortion metrics.
Submitted 11 October, 2019;
originally announced October 2019.
-
Learned Point Cloud Geometry Compression
Authors:
Jianqiang Wang,
Hao Zhu,
Zhan Ma,
Tong Chen,
Haojie Liu,
Qiu Shen
Abstract:
This paper presents a novel end-to-end Learned Point Cloud Geometry Compression (a.k.a. Learned-PCGC) framework to efficiently compress point cloud geometry (PCG) using deep neural network (DNN)-based variational autoencoders (VAE). In our approach, the PCG is first voxelized, scaled, and partitioned into non-overlapping 3D cubes, which are then fed into stacked 3D convolutions for compact latent feature and hyperprior generation. Hyperpriors are used to improve the conditional probability modeling of latent features. A weighted binary cross-entropy (WBCE) loss is applied in training, while adaptive thresholding is used in inference to remove unnecessary voxels and reduce distortion. Objectively, our method exceeds the geometry-based point cloud compression (G-PCC) algorithm standardized by the well-known Moving Picture Experts Group (MPEG) by a significant margin, e.g., at least 60% BD-Rate (Bjontegaard Delta Rate) gains on common test datasets. Subjectively, our method presents better visual quality, with smoother surface reconstruction and appealing details, than all existing MPEG-standard-compliant PCC methods. Our method requires about 2.5 MB of parameters in total, a fairly small size for practical implementation, even on embedded platforms. Additional ablation studies analyze a variety of aspects (e.g., cube size, kernels) to explore the application potential of our Learned-PCGC.
Submitted 26 September, 2019;
originally announced September 2019.
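The Learned-PCGC abstract names two occupancy-related ingredients: a weighted binary cross-entropy (WBCE) loss during training and adaptive thresholding at inference. The Python sketch below illustrates both on a flat list of voxel occupancy probabilities. The function names, the `pos_weight` parameter, and the top-k rule (keep as many voxels as the cube originally contained) are assumptions for illustration, not the paper's published implementation.

```python
import math


def weighted_bce(probs, labels, pos_weight):
    """Weighted binary cross-entropy over voxel occupancy.

    probs: predicted occupancy probabilities in (0, 1)
    labels: ground-truth occupancy (1 = occupied, 0 = empty)
    pos_weight: hypothetical extra weight on occupied voxels, which are sparse
    """
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= pos_weight * math.log(p)
        else:
            total -= math.log(1.0 - p)
    return total / len(probs)


def adaptive_threshold(probs, num_points):
    """Hypothetical top-k rule: keep the num_points most probable voxels."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:num_points])
    return [1 if i in keep else 0 for i in range(len(probs))]
```

For example, `adaptive_threshold([0.1, 0.9, 0.4, 0.7], 2)` keeps the second and fourth voxels. Weighting occupied voxels more heavily counteracts their scarcity in mostly empty cubes, and a per-cube top-k rule avoids choosing one global probability threshold.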
-
Development of a General Momentum Exchange Devices Fault Model for Spacecraft Fault-Tolerant Control System Design
Authors:
Chengfei Yue,
Qiang Shen,
Xibin Cao,
Feng Wang,
Cher Hiang Goh,
Tong Heng Lee
Abstract:
This paper investigates the mechanisms of various faults in momentum exchange devices. These devices are modeled as a cascaded electric motor (EM) and variable speed drive (VSD) system. Considering the mechanical part of the EM and the VSD system, the potential faults are reviewed and summarized. With a clear understanding of these potential faults, a general fault model with a cascaded multiplicative structure is then established for momentum exchange devices. Based on this general model, various fault scenarios can be simulated and the possible outputs visualized. In this paper, six types of working conditions are identified and the corresponding fault models are constructed. Using this fault model, the control responses of reaction wheels and single gimbal control moment gyros under various fault conditions are demonstrated. The simulation results show the severity of the faults and demonstrate that, from the viewpoint of control accuracy, the additive fault is more serious than the multiplicative fault. Finally, existing fault-tolerant control strategies are briefly summarized, and potential approaches, both passive and active, to accommodate gimbal faults of single gimbal control moment gyros are demonstrated.
Submitted 27 July, 2019; v1 submitted 15 July, 2019;
originally announced July 2019.
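The fault-model abstract distinguishes multiplicative faults (loss of effectiveness along the EM-VSD cascade) from additive faults. As a rough illustration only, the Python sketch below applies effectiveness factors for the drive and motor stages in cascade, adds a bias term, and saturates the output. The `ActuatorFault` fields, the ordering of bias and saturation, and the function names are hypothetical choices, not the six working-condition models defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class ActuatorFault:
    """Hypothetical fault parameters for one momentum exchange device.

    em_effectiveness / vsd_effectiveness: multiplicative factors in [0, 1]
    (1.0 = healthy) for the motor and drive stages of the cascade.
    bias: additive torque fault in N*m, e.g. a drifting output.
    """
    em_effectiveness: float = 1.0
    vsd_effectiveness: float = 1.0
    bias: float = 0.0


def applied_torque(commanded, fault, max_torque):
    """Torque actually delivered for a commanded torque under a given fault."""
    # Drive stage, then motor stage: effectiveness factors multiply in cascade.
    torque = commanded * fault.vsd_effectiveness * fault.em_effectiveness
    # The additive fault enters after the multiplicative chain.
    torque += fault.bias
    # Physical saturation of the device.
    return max(-max_torque, min(max_torque, torque))
```

One intuition consistent with the abstract's finding: a multiplicative fault vanishes when the commanded torque is zero, whereas `applied_torque(0.0, ActuatorFault(bias=0.01), 0.2)` still injects 0.01 N*m, so an additive fault keeps disturbing the attitude even near the target.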
-
Data augmentation in microscopic images for material data mining
Authors:
Boyuan Ma,
Xiaoyan Wei,
Chuni Liu,
Xiaojuan Ban,
Haiyou Huang,
Hao Wang,
Weihua Xue,
Stephen Wu,
Mingfei Gao,
Qing Shen,
Adnan Omer Abuassba,
Haokai Shen,
Yan**g Su
Abstract:
Recent progress in material data mining has been driven by high-capacity models trained on large datasets. However, collecting experimental data (real data) is extremely costly because of the human effort and expertise it requires. Here, we develop a novel transfer learning strategy to address the problem of small or insufficient data. This strategy fuses real and simulated data and augments the training data in the data mining procedure. For the specific task of image segmentation, this strategy generates synthetic images by fusing the physical mechanism of simulated images with the "image style" of real images. The results show that a model trained with the synthetic images plus 35% of the real images outperforms a model trained on all of the real images. Because the time required to generate synthetic data is almost negligible, this strategy can reduce the time cost of real data preparation by roughly 65%.
Submitted 28 October, 2019; v1 submitted 12 May, 2019;
originally announced May 2019.
-
Learned Quality Enhancement via Multi-Frame Priors for HEVC Compliant Low-Delay Applications
Authors:
Ming Lu,
Ming Cheng,
Yiling Xu,
Shiliang Pu,
Qiu Shen,
Zhan Ma
Abstract:
Networked video applications, e.g., video conferencing, often suffer from poor visual quality due to unexpected network fluctuations and limited bandwidth. In this paper, we develop a Quality Enhancement Network (QENet) to reduce video compression artifacts, leveraging spatial priors generated by multi-scale convolutions and temporal priors generated by warped temporal predictions in a recurrent fashion. We integrate QENet as a standalone post-processing subsystem in the High Efficiency Video Coding (HEVC) compliant decoder. Experimental results show that QENet achieves state-of-the-art performance compared with the default in-loop filters in HEVC and other deep-learning-based methods, with noticeable objective gains in Peak Signal-to-Noise Ratio (PSNR) and visible subjective gains.
Submitted 2 May, 2019;
originally announced May 2019.
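The QENet abstract reports its objective gains in PSNR. For reference, a minimal Python sketch of the standard PSNR definition, 10·log10(MAX² / MSE), is shown here for flat sequences of 8-bit pixel values; the function name and the `max_value` default are illustrative choices.

```python
import math


def psnr(reference, decoded, max_value=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

An off-by-one error on every pixel gives an MSE of 1 and therefore about 48.13 dB, which shows why PSNR gains from post-processing are typically reported in fractions of a decibel.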
-
Non-local Attention Optimized Deep Image Compression
Authors:
Haojie Liu,
Tong Chen,
Peiyao Guo,
Qiu Shen,
Xun Cao,
Yao Wang,
Zhan Ma
Abstract:
This paper proposes a novel Non-Local Attention Optimized Deep Image Compression (NLAIC) framework, built on top of the popular variational autoencoder (VAE) structure. Our NLAIC framework embeds non-local operations in the encoders and decoders for both the image and the latent feature probability information (known as the hyperprior) to capture both local and global correlations. It also applies an attention mechanism to generate masks that weight the features of the image and hyperprior, which implicitly adapts bit allocation to each feature's importance. Furthermore, both hyperpriors and spatial-channel neighbors of the latent features are used to improve entropy coding. On the Kodak dataset, the proposed model outperforms existing learned (e.g., Balle2019, Balle2018) and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods for both the PSNR and MS-SSIM distortion metrics.
Submitted 22 April, 2019;
originally announced April 2019.
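The NLAIC abstract relies on non-local operations to capture global correlations. A minimal Python sketch of a generic non-local operation follows: every position attends to every other position through softmax-normalized dot-product similarities, and the aggregate is added back residually. The learned embeddings and the attention-mask branch of NLAIC are deliberately omitted (identity embeddings are assumed), so this shows the general mechanism, not the paper's exact module.

```python
import math


def non_local(features):
    """Simplified non-local operation over a list of feature vectors.

    y_i = x_i + sum_j softmax_j(x_i . x_j) * x_j
    Learned embeddings are omitted (identity), so this only shows the mechanism.
    """
    out = []
    for xi in features:
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted average of all positions (the global context for x_i).
        agg = [sum(w * xj[k] for w, xj in zip(weights, features)) / z
               for k in range(len(xi))]
        out.append([a + b for a, b in zip(xi, agg)])  # residual connection
    return out
```

Because every output mixes information from all positions, the receptive field is global in a single step, which is the property the abstract contrasts with purely local convolutions.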
-
Extreme Image Coding via Multiscale Autoencoders With Generative Adversarial Optimization
Authors:
Chao Huang,
Haojie Liu,
Tong Chen,
Qiu Shen,
Zhan Ma
Abstract:
We propose a MultiScale AutoEncoder (MSAE) based extreme image compression framework that offers visually pleasing reconstruction at very low bitrates. Our method leverages "priors" at different resolution scales to improve compression efficiency, and employs a generative adversarial network (GAN) with multiscale discriminators to perform end-to-end trainable rate-distortion optimization. We compare the perceptual quality of our reconstructions with that of traditional compression algorithms, namely the High-Efficiency Video Coding (HEVC) Intra Profile and JPEG2000, on the public Cityscapes and ADE20K datasets, demonstrating significant subjective quality improvement.
Submitted 3 January, 2020; v1 submitted 8 April, 2019;
originally announced April 2019.
-
Gated Context Model with Embedded Priors for Deep Image Compression
Authors:
Haojie Liu,
Tong Chen,
Peiyao Guo,
Qiu Shen,
Zhan Ma
Abstract:
A deep image compression scheme is proposed in this paper, offering state-of-the-art compression efficiency compared with traditional JPEG, JPEG2000, and BPG as well as popular learning-based methods. This is achieved by a novel conditional probability model with embedded priors that accurately approximates the entropy rate for rate-distortion optimization. It uses three separable stacks to eliminate blind spots in the receptive field, improving probability prediction and reducing computation. The embedded priors can further aid image reconstruction when fused with the latent features after passing through the proposed information compensation network (ICN). Residual learning with generalized divisive normalization (GDN) based activation is used in our encoder and decoder, giving fast convergence and efficient performance. We evaluate our model and other methods using rate-distortion criteria, where distortion is measured by multi-scale structural similarity (MS-SSIM). We also discuss the impact of various distortion metrics on reconstructed image quality. In addition, a field study on perceptual quality is conducted via a dedicated subjective assessment, comparing the efficiency of our proposed method with other conventional image compression methods.
Submitted 27 February, 2019;
originally announced February 2019.
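The gated-context-model abstract uses GDN-based activations in its encoder and decoder. A minimal Python sketch of the commonly used GDN form, y_i = x_i / sqrt(beta_i + sum_j gamma_ij · x_j²), applied across the channels at a single spatial location, is shown here. The parameter values in the usage note are arbitrary; in a real codec `beta` and `gamma` are learned.

```python
import math


def gdn(x, beta, gamma):
    """Generalized divisive normalization across channels at one location.

    y_i = x_i / sqrt(beta_i + sum_j gamma[i][j] * x_j ** 2)
    beta: positive per-channel offsets; gamma: non-negative channel weights.
    """
    y = []
    for i, xi in enumerate(x):
        norm = beta[i] + sum(gamma[i][j] * xj * xj for j, xj in enumerate(x))
        y.append(xi / math.sqrt(norm))
    return y
```

With `gamma` all zeros and `beta` all ones, GDN reduces to the identity. With `x = [4.0, 0.0]`, `beta = [9.0, 1.0]` and `gamma = [[1.0, 0.0], [0.0, 0.0]]`, the first channel becomes 4 / sqrt(9 + 16) = 0.8, which illustrates how large responses are divisively compressed.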
-
Neural Video Compression using Spatio-Temporal Priors
Authors:
Haojie Liu,
Tong Chen,
Ming Lu,
Qiu Shen,
Zhan Ma
Abstract:
The pursuit of higher compression efficiency continuously drives advances in video coding technologies. Fundamentally, we wish to find better "predictions" or "priors" from previously reconstructed content, both to remove signal dependency efficiently and to accurately model the signal distribution for entropy coding. In this work, we propose a neural video compression framework that leverages spatial and temporal priors, independently and jointly, to exploit the correlations in intra texture, optical-flow-based temporal motion, and residuals. Spatial priors are generated from downscaled low-resolution features, while temporal priors (from previous reference frames and residuals) are captured using a convolutional long short-term memory (ConvLSTM) structure in a temporal recurrent fashion. All of these parts are connected and trained jointly toward optimal rate-distortion performance. Compared with the High-Efficiency Video Coding (HEVC) Main Profile (MP), our method achieves an average 38% Bjontegaard-Delta Rate (BD-Rate) improvement on standard common test sequences, with distortion measured by multi-scale structural similarity (MS-SSIM).
Submitted 20 February, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
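Several abstracts in this listing report BD-Rate gains, including the 38% figure for this neural video codec. A minimal Python sketch of the classic Bjontegaard computation follows: fit log10(rate) as a cubic function of quality for each codec, average the gap over the overlapping quality range, and convert it to a percentage. This sketch assumes exactly four RD points per curve (so the cubic interpolates them) and integrates with Simpson's rule, which is exact for cubics. The quality axis may be PSNR or MS-SSIM.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial interpolating the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_over(xs, ys, lo, hi):
    """Mean of the interpolating cubic on [lo, hi]; Simpson's rule is exact here."""
    mid = 0.5 * (lo + hi)
    return (_lagrange(xs, ys, lo) + 4 * _lagrange(xs, ys, mid)
            + _lagrange(xs, ys, hi)) / 6


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Bjontegaard-Delta rate (%) of test vs. anchor from four RD points each.

    Negative values mean the test codec needs less bitrate at equal quality.
    """
    curves = (anchor_rates, anchor_quality, test_rates, test_quality)
    if any(len(c) != 4 for c in curves):
        raise ValueError("this sketch expects exactly four RD points per curve")
    log_a = [math.log10(r) for r in anchor_rates]
    log_t = [math.log10(r) for r in test_rates]
    lo = max(min(anchor_quality), min(test_quality))
    hi = min(max(anchor_quality), max(test_quality))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    avg_diff = (_mean_over(test_quality, log_t, lo, hi)
                - _mean_over(anchor_quality, log_a, lo, hi))
    return (10 ** avg_diff - 1) * 100
```

As a sanity check, a test codec that reaches the same quality points at exactly half the anchor's bitrate yields a BD-rate of -50%, and identical curves yield 0%.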
-
A 3.8 ps RMS time synchronization implemented in a 20 nm FPGA
Authors:
Hong-Bo Xie,
Yang Li,
Qi Shen,
Sheng-Kai Liao,
Cheng-Zhi Peng
Abstract:
A 3.8 ps root mean square (RMS) time synchronization implemented in a 20 nm Kintex UltraScale Field Programmable Gate Array (FPGA) is presented. Multichannel high-speed serial transceivers (e.g., GTH) play a key role in a wide range of applications, such as the optical source for quantum key distribution systems. However, because each transceiver has its own independent clock divider, a random skew appears among the channels every time the system powers up or resets. A self-phase alignment method provided by Xilinx Corporation achieves a precision of 22 ps RMS with 100 ps maximum variation, which is far from meeting the demands of applications with rates up to 2.5 Gbps. To achieve high-precision intrachannel time synchronization, a protocol combining a high-precision time-to-digital converter (TDC) and a tunable phase interpolator (PI) is presented. The TDC, based on the CARRY8 primitive, measures the intrachannel skew with a 40.7 ps bin size. The tunable PI embedded in each GTH channel has a theoretical step size of 3.125 ps. By tuning the PI in minimal steps, the final intrachannel time synchronization reaches 3.8 ps RMS precision with a maximum variation of 20 ps, much better than the self-phase alignment method. In addition, a desired time offset can be set for every channel with closed-loop control.
Submitted 8 June, 2018;
originally announced June 2018.
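The FPGA synchronization abstract corrects a measured skew by stepping the GTH phase interpolator in 3.125 ps increments and reports the result as an RMS figure. The Python sketch below shows only that arithmetic: converting a skew estimate into a whole number of PI steps, the residual left after rounding (at most half a step), and the RMS of a set of offsets. It assumes the skew estimate is already available in picoseconds; how the paper turns 40.7 ps TDC bins into a sub-step estimate is not described in the abstract and is not modeled here. Function names and the sign convention are hypothetical.

```python
import math

PI_STEP_PS = 3.125  # theoretical PI step size quoted in the abstract


def pi_correction(skew_ps, step_ps=PI_STEP_PS):
    """Return (number of PI steps, residual skew in ps) to cancel a skew.

    Positive steps are assumed to delay the channel (hypothetical convention).
    """
    steps = round(skew_ps / step_ps)
    residual = skew_ps - steps * step_ps
    return steps, residual


def rms(values):
    """Root mean square of a sequence of time offsets in ps."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

For a 10 ps skew, `pi_correction(10.0)` gives 3 steps and a 0.625 ps residual. Quantization alone therefore bounds the residual by 1.5625 ps, so the measured 3.8 ps RMS presumably also reflects measurement and jitter effects.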
-
Deep Image Compression via End-to-End Learning
Authors:
Haojie Liu,
Tong Chen,
Qiu Shen,
Tao Yue,
Zhan Ma
Abstract:
We present a lossy image compression method based on deep convolutional neural networks (CNNs) that outperforms the existing BPG, WebP, JPEG2000, and JPEG codecs as measured by multi-scale structural similarity (MS-SSIM) at the same bit rate. Currently, most CNN-based approaches train the network using an L2 loss between the reconstructions and the ground truths in the pixel domain, which leads to over-smoothed results and degraded visual quality, especially at very low bit rates. Therefore, we improve subjective quality by additionally combining a perceptual loss and an adversarial loss. To achieve better rate-distortion optimization (RDO), we also introduce easy-to-hard transfer learning when adding the quantization error and rate constraint. Finally, we evaluate our method on the public Kodak dataset and the Test Dataset P/M released by the Computer Vision Lab of ETH Zurich, achieving average BD-rate reductions of 7.81% and 19.1% over BPG, respectively.
Submitted 5 June, 2018;
originally announced June 2018.
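This abstract introduces the quantization error and rate constraint in an easy-to-hard manner. The Python sketch below illustrates two pieces under stated assumptions. First, it shows a quantizer that uses additive uniform noise during training as a differentiable stand-in for rounding, a technique common in learned compression and not confirmed as this paper's choice. Second, it shows a hypothetical linear schedule that ramps both the noise scale and the rate weight lambda from zero. The function names, `warmup_steps`, and `final_lambda` are illustrative only.

```python
import random


def quantize(latent, training, noise_scale=1.0, rng=random):
    """Quantize latent values.

    During training, add uniform noise in [-0.5, 0.5] * noise_scale as a
    differentiable proxy for rounding; at inference, round to integers.
    """
    if training:
        return [v + noise_scale * rng.uniform(-0.5, 0.5) for v in latent]
    return [float(round(v)) for v in latent]


def easy_to_hard_schedule(step, warmup_steps, final_lambda):
    """Hypothetical schedule returning (noise_scale, rate_lambda).

    Both ramp linearly from 0 to their final values over warmup_steps, so early
    training sees no quantization noise and no rate penalty (the "easy" task).
    """
    frac = min(1.0, step / warmup_steps)
    return frac, frac * final_lambda
```

A training loop would call `easy_to_hard_schedule` each step, pass the returned noise scale to `quantize`, and weight the rate term of the loss by the returned lambda. Starting from an unconstrained autoencoder and gradually adding the harder constraints is one plausible reading of "easy-to-hard", not the paper's documented procedure.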