
Computation-Aware Learning for Stable Control with Gaussian Process

Authors: Wenhan Cao, Alexandre Capone, Rishabh Yadav, Sandra Hirche, Wei Pan

Abstract: In Gaussian Process (GP) dynamical model learning for robot control, particularly for systems constrained by computational resources like small quadrotors equipped with low-end processors, analyzing stability and designing a stable controller present significant challenges. This paper distinguishes between two types of uncertainty within the posteriors of GP dynamical models: the well-documented mathematical uncertainty stemming from limited data and computational uncertainty arising from constrained computational capabilities, which has been largely overlooked in prior research. Our work demonstrates that computational uncertainty, quantified through a probabilistic approximation of the inverse covariance matrix in GP dynamical models, is essential for stable control under computational constraints. We show that incorporating computational uncertainty can prevent overestimating the region of attraction, a safe subset of the state space with asymptotic stability, thus improving system safety. Building on these insights, we propose an innovative controller design methodology that integrates computational uncertainty within a second-order cone programming framework. Simulations of canonical stable control tasks and experiments of quadrotor tracking exhibit the effectiveness of our method under computational constraints.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.03008 [pdf, other]

DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

Authors: Xiaoyan Lei, Wenlong Zhang, Weifeng Cao

Abstract: Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy. The network of DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several residual state space blocks (RSSBs), each of which has several Vision Mamba Modules (ViMM) together with a residual connection. To achieve efficiency improvement while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network. Specifically, we leverage the rich representation knowledge of the teacher network as additional supervision for the output of lightweight student networks. Extensive experiments have demonstrated that our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters while maintaining the performance of both PSNR and SSIM. The source code is available at https://github.com/nathan66666/DVMSR.git

Submitted 11 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 8 pages, 8 figures

arXiv:2405.00027 [pdf, other]

doi 10.5220/0012431300003660

Multidimensional Compressed Sensing for Spectral Light Field Imaging

Authors: Wen Cao, Ehsan Miandji, Jonas Unger

Abstract: This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectral-coded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence matching the intrinsic dimensionality of multi-spectral light fields. We mathematically and empirically show the equivalence of 5D and 1D sensing models, and most importantly that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware.

Submitted 27 February, 2024; originally announced May 2024.

Comments: 8 pages, published at VISAPP 2024

Journal ref: In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP 2024, ISBN 978-989-758-679-8, ISSN 2184-4321, pages 349-356

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.00481 [pdf, other]

Convolutional Bayesian Filtering

Authors: Wenhan Cao, Shiqi Liu, Chang Liu, Zeyu He, Stephen S.-T. Yau, Shengbo Eben Li

Abstract: Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes the total probability rule and Bayes' law alternately, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability was assumed to be exactly known, which represents a measure of the occurrence probability of one event, given the second event. In this paper, we find that by adding an additional event that stipulates an inequality condition, we can transform the conditional probability into a special integration that is analogous to convolution. Based on this transformation, we show that both transition probability and output probability can be generalized to convolutional forms, resulting in a more general filtering framework that we call convolutional Bayesian filtering. This new framework encompasses standard Bayesian filtering as a special case when the distance metric of the inequality condition is selected as the Dirac delta function. It also allows for a more nuanced consideration of model mismatch by choosing different types of inequality conditions. For instance, when the distance metric is defined in a distributional sense, the transition probability and output probability can be approximated by simply rescaling them into fractional powers. Under this framework, a robust version of the Kalman filter can be constructed by only altering the noise covariance matrix, while maintaining the conjugate nature of Gaussian distributions. Finally, we exemplify the effectiveness of our approach by reshaping classic filtering algorithms into convolutional versions, including the Kalman filter, extended Kalman filter, unscented Kalman filter and particle filter.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.07954 [pdf, other]

Optimizing Polynomial Graph Filters: A Novel Adaptive Krylov Subspace Approach

Authors: Keke Huang, Wencai Cao, Hoang Ta, Xiaokui Xiao, Pietro Liò

Abstract: Graph Neural Networks (GNNs), known as spectral graph filters, find a wide range of applications in web networks. To bypass eigendecomposition, polynomial graph filters are proposed to approximate graph filters by leveraging various polynomial bases for filter training. However, no existing studies have explored the diverse polynomial graph filters from a unified perspective for optimization. In this paper, we first unify polynomial graph filters, as well as the optimal filters of identical degrees into the Krylov subspace of the same order, thus providing equivalent expressive power theoretically. Next, we investigate the asymptotic convergence property of polynomials from the unified Krylov subspace perspective, revealing their limited adaptability in graphs with varying heterophily degrees. Inspired by those facts, we design a novel adaptive Krylov subspace approach to optimize polynomial bases with provable controllability over the graph spectrum so as to adapt various heterophily graphs. Subsequently, we propose AdaptKry, an optimized polynomial graph filter utilizing bases from the adaptive Krylov subspaces. Meanwhile, in light of the diverse spectral properties of complex graphs, we extend AdaptKry by leveraging multiple adaptive Krylov bases without incurring extra training costs. As a consequence, extended AdaptKry is able to capture the intricate characteristics of graphs and provide insights into their inherent complexity. We conduct extensive experiments across a series of real-world datasets. The experimental results demonstrate the superior filtering capability of AdaptKry, as well as the optimized efficacy of the adaptive Krylov basis.

Submitted 20 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.17375 [pdf, other]

Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control

Authors: Wenhan Cao, Wei Pan

Abstract: Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. This is achieved through quadrature rules, which are weighted sums of utility functions evaluated from state samples obtained in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of the computational method -- in this case, the quadrature rule -- can significantly impact control performance. This impact is traced back to the fact that computational errors introduced in the PEV stage can affect the policy iteration's convergence behavior, which in turn affects the learned controller. To elucidate how computation impacts control, we draw a parallel between IntRL's policy iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation. In this light, computational error in PEV manifests as an extra error term in each iteration of Newton's method, with its upper bound proportional to the computational error. Further, we demonstrate that when the utility function resides in a reproducing kernel Hilbert space (RKHS), the optimal quadrature is achievable by employing Bayesian quadrature with the RKHS-inducing kernel function. We prove that the local convergence rates for IntRL using the trapezoidal rule and Bayesian quadrature with a Matérn kernel are $O(N^{-2})$ and $O(N^{-b})$, respectively, where $N$ is the number of evenly-spaced samples and $b$ is the Matérn kernel's smoothness parameter. These theoretical findings are finally validated by two canonical control tasks.

Submitted 27 February, 2024; originally announced February 2024.
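
The $O(N^{-2})$ rate quoted for the trapezoidal rule can be checked numerically on a plain integral. The sketch below is not the paper's IntRL algorithm: it only applies the composite trapezoidal rule to a hypothetical smooth integrand, exp(-t) on [0, 1], chosen because its integral is known in closed form. The names `trapezoid`, `utility`, and `error` are illustrative. Note that `n` here counts subintervals, so the rule uses n + 1 evenly spaced samples.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] using n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def utility(t):
    # Hypothetical smooth integrand standing in for a utility signal.
    return math.exp(-t)


# Closed-form integral of exp(-t) over [0, 1].
EXACT = 1.0 - math.exp(-1.0)


def error(n):
    """Absolute quadrature error when n subintervals are used."""
    return abs(trapezoid(utility, 0.0, 1.0, n) - EXACT)


if __name__ == "__main__":
    # Doubling n should shrink the error by roughly a factor of 4.
    for n in (8, 16, 32, 64):
        print(n, error(n), error(n) / error(2 * n))
```

For a smooth integrand, the printed ratio should sit close to 4, which is what a second-order rate implies when the sample spacing is halved.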

arXiv:2402.09575 [pdf, other]

Analyzing the Impact of Computation in Adaptive Dynamic Programming for Stochastic LQR Problem

Authors: Wenhan Cao, Alexandre Capone, Sandra Hirche, Wei Pan

Abstract: Adaptive dynamic programming (ADP) for stochastic linear quadratic regulation (LQR) demands the precise computation of stochastic integrals during policy iteration (PI). In a fully model-free problem setting, this computation can only be approximated by state samples collected at discrete time points using computational methods such as the canonical Euler-Maruyama method. Our research reveals a critical phenomenon: the sampling period can significantly impact control performance. This impact is due to the fact that computational errors introduced in each step of PI can significantly affect the algorithm's convergence behavior, which in turn influences the resulting control policy. We draw a parallel between PI and Newton's method applied to the Riccati equation to elucidate how the computation impacts control. In this light, the computational error in each PI step manifests itself as an extra error term in each step of Newton's method, with its upper bound proportional to the computational error. Furthermore, we demonstrate that the convergence rate for ADP in stochastic LQR problems using the Euler-Maruyama method is O(h), with h being the sampling period. A sensorimotor control task finally validates these theoretical findings.

Submitted 14 February, 2024; originally announced February 2024.
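
For readers unfamiliar with the Euler-Maruyama method named in the abstract, a minimal sketch follows. It does not reproduce the paper's ADP or LQR setup: the scalar linear SDE dx = a x dt + sigma x dW, the function name `euler_maruyama`, and all parameter values are assumptions made purely for illustration. The sampling period `h` plays the role of the step size.

```python
import math
import random


def euler_maruyama(a, sigma, x0, h, steps, rng):
    """Simulate dx = a*x dt + sigma*x dW with step size h; return the path.

    The SDE is a hypothetical scalar example, not the system from the paper.
    """
    path = [x0]
    x = x0
    for _ in range(steps):
        # Brownian increment over one sampling period: N(0, h).
        dw = rng.gauss(0.0, math.sqrt(h))
        x = x + a * x * h + sigma * x * dw
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    path = euler_maruyama(a=-1.0, sigma=0.2, x0=1.0, h=0.01, steps=100, rng=rng)
    print(len(path), path[-1])
```

With `sigma` set to zero the scheme reduces to the forward Euler method, which gives a simple sanity check: each step multiplies the state by (1 + a h).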

arXiv:2312.03324 [pdf]

Lightweight Speaker Verification Using Transformation Module with Feature Partition and Fusion

Authors: Yanxiong Li, Zhongjie Jiang, Qisheng Huang, Wenchang Cao, Jialong Li

Abstract: Although many efforts have been made on decreasing the model complexity for speaker verification, it is still challenging to deploy speaker verification systems with satisfactory result on low-resource terminals. We design a transformation module that performs feature partition and fusion to implement lightweight speaker verification. The transformation module consists of multiple simple but effective operations, such as convolution, pooling, mean, concatenation, normalization, and element-wise summation. It works in a plug-and-play way, and can be easily implanted into a wide variety of models to reduce the model complexity while maintaining the model error. First, the input feature is split into several low-dimensional feature subsets for decreasing the model complexity. Then, each feature subset is updated by fusing it with the inter-feature-subsets correlational information to enhance its representational capability. Finally, the updated feature subsets are independently fed into the block (one or several layers) of the model for further processing. The features that are output from current block of the model are processed according to the steps above before they are fed into the next block of the model. Experimental data are selected from two public speech corpora (namely VoxCeleb1 and VoxCeleb2). Results show that implanting the transformation module into three models (namely AMCRN, ResNet34, and ECAPA-TDNN) for speaker verification slightly increases the model error and significantly decreases the model complexity. Our proposed method outperforms baseline methods on the whole in memory requirement and computational complexity with lower equal error rate. It also generalizes well across truncated segments with various lengths.

Submitted 6 December, 2023; originally announced December 2023.

Comments: 12 pages, 5 figures, 6 tables; accepted for publication in IEEE-ACM TASLP

arXiv:2306.02054 [pdf]

Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

Authors: Yanxiong Li, Wenchang Cao, Wei Xie, Qisheng Huang, Wenfeng Pang, Qianhua He

Abstract: We present a work on low-complexity acoustic scene classification (ASC) with multiple devices, namely the subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples of multiple devices with a low-complexity model, where two main difficulties need to be overcome. First, the audio samples are recorded by different devices, and there is mismatch of recording devices in audio samples. We reduce the negative impact of the mismatch of recording devices by using some effective strategies, including data augmentation (e.g., mix-up, spectrum correction, pitch shift), usages of multi-patch network structure and channel attention. Second, the model size should be smaller than a threshold (e.g., 128 KB required by the DCASE2021 challenge). To meet this condition, we adopt a ResNet with both depthwise separable convolution and channel attention as the backbone network, and perform model compression. In summary, we propose a low-complexity ASC method using data augmentation and a lightweight ResNet. Evaluated on the official development and evaluation datasets, our method obtains classification accuracy scores of 71.6% and 66.7%, respectively; and obtains Log-loss scores of 1.038 and 1.136, respectively. Our final model size is 110.3 KB which is smaller than the maximum of 128 KB.

Submitted 3 June, 2023; originally announced June 2023.

Comments: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th IEEE International Conference on Signal Processing (IEEE ICSP)

arXiv:2306.02053 [pdf]

Few-shot Class-incremental Audio Classification Using Stochastic Classifier

Authors: Yanxiong Li, Wenchang Cao, Jialong Li, Wei Xie, Qianhua He

Abstract: It is generally assumed that the number of classes is fixed in current audio classification methods, and the model can recognize pregiven classes only. When new classes emerge, the model needs to be retrained with adequate samples of all classes. If new classes continually emerge, these methods will not work well and may even be infeasible. In this study, we propose a method for few-shot class-incremental audio classification, which continually recognizes new classes and remembers old ones. The proposed model consists of an embedding extractor and a stochastic classifier. The former is trained in base session and frozen in incremental sessions, while the latter is incrementally expanded in all sessions. Two datasets (NS-100 and LS-100) are built by choosing samples from audio corpora of NSynth and LibriSpeech, respectively. Results show that our method exceeds four baseline ones in average accuracy and performance dropping rate. Code is at https://github.com/vinceasvp/meta-sc.

Submitted 3 June, 2023; originally announced June 2023.

Comments: 5 pages, 3 figures, 4 tables. Accepted for publication in INTERSPEECH 2023

arXiv:2306.00426 [pdf]

Speaker verification using attentive multi-scale convolutional recurrent network

Authors: Yanxiong Li, Zhongjie Jiang, Wenchang Cao, Qisheng Huang

Abstract: In this paper, we propose a speaker verification method by an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In the proposed method, logarithm Mel spectrum is extracted from each speech recording and then fed to the proposed AMCRN for learning speaker embedding. Afterwards, the learned speaker embedding is fed to the back-end classifier (such as cosine similarity metric) for scoring in the testing stage. The proposed method is compared with state-of-the-art methods for speaker verification. Experimental data are three public datasets that are selected from two large-scale speech corpora (VoxCeleb1 and VoxCeleb2). Experimental results show that our method exceeds baseline methods in terms of equal error rate and minimal detection cost function, and has advantages over most of baseline methods in terms of computational complexity and memory requirement. In addition, our method generalizes well across truncated speech segments with different durations, and the speaker embedding learned by the proposed AMCRN has stronger generalization ability across two back-end classifiers.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 21 pages, 6 figures, 8 tables. Accepted for publication in Applied Soft Computing

arXiv:2305.19541 [pdf]

doi 10.1109/TMM.2023.3253301

Few-Shot Speaker Identification Using Lightweight Prototypical Network with Feature Grouping and Interaction

Authors: Yanxiong Li, Hao Chen, Wenchang Cao, Qisheng Huang, Qianhua He

Abstract: Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their computational complexities and model sizes need to be reduced for lightweight applications. In this work, we propose a FSSI method using a lightweight prototypical network with the final goal to implement the FSSI on intelligent terminals with limited resources, such as smart watches and smart speakers. In the proposed prototypical network, an embedding module is designed to perform feature grouping for reducing the memory requirement and computational complexity, and feature interaction for enhancing the representational ability of the learned speaker embedding. In the proposed embedding module, audio feature of each speech sample is split into several low-dimensional feature subsets that are transformed by a recurrent convolutional block in parallel. Then, the operations of averaging, addition, concatenation, element-wise summation and statistics pooling are sequentially executed to learn a speaker embedding for each speech sample. The recurrent convolutional block consists of a block of bidirectional long short-term memory, and a block of de-redundancy convolution in which feature grouping and interaction are conducted too. Our method is compared to baseline methods on three datasets that are selected from three public speech corpora (VoxCeleb1, VoxCeleb2, and LibriSpeech). The results show that our method obtains higher accuracy under several conditions, and has advantages over all baseline methods in computational complexity and model size.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 12 pages, 4 figures, 12 tables. Accepted for publication in IEEE TMM

arXiv:2305.19539 [pdf]

doi 10.1109/TMM.2023.3280011

Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

Authors: Yanxiong Li, Wenchang Cao, Wei Xie, Jialong Li, Emmanouil Benetos

Abstract: Most existing methods for audio classification assume that the vocabulary of audio classes to be classified is fixed. When novel (unseen) audio classes appear, audio classification systems need to be retrained with abundant labeled samples of all audio classes for recognizing base (initial) and novel audio classes. If novel audio classes continue to appear, the existing methods for audio classification will be inefficient and even infeasible. In this work, we propose a method for few-shot class-incremental audio classification, which can continually recognize novel audio classes without forgetting old ones. The framework of our method mainly consists of two parts: an embedding extractor and a classifier, and their constructions are decoupled. The embedding extractor is the backbone of a ResNet based network, which is frozen after construction by a training strategy using only samples of base audio classes. However, the classifier consisting of prototypes is expanded by a prototype adaptation network with few samples of novel audio classes in incremental sessions. Labeled support samples and unlabeled query samples are used to train the prototype adaptation network and update the classifier, since they are informative for audio classification. Three audio datasets, named NSynth-100, FSC-89 and LS-100 are built by choosing samples from audio corpora of NSynth, FSD-MIX-CLIP and LibriSpeech, respectively. Results show that our method exceeds baseline methods in average accuracy and performance dropping rate. In addition, it is competitive compared to baseline methods in computational complexity and memory requirement. The code for our method is given at https://github.com/vinceasvp/FCAC.

Submitted 30 May, 2023; originally announced May 2023.

Comments: 13 pages, 8 figures, 12 tables. Accepted for publication in IEEE TMM

arXiv:2305.18045 [pdf, ps, other]

Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

Authors: Wei Xie, Yanxiong Li, Qianhua He, Wenchang Cao, Tuomas Virtanen

Abstract: New classes of sounds constantly emerge with a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds with a few training samples of new classes while remembering the learned ones. To this end, we propose a method to generate discriminative prototypes and use them to expand the model's classifier for recognizing sounds of new and learned classes. The model is first trained with a random episodic training strategy, and then its backbone is used to generate the prototypes. A dynamic relation projection module refines the prototypes to enhance their discriminability. Results on two datasets (derived from the corpora of NSynth and FSD-MIX-CLIPS) show that the proposed method exceeds three state-of-the-art methods in average accuracy and performance dropping rate.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 5 pages, 2 figures, Accepted by Interspeech 2023

arXiv:2302.10959 [pdf, other]

Dealing with Collinearity in Large-Scale Linear System Identification Using Gaussian Regression

Authors: Wenqi Cao, Gianluigi Pillonetto

Abstract: Many problems arising in control require the determination of a mathematical model of the application. This often has to be performed starting from input-output data, leading to a task known as system identification in the engineering literature. One emerging topic in this field is estimation of networks consisting of several interconnected dynamic systems. We consider the linear setting assuming that system outputs are the result of many correlated inputs, hence making system identification severely ill-conditioned. This is a scenario often encountered when modeling complex cybernetics systems composed of many sub-units with feedback and algebraic loops. We develop a strategy cast in a Bayesian regularization framework where any impulse response is seen as realization of a zero-mean Gaussian process. Any covariance is defined by the so-called stable spline kernel which includes information on smooth exponential decay. We design a novel Markov chain Monte Carlo scheme able to reconstruct the impulse responses posterior by efficiently dealing with collinearity. Our scheme relies on a variation of the Gibbs sampling technique: beyond considering blocks forming a partition of the parameter space, some other (overlapping) blocks are also updated on the basis of the level of collinearity of the system inputs. Theoretical properties of the algorithm are studied obtaining its convergence rate. Numerical experiments are included using systems containing hundreds of impulse responses and highly correlated inputs.

Submitted 28 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.13633

arXiv:2210.02166 [pdf, other]

Robust Bayesian Inference for Moving Horizon Estimation

Authors: Wenhan Cao, Chang Liu, Zhiqian Lan, Shengbo Eben Li, Wei Pan, Angelo Alessandri

Abstract: The accuracy of moving horizon estimation (MHE) suffers significantly in the presence of measurement outliers. Existing methods address this issue by treating measurements leading to large MHE cost function values as outliers, which are subsequently discarded. This strategy, achieved through solving combinatorial optimization problems, is confined to linear systems to guarantee computational tractability and stability. Contrasting these heuristic solutions, our work reexamines MHE from a Bayesian perspective, unveils the fundamental issue of its lack of robustness: MHE's sensitivity to outliers results from its reliance on the Kullback-Leibler (KL) divergence, where both outliers and inliers are equally considered. To tackle this problem, we propose a robust Bayesian inference framework for MHE, integrating a robust divergence measure to reduce the impact of outliers. In particular, the proposed approach prioritizes the fitting of uncontaminated data and lowers the weight of contaminated ones, instead of directly discarding all potentially contaminated measurements, which may lead to undesirable removal of uncontaminated data. A tuning parameter is incorporated into the framework to adjust the robustness degree to outliers. Notably, the classical MHE can be interpreted as a special case of the proposed approach as the parameter converges to zero. In addition, our method involves only minor modification to the classical MHE stage cost, thus avoiding the high computational complexity associated with previous outlier-robust methods and inherently suitable for nonlinear systems. Most importantly, our method provides robustness and stability guarantees, which are often missing in other outlier-robust Bayes filters. The effectiveness of the proposed method is demonstrated on simulations subject to outliers following different distributions, as well as on physical experiment data.

Submitted 2 October, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: 17 pages

arXiv:2209.05042 [pdf, other]

doi 10.1109/CDC51059.2022.9992503

On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

Authors: Jingliang Duan, Wenhan Cao, Yang Zheng, Lin Zhao

Abstract: The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. At the core of our results is the uniqueness of the stationary point of dLQR when it is observable, which is in a concise form of an observer-based controller with the optimal similarity transformation. These results shed light on designing efficient algorithms for general decision-making problems with partially observed information.

Submitted 29 October, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.09598

Journal ref: 2022 IEEE 61st Conference on Decision and Control (CDC)

arXiv:2208.04525

Using Large Context for Kidney Multi-Structure Segmentation from CTA Images

Authors: Weiwei Cao, Yuzhu Cao

Abstract: Accurate and automated segmentation of multi-structure (i.e., kidneys, renal tumors, arteries, and veins) from 3D CTA is one of the most important tasks for surgery-based renal cancer treatment (e.g., laparoscopic partial nephrectomy). This paper briefly presents the main technical details of the multi-structure segmentation method in MICCAI 2022 KIPA challenge. The main contribution of this paper is that we design the 3D UNet with the large context information capturing capability. Our method ranked eighth on the MICCAI 2022 KIPA challenge open testing dataset with a mean position of 8.2. Our code and trained models are publicly available at https://github.com/fengjiejiejiejie/kipa22_nnunet.

Submitted 28 February, 2024; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: The paper lacks research value

arXiv:2208.02406 [pdf]

Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network

Authors: Yanxiong Li, Wenchang Cao, Konstantinos Drossos, Tuomas Virtanen

Abstract: Automatic estimation of domestic activities from audio can be used to solve many problems, such as reducing the labor cost for nursing the elderly people. This study focuses on solving the problem of domestic activity clustering from audio. The target of domestic activity clustering is to cluster audio clips which belong to the same category of domestic activity into one cluster in an unsupervised way. In this paper, we propose a method of domestic activity clustering using a depthwise separable convolutional autoencoder network. In the proposed method, initial embeddings are learned by the depthwise separable convolutional autoencoder, and a clustering-oriented loss is designed to jointly optimize embedding refinement and cluster assignment. Different methods are evaluated on a public dataset (a derivative of the SINS dataset) used in the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) in 2018. Our method obtains the normalized mutual information (NMI) score of 54.46%, and the clustering accuracy (CA) score of 63.64%, and outperforms state-of-the-art methods in terms of NMI and CA. In addition, both the computational complexity and the memory requirement of our method are lower than those of previous deep-model-based methods. Codes: https://github.com/vinceasvp/domestic-activity-clustering-from-audio

Submitted 3 August, 2022; originally announced August 2022.

Comments: 6 pages, 5 figures, 4 tables. Accepted by IEEE MMSP 2022

arXiv:2207.07767 [pdf, other]

Strategic Asset Allocation with Illiquid Alternatives

Authors: Eric Luxenberg, Stephen Boyd, Mykel Kochenderfer, Misha van Beek, Wen Cao, Steven Diamond, Alex Ulitsky, Kunal Menda, Vidy Vairavamurthy

Abstract: We address the problem of strategic asset allocation (SAA) with portfolios that include illiquid alternative asset classes. The main challenge in portfolio construction with illiquid asset classes is that we do not have direct control over our positions, as we do in liquid asset classes. Instead we can only make commitments; the position builds up over time as capital calls come in, and reduces over time as distributions occur, neither of which the investor has direct control over. The effect on positions of our commitments is subject to a delay, typically of a few years, and is also unknown or stochastic. A further challenge is the requirement that we can meet the capital calls, with very high probability, with our liquid assets. We formulate the illiquid dynamics as a random linear system, and propose a convex optimization based model predictive control (MPC) policy for allocating liquid assets and making new illiquid commitments in each period. Despite the challenges of time delay and uncertainty, we show that this policy attains performance surprisingly close to a fictional setting where we pretend the illiquid asset classes are completely liquid, and we can arbitrarily and immediately adjust our positions. In this paper we focus on the growth problem, with no external liabilities or income, but the method is readily extended to handle this case.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2205.14029 [pdf]

Lesion classification by model-based feature extraction: A differential affine invariant model of soft tissue elasticity

Authors: Weiguo Cao, Marc J. Pomeroy, Zhengrong Liang, Yongfeng Gao, Yongyi Shi, Jiaxing Tan, Fangfang Han, Jing Wang, Jianhua Ma, Hongbin Lu, Almas F. Abbasi, Perry J. Pickhardt

Abstract: The elasticity of soft tissues has been widely considered as a characteristic property to differentiate between healthy and vicious tissues and, therefore, motivated several elasticity imaging modalities, such as Ultrasound Elastography, Magnetic Resonance Elastography, and Optical Coherence Elastography. This paper proposes an alternative approach of modeling the elasticity using Computed Tomography (CT) imaging modality for model-based feature extraction machine learning (ML) differentiation of lesions. The model describes a dynamic non-rigid (or elastic) deformation in differential manifold to mimic the soft tissues elasticity under wave fluctuation in vivo. Based on the model, three local deformation invariants are constructed by two tensors defined by the first and second order derivatives from the CT images and used to generate elastic feature maps after normalization via a novel signal suppression method. The model-based elastic image features are extracted from the feature maps and fed to machine learning to perform lesion classifications. Two pathologically proven image datasets of colon polyps (44 malignant and 43 benign) and lung nodules (46 malignant and 20 benign) were used to evaluate the proposed model-based lesion classification. The outcomes of this modeling approach reached the score of area under the curve of the receiver operating characteristics of 94.2 % for the polyps and 87.4 % for the nodules, resulting in an average gain of 5 % to 30 % over ten existing state-of-the-art lesion classification methods. The gains by modeling tissue elasticity for ML differentiation of lesions are striking, indicating the great potential of exploring the modeling strategy to other tissue properties for ML differentiation of lesions.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 12 pages, 4 figures, 3 tables

arXiv:2204.11382

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Authors: Abdul Rehman, Zhen-Tao Liu, Min Wu, Wei-Hua Cao, Cheng-Shan Jiang

Abstract: Speech emotion recognition systems have high prediction latency because of the high computational requirements for deep learning models and low generalizability mainly because of the poor reliability of emotional measurements across multiple corpora. To solve these problems, we present a speech emotion recognition system based on a reductionist approach of decomposing and analyzing syllable-level features. Mel-spectrogram of an audio stream is decomposed into syllable-level components, which are then analyzed to extract statistical features. The proposed method uses formant attention, noise-gate filtering, and rolling normalization contexts to increase feature processing speed and tolerance to adversity. A set of syllable-level formant features is extracted and fed into a single hidden layer neural network that makes predictions for each syllable as opposed to the conventional approach of using a sophisticated deep learner to make sentence-wide predictions. The syllable level predictions help to achieve the real-time latency and lower the aggregated error in utterance level cross-corpus predictions. The experiments on IEMOCAP (IE), MSP-Improv (MI), and RAVDESS (RA) databases show that the method achieves real-time latency while predicting with state-of-the-art cross-corpus unweighted accuracy of 47.6% for IE to MI and 56.2% for MI to IE.

Submitted 22 February, 2023; v1 submitted 24 April, 2022; originally announced April 2022.

Comments: Significant revisions

ACM Class: I.5.2; I.5.5

arXiv:2204.11180 [pdf]

Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Authors: Yanxiong Li, Wucheng Wang, Hao Chen, Wenchang Cao, Wei Li, Qianhua He

Abstract: Although few-shot learning has attracted much attention from the fields of image and audio classification, few efforts have been made on few-shot speaker identification. In the task of few-shot learning, overfitting is a tough problem mainly due to the mismatch between training and testing conditions. In this paper, we propose a few-shot speaker identification method which can alleviate the overfitting problem. In the proposed method, the model of a depthwise separable convolutional network with channel attention is trained with a prototypical loss function. Experimental datasets are extracted from three public speech corpora: Aishell-2, VoxCeleb1 and TORGO. Experimental results show that the proposed method exceeds state-of-the-art methods for few-shot speaker identification in terms of accuracy and F-score.

Submitted 23 April, 2022; originally announced April 2022.

Comments: Accepted by Odyssey 2022 (The Speaker and Language Recognition Workshop 2022, Beijing, China)

arXiv:2204.02857 [pdf, other]

Primal-dual Estimator Learning: an Offline Constrained Moving Horizon Estimation Method with Feasibility and Near-optimality Guarantees

Authors: Wenhan Cao, Jingliang Duan, Shengbo Eben Li, Chen Chen, Chang Liu, Yu Wang

Abstract: This paper proposes a primal-dual framework to learn a stable estimator for linear constrained estimation problems leveraging the moving horizon approach. To avoid the online computational burden in most existing methods, we learn a parameterized function offline to approximate the primal estimate. Meanwhile, a dual estimator is trained to check the suboptimality of the primal estimator during execution time. Both the primal and dual estimators are learned from data using supervised learning techniques, and the explicit sample size is provided, which enables us to guarantee the quality of each learned estimator in terms of feasibility and optimality. This in turn allows us to bound the probability of the learned estimator being infeasible or suboptimal. Furthermore, we analyze the stability of the resulting estimator with a bounded error in the minimization of the cost function. Since our algorithm does not require the solution of an optimization problem during runtime, state estimates can be generated online almost instantly. Simulation results are presented to show the accuracy and time efficiency of the proposed framework compared to online optimization of moving horizon estimation and Kalman filter. To the best of our knowledge, this is the first learning-based state estimator with feasibility and near-optimality guarantees for linear constrained systems.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2203.13633 [pdf, other]

Dealing with collinearity in large-scale linear system identification using Bayesian regularization

Authors: Wenqi Cao, Gianluigi Pillonetto

Abstract: We consider the identification of large-scale linear and stable dynamic systems whose outputs may be the result of many correlated inputs. Hence, severe ill-conditioning may affect the estimation problem. This is a scenario often arising when modeling complex physical systems given by the interconnection of many sub-units where feedback and algebraic loops can be encountered. We develop a strategy based on Bayesian regularization where any impulse response is modeled as the realization of a zero-mean Gaussian process. The stable spline covariance is used to include information on smooth exponential decay of the impulse responses. We then design a new Markov chain Monte Carlo scheme that deals with collinearity and is able to efficiently reconstruct the posterior of the impulse responses. It is based on a variation of Gibbs sampling which updates possibly overlapping blocks of the parameter space on the basis of the level of collinearity affecting the different inputs. Numerical experiments are included to test the goodness of the approach where hundreds of impulse responses form the system and inputs correlation may be very high.

Submitted 2 September, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

Comments: 7 pages, 10 figures; Keywords: linear system identification, Bayesian regularization, MCMC, stable spline prior

arXiv:2203.04286 [pdf, other]

Proximal PanNet: A Model-Based Deep Network for Pansharpening

Authors: Xiangyong Cao, Yang Chen, Wenfei Cao

Abstract: Recently, deep learning techniques have been extensively studied for pansharpening, which aims to generate a high resolution multispectral (HRMS) image by fusing a low resolution multispectral (LRMS) image with a high resolution panchromatic (PAN) image. However, existing deep learning-based pansharpening methods directly learn the mapping from LRMS and PAN to HRMS. These network architectures always lack sufficient interpretability, which limits further performance improvements. To alleviate this issue, we propose a novel deep network for pansharpening by combining the model-based methodology with the deep learning method. Firstly, we build an observation model for pansharpening using the convolutional sparse coding (CSC) technique and design a proximal gradient algorithm to solve this model. Secondly, we unfold the iterative algorithm into a deep network, dubbed as Proximal PanNet, by learning the proximal operators using convolutional neural networks. Finally, all the learnable modules can be automatically learned in an end-to-end manner. Experimental results on some benchmark datasets show that our network performs better than other advanced methods both quantitatively and qualitatively.

Submitted 12 February, 2022; originally announced March 2022.

Comments: 9 pages, 6 figures

arXiv:2201.09598 [pdf, other]

doi 10.1109/TAC.2023.3275732

On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control

Authors: Jingliang Duan, Wenhan Cao, Yang Zheng, Lin Zhao

Abstract: The convergence of policy gradient algorithms hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. One of our core results is the uniqueness of the stationary point of dLQR when it is observable, which provides an optimality certificate for solving dynamic controllers using policy gradient methods. Moreover, we establish conditions under which dLQR and linear quadratic Gaussian control are equivalent, thus providing a unified viewpoint of optimal control of both deterministic and stochastic linear systems. These results further shed light on designing policy gradient algorithms for more general decision-making problems with partially observed information.

Submitted 29 October, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

Journal ref: IEEE Transactions on Automatic Control (full paper), 2023

arXiv:2111.10899 [pdf, other]

Identification of Low Rank Vector Processes

Authors: Wenqi Cao, Giorgio Picci, Anders Lindquist

Abstract: We study modeling and identification of stationary processes with a spectral density matrix of low rank. Equivalently, we consider processes having an innovation of reduced dimension for which Prediction Error Methods (PEM) algorithms are not directly applicable. We show that these processes admit a special feedback structure with a deterministic feedback channel which can be used to split the identification in two steps, one of which can be based on standard algorithms while the other is based on a deterministic least squares fit. Identifiability of the feedback system is analyzed and a unique identifiable structure is characterized. Simulations show that the proposed procedure works well in some simple examples.

Submitted 13 January, 2023; v1 submitted 21 November, 2021; originally announced November 2021.

Comments: arXiv admin note: text overlap with arXiv:2012.05004

arXiv:2109.11814 [pdf, other]

Modeling of Low Rank Time Series

Authors: Wenqi Cao, Anders Lindquist, Giorgio Picci

Abstract: Rank-deficient stationary stochastic vector processes are present in many problems in network theory and dynamic factor analysis. In this paper we study hidden dynamical relations between the components of a discrete-time stochastic vector process and investigate their properties with respect to stability and causality. More specifically, we construct transfer functions with a full-rank input process formed from selected components of the given vector process and having a vector process of the remaining components as output. An important question, which we answer in the negative, is whether it is always possible to find such a deterministic relation that is stable. If it is unstable, there must be feedback from output to input ensuring that stationarity is maintained. This leads to connections to robust control. We also show how our results could be used to investigate the structure of dynamic network models and the latent low-rank stochastic process in a dynamic factor model.

Submitted 13 April, 2023; v1 submitted 24 September, 2021; originally announced September 2021.

Comments: In previous versions we assumed the spectral factor to be minimum-phase, which is not required actually

arXiv:2108.05118 [pdf]

Capture Uncertainties in Deep Neural Networks for Safe Operation of Autonomous Driving Vehicles

Authors: Liuhui Ding, Dachuan Li, Bowen Liu, Wenxing Lan, Bing Bai, Qi Hao, Weipeng Cao, Ke Pei

Abstract: Uncertainties in Deep Neural Network (DNN)-based perception and vehicle's motion pose challenges to the development of safe autonomous driving vehicles. In this paper, we propose a safe motion planning framework featuring the quantification and propagation of DNN-based perception uncertainties and motion uncertainties. Contributions of this work are twofold: (1) A Bayesian Deep Neural network model which detects 3D objects and quantitatively captures the associated aleatoric and epistemic uncertainties of DNNs; (2) An uncertainty-aware motion planning algorithm (PU-RRT) that accounts for uncertainties in object detection and ego-vehicle's motion. The proposed approaches are validated via simulated complex scenarios built in CARLA. Experimental results show that the proposed motion planning scheme can cope with uncertainties of DNN-based perception and vehicle motion, and improve the operational safety of autonomous vehicles while still achieving desirable efficiency.

Submitted 11 August, 2021; originally announced August 2021.

Comments: To appear in the 19th IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2021)

MSC Class: 68T40; ACM Class: I.2.9

arXiv:2107.11645 [pdf]

Dual-Attention Enhanced BDense-UNet for Liver Lesion Segmentation

Authors: Wenming Cao, Philip L. H. Yu, Gilbert C. S. Lui, Keith W. H. Chiu, Ho-Ming Cheng, Yanwen Fang, Man-Fung Yuen, Wai-Kay Seto

Abstract: In this work, we propose a new segmentation network by integrating DenseUNet and bidirectional LSTM together with attention mechanism, termed as DA-BDense-UNet. DenseUNet allows learning enough diverse features and enhancing the representative power of networks by regulating the information flow. Bidirectional LSTM is responsible to explore the relationships between the encoded features and the up-sampled features in the encoding and decoding paths. Meanwhile, we introduce attention gates (AG) into DenseUNet to diminish responses of unrelated background regions and magnify responses of salient regions progressively. Besides, the attention in bidirectional LSTM takes into account the contribution differences of the encoded features and the up-sampled features in segmentation improvement, which can in turn adjust proper weights for these two kinds of features. We conduct experiments on liver CT image data sets collected from multiple hospitals by comparing them with state-of-the-art segmentation models. Experimental results indicate that our proposed method DA-BDense-UNet has achieved comparative performance in terms of dice coefficient, which demonstrates its effectiveness.

Submitted 24 July, 2021; originally announced July 2021.

Comments: 9 pages, 3 figures

arXiv:2103.15735 [pdf, other]

High-Sensitivity Iodine Imaging by Combining Spectral CT Technologies

Authors: Matthew Tivnan, Grace Gang, Wenchao Cao, Nadav Shapira, Peter B. Noel, J. Webster Stayman

Abstract: Spectral CT offers enhanced material discrimination over single-energy systems and enables quantitative estimation of basis material density images. Water/iodine decomposition in contrast-enhanced CT is one of the most widespread applications of this technology in the clinic. However, low concentrations of iodine can be difficult to estimate accurately, limiting potential clinical applications and/or raising injected contrast agent requirements. We seek high-sensitivity spectral CT system designs which minimize noise in water/iodine density estimates. In this work, we present a model-driven framework for spectral CT system design optimization to maximize material separability. We apply this tool to optimize the sensitivity spectra on a spectral CT test bench using a hybrid design which combines source kVp control and k-edge filtration. Following design optimization, we scanned a water/iodine phantom with the hybrid spectral CT system and performed dose-normalized comparisons to two single-technique designs which use only kVp control or only k-edge filtration. The material decomposition results show that the hybrid system reduces both standard deviation and cross-material noise correlations compared to the designs where the constituent technologies are used individually.

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2103.05505 [pdf]

Approximate Optimal Filter for Linear Gaussian Time-invariant Systems

Authors: Kaiming Tang, Shengbo Eben Li, Yuming Yin, Yang Guan, Jingliang Duan, Wenhan Cao, Jie Li

Abstract: State estimation is critical to control systems, especially when the states cannot be directly measured. This paper presents an approximate optimal filter, which enables to use policy iteration technique to obtain the steady-state gain in linear Gaussian time-invariant systems. This design transforms the optimal filtering problem with minimum mean square error into an optimal control problem, called Approximate Optimal Filtering (AOF) problem. The equivalence holds given certain conditions about initial state distributions and policy formats, in which the system state is the estimation error, control input is the filter gain, and control objective function is the accumulated estimation error. We present a policy iteration algorithm to solve the AOF problem in steady-state. A classic vehicle state estimation problem finally evaluates the approximate filter. The results show that the policy converges to the steady-state Kalman gain, and its accuracy is within 2 %.

Submitted 9 March, 2021; originally announced March 2021.

arXiv:2012.05004 [pdf, other]

Modeling and Identification of Low Rank Vector Processes

Authors: Giorgio Picci, Wenqi Cao, Anders Lindquist

Abstract: We study modeling and identification of processes with a spectral density matrix of low rank. Equivalently, we consider processes having an innovation of reduced dimension for which Prediction Error Methods (PEM) algorithms are not directly applicable. We show that these processes admit a special feedback structure with a deterministic feedback channel which can be used to split the identification in two steps, one of which can be based on standard algorithms while the other is based on a deterministic least squares fit.

Submitted 10 May, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

Comments: A more detailed version of the submission with the same name to IFAC SYSID 2021

arXiv:2010.06507 [pdf]

Machine Learning of Partial Differential Equations from Noise Data

Authors: Wenbo Cao, Weiwei Zhang

Abstract: Machine learning of partial differential equations from data is a potential breakthrough to solve the lack of physical equations in complex dynamic systems, but because numerical differentiation is ill-posed to noise data, noise has become the biggest obstacle in the application of partial differential equation identification method. To overcome this problem, we propose Frequency Domain Identification method based on Fourier transforms, which effectively eliminates the influence of noise by using the low frequency component of frequency domain data to identify partial differential equations in frequency domain. We also propose a new sparse identification criterion, which can accurately identify the terms in the equation from low signal-to-noise ratio data. Through identifying a variety of canonical equations spanning a number of scientific domains, the proposed method is proved to have high accuracy and robustness for equation structure and parameters identification for low signal-to-noise ratio data. The method provides a promising technique to discover potential partial differential equations from noisy experimental data.

Submitted 17 November, 2021; v1 submitted 28 September, 2020; originally announced October 2020.

arXiv:2008.00674 [pdf, other]

Reinforcement Solver for H-infinity Filter with Bounded Noise

Authors: Jie Li, Shengbo Eben Li, Kaiming Tang, Yao Lv, Wenhan Cao

Abstract: H-infinity filter has been widely applied in engineering field, but coping with bounded noise is still an open problem and difficult to solve. This paper considers the H-infinity filtering problem for linear system with bounded process and measurement noise. The problem is first formulated as a zero-sum game where the dynamic of estimation error is non-affine with respect to filter gain and measurement noise. A nonquadratic Hamilton-Jacobi-Isaacs (HJI) equation is then derived by employing a nonquadratic cost to characterize bounded noise, which is extremely difficult to solve due to its non-affine and nonlinear properties. Next, a reinforcement learning algorithm based on gradient descent method which can handle nonlinearity is proposed to update the gain of reinforcement filter, where measurement noise is fixed to tackle non-affine property and increase the convexity of Hamiltonian. Two examples demonstrate the convergence and effectiveness of the proposed algorithm.

Submitted 3 August, 2020; originally announced August 2020.

arXiv:2005.03622 [pdf, ps, other]

Nonparametric Estimation of the Fisher Information and Its Applications

Authors: Wei Cao, Alex Dytso, Michael Fauß, H. Vincent Poor, Gang Feng

Abstract: This paper considers the problem of estimation of the Fisher information for location from a random sample of size $n$. First, an estimator proposed by Bhattacharya is revisited and improved convergence rates are derived. Second, a new estimator, termed a clipped estimator, is proposed. Superior upper bounds on the rates of convergence can be shown for the new estimator compared to the Bhattacharya estimator, albeit with different regularity conditions. Third, both of the estimators are evaluated for the practically relevant case of a random variable contaminated by Gaussian noise. Moreover, using Brown's identity, which relates the Fisher information and the minimum mean squared error (MMSE) in Gaussian noise, two corresponding consistent estimators for the MMSE are proposed. Simulation examples for the Bhattacharya estimator and the clipped estimator as well as the MMSE estimators are presented. The examples demonstrate that the clipped estimator can significantly reduce the required sample size to guarantee a specific confidence interval compared to the Bhattacharya estimator.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:2004.14699 [pdf]

A 6G White Paper on Connectivity for Remote Areas

Authors: Harri Saarnisaari, Sudhir Dixit, Mohamed-Slim Alouini, Abdelaali Chaoub, Marco Giordani, Adrian Kliks, Marja Matinmikko-Blue, Nan Zhang, Anuj Agrawal, Mats Andersson, Vimal Bhatia, Wei Cao, Yunfei Chen, Wei Feng, Marjo Heikkilä, Josep M. Jornet, Luciano Mendes, Heikki Karvonen, Brejesh Lall, Matti Latva-aho, Xiangling Li, Kalle Lähetkangas, Moshe T. Masonta, Alok Pandey, Pekka Pirinen, et al. (9 additional authors not shown)

Abstract: In many places all over the world rural and remote areas lack proper connectivity that has led to increasing digital divide. These areas might have low population density, low incomes, etc., making them less attractive places to invest and operate connectivity networks. 6G could be the first mobile radio generation truly aiming to close the digital divide. However, in order to do so, special requirements and challenges have to be considered since the beginning of the design process. The aim of this white paper is to discuss requirements and challenges and point out related, identified research topics that have to be solved in 6G. This white paper first provides a generic discussion, shows some facts and discusses targets set in international bodies related to rural and remote connectivity and digital divide. Then the paper digs into technical details, i.e., into a solutions space. Each technical section ends with a discussion and then highlights identified 6G challenges and research ideas as a list.

Submitted 30 April, 2020; originally announced April 2020.

Comments: A 6G white paper, 17 pages

arXiv:2004.02463 [pdf, other]

Spectral Rank, Feedback, Causality and the Indirect Method for CARMA Identification

Authors: Wenqi Cao, Anders Lindquist, Giorgio Picci

Abstract: Building on a recent paper by Georgiou and Lindquist [1] on the problem of rank deficiency of spectral densities and hidden dynamical relations after sampling of continuous-time stochastic processes, this paper is devoted to understanding related questions of feedback and Granger causality that affect stability properties. This then naturally connects to CARMA identification, where we remark on certain oversights in the literature.

Submitted 7 June, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: 7 pages, 2 figures

arXiv:2001.10489 [pdf]

doi 10.1016/j.advengsoft.2019.102687

An efficient surrogate-aided importance sampling framework for reliability analysis

Authors: Wang-Sheng Liu, Sai Hung Cheung, Wen-Jun Cao

Abstract: Surrogates in lieu of expensive-to-evaluate performance functions can accelerate the reliability analysis greatly. This paper proposes a new two-stage framework for surrogate-aided reliability analysis named Surrogates for Importance Sampling (S4IS). In the first stage, a coarse surrogate is built to gain the information about failure regions; the second stage zooms into the important regions and improves the accuracy of the failure probability estimator by adaptively selecting support points therein. The learning functions are proposed to guide the selection of support points such that the exploration and exploitation can be dynamically balanced. As a generic framework, S4IS has the potential to incorporate different types of surrogates (Gaussian Processes, Support Vector Machines, Neural Network, etc.). The effectiveness and efficiency of S4IS is validated by five illustrative examples, which involve system reliability, highly nonlinear limit-state function, small failure probability and moderately high dimensionality. The implementation of S4IS is made available to download at https://github.com/RobinSeaside/S4IS.

Submitted 22 January, 2020; originally announced January 2020.

Journal ref: Advances in Engineering Software 135 (2019) 102687

arXiv:1911.12815 [pdf, other]

Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices

Authors: Weidong Cao, Liu Ke, Ayan Chakrabarti, Xuan Zhang

Abstract: Recent works propose neural network- (NN-) inspired analog-to-digital converters (NNADCs) and demonstrate their great potentials in many emerging applications. These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require high-precision RRAM cells (6~12-bit) to achieve a moderate quantization resolution (4~8-bit). Such optimistic assumption of RRAM resolution, however, is not supported by fabrication data of RRAM arrays in large-scale production process. In this paper, we propose an NN-inspired super-resolution ADC based on low-precision RRAM devices by taking the advantage of a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework. Results obtained from SPICE simulations demonstrate that our method leads to robust design of a 14-bit super-resolution ADC using 3-bit RRAM devices with improved power and speed performance and competitive figure-of-merits (FoMs). In addition to the linear uniform quantization, the proposed ADC can also support configurable high-resolution nonlinear quantization with high conversion speed and low conversion energy, enabling future intelligent analog-to-information interfaces for near-sensor analytics and processing.

Submitted 28 November, 2019; originally announced November 2019.

Comments: 7 pages, ICCAD 2019

Showing 1–42 of 42 results for author: Cao, W