-
Computation-Aware Learning for Stable Control with Gaussian Process
Authors:
Wenhan Cao,
Alexandre Capone,
Rishabh Yadav,
Sandra Hirche,
Wei Pan
Abstract:
In Gaussian Process (GP) dynamical model learning for robot control, particularly for systems constrained by computational resources like small quadrotors equipped with low-end processors, analyzing stability and designing a stable controller present significant challenges. This paper distinguishes between two types of uncertainty within the posteriors of GP dynamical models: the well-documented mathematical uncertainty stemming from limited data and computational uncertainty arising from constrained computational capabilities, which has been largely overlooked in prior research. Our work demonstrates that computational uncertainty, quantified through a probabilistic approximation of the inverse covariance matrix in GP dynamical models, is essential for stable control under computational constraints. We show that incorporating computational uncertainty can prevent overestimating the region of attraction, a safe subset of the state space with asymptotic stability, thus improving system safety. Building on these insights, we propose an innovative controller design methodology that integrates computational uncertainty within a second-order cone programming framework. Simulations of canonical stable control tasks and experiments of quadrotor tracking exhibit the effectiveness of our method under computational constraints.
Submitted 4 June, 2024;
originally announced June 2024.
-
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Authors:
Xiaoyan Lei,
Wenlong Zhang,
Weifeng Cao
Abstract:
Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy. The network of DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several residual state space blocks (RSSBs), each of which has several Vision Mamba Modules (ViMM) together with a residual connection. To improve efficiency while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network for superior performance. Specifically, we leverage the rich representation knowledge of the teacher network as additional supervision for the output of the lightweight student networks. Extensive experiments have demonstrated that our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters while maintaining the performance of both PSNR and SSIM. The source code is available at https://github.com/nathan66666/DVMSR.git
Submitted 11 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
Multidimensional Compressed Sensing for Spectral Light Field Imaging
Authors:
Wen Cao,
Ehsan Miandji,
Jonas Unger
Abstract:
This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectral-coded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence matching the intrinsic dimensionality of multi-spectral light fields. We mathematically and empirically show the equivalence of 5D and 1D sensing models, and most importantly that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware.
Submitted 27 February, 2024;
originally announced May 2024.
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. The submissions gauge the state of the art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
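For the NTIRE 2024 entry above, the PSNR thresholds can be read against the standard definition of the metric. The following is a minimal sketch of that definition only; the function name `psnr`, the 8-bit peak value of 255, and the absence of any border cropping or color-channel conversion are assumptions made here, and the challenge's actual evaluation protocol is not described in the abstract.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)  # mean squared error
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

A higher value means the super-resolved image is closer to the high-resolution reference; identical images give an infinite PSNR.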
-
Convolutional Bayesian Filtering
Authors:
Wenhan Cao,
Shiqi Liu,
Chang Liu,
Zeyu He,
Stephen S. -T. Yau,
Shengbo Eben Li
Abstract:
Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes the total probability rule and Bayes' law alternately, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability has been assumed to be exactly known, which represents a measure of the occurrence probability of one event, given the second event. In this paper, we find that by adding an additional event that stipulates an inequality condition, we can transform the conditional probability into a special integration that is analogous to convolution. Based on this transformation, we show that both transition probability and output probability can be generalized to convolutional forms, resulting in a more general filtering framework that we call convolutional Bayesian filtering. This new framework encompasses standard Bayesian filtering as a special case when the distance metric of the inequality condition is selected as the Dirac delta function. It also allows for a more nuanced consideration of model mismatch by choosing different types of inequality conditions. For instance, when the distance metric is defined in a distributional sense, the transition probability and output probability can be approximated by simply rescaling them into fractional powers. Under this framework, a robust version of the Kalman filter can be constructed by only altering the noise covariance matrix, while maintaining the conjugate nature of Gaussian distributions. Finally, we exemplify the effectiveness of our approach by reshaping classic filtering algorithms into convolutional versions, including the Kalman filter, extended Kalman filter, unscented Kalman filter and particle filter.
Submitted 30 March, 2024;
originally announced April 2024.
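For the Convolutional Bayesian Filtering entry above, the remark that fractional powers leave the Kalman filter structure intact can be checked with a standard Gaussian identity. The notation below is assumed for illustration and is not taken from the abstract: a linear measurement model $y = Cx + v$ with $v \sim \mathcal{N}(0, R)$ and an exponent $\alpha \in (0, 1]$. Raising the Gaussian likelihood to the power $\alpha$ gives, up to normalization, a Gaussian whose covariance is inflated by $1/\alpha$:

```latex
\mathcal{N}(y;\, Cx,\, R)^{\alpha}
  \propto \exp\!\Big(-\tfrac{\alpha}{2}\,(y - Cx)^{\top} R^{-1} (y - Cx)\Big)
  = \exp\!\Big(-\tfrac{1}{2}\,(y - Cx)^{\top} (R/\alpha)^{-1} (y - Cx)\Big)
  \propto \mathcal{N}\big(y;\, Cx,\, R/\alpha\big).
```

Under these assumptions the update equations keep their usual form with $R$ replaced by $R/\alpha$, which is consistent with the abstract's statement that only the noise covariance matrix is altered.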
-
Optimizing Polynomial Graph Filters: A Novel Adaptive Krylov Subspace Approach
Authors:
Keke Huang,
Wencai Cao,
Hoang Ta,
Xiaokui Xiao,
Pietro Liò
Abstract:
Graph Neural Networks (GNNs), known as spectral graph filters, find a wide range of applications in web networks. To bypass eigendecomposition, polynomial graph filters are proposed to approximate graph filters by leveraging various polynomial bases for filter training. However, no existing studies have explored the diverse polynomial graph filters from a unified perspective for optimization.
In this paper, we first unify polynomial graph filters, as well as the optimal filters of identical degrees, into the Krylov subspace of the same order, thus providing equivalent expressive power theoretically. Next, we investigate the asymptotic convergence property of polynomials from the unified Krylov subspace perspective, revealing their limited adaptability in graphs with varying heterophily degrees. Inspired by those facts, we design a novel adaptive Krylov subspace approach to optimize polynomial bases with provable controllability over the graph spectrum so as to adapt to graphs with various heterophily degrees. Subsequently, we propose AdaptKry, an optimized polynomial graph filter utilizing bases from the adaptive Krylov subspaces. Meanwhile, in light of the diverse spectral properties of complex graphs, we extend AdaptKry by leveraging multiple adaptive Krylov bases without incurring extra training costs. As a consequence, extended AdaptKry is able to capture the intricate characteristics of graphs and provide insights into their inherent complexity. We conduct extensive experiments across a series of real-world datasets. The experimental results demonstrate the superior filtering capability of AdaptKry, as well as the optimized efficacy of the adaptive Krylov basis.
Submitted 20 May, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
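For the polynomial graph filter entry above, the statement that a degree-K polynomial filter lives in a Krylov subspace can be illustrated with the plain monomial basis. This is a minimal sketch under stated assumptions: `P` stands for some propagation matrix (for example a normalized adjacency), `krylov_basis` and `polynomial_filter` are hypothetical helper names, and the weights are fixed by hand rather than learned. It is not the AdaptKry method itself.

```python
import numpy as np


def krylov_basis(P, x, order):
    """Return the matrix with columns [x, P x, ..., P^order x]."""
    cols = [x]
    for _ in range(order):
        cols.append(P @ cols[-1])  # next Krylov vector
    return np.stack(cols, axis=1)


def polynomial_filter(P, x, weights):
    """Apply sum_k weights[k] * P^k x, i.e. a point in the Krylov subspace."""
    basis = krylov_basis(P, x, len(weights) - 1)
    return basis @ np.asarray(weights, dtype=float)
```

Any filter of the form of a degree-K polynomial in `P` applied to `x` is a linear combination of the K+1 columns returned by `krylov_basis`, which is the sense in which different polynomial bases of the same degree share one subspace.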
-
Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control
Authors:
Wenhan Cao,
Wei Pan
Abstract:
Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. This is achieved through quadrature rules, which are weighted sums of utility functions evaluated from state samples obtained in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of the computational method -- in this case, the quadrature rule -- can significantly impact control performance. This impact is traced back to the fact that computational errors introduced in the PEV stage can affect the policy iteration's convergence behavior, which in turn affects the learned controller. To elucidate how computation impacts control, we draw a parallel between IntRL's policy iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation. In this light, computational error in PEV manifests as an extra error term in each iteration of Newton's method, with its upper bound proportional to the computational error. Further, we demonstrate that when the utility function resides in a reproducing kernel Hilbert space (RKHS), the optimal quadrature is achievable by employing Bayesian quadrature with the RKHS-inducing kernel function. We prove that the local convergence rates for IntRL using the trapezoidal rule and Bayesian quadrature with a Matérn kernel are $O(N^{-2})$ and $O(N^{-b})$, respectively, where $N$ is the number of evenly-spaced samples and $b$ is the Matérn kernel's smoothness parameter. These theoretical findings are finally validated by two canonical control tasks.
Submitted 27 February, 2024;
originally announced February 2024.
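For the IntRL entry above, the $O(N^{-2})$ behavior of the trapezoidal rule on evenly spaced samples can be observed on a toy integrand. This is a minimal sketch of the quadrature rule only, not of IntRL: the integrand $e^{t}$ on $[0, 1]$ is a stand-in chosen here for a smooth utility signal, and the helper names are hypothetical.

```python
import numpy as np


def trapezoid(values, dt):
    """Composite trapezoidal rule for evenly spaced samples."""
    return dt * (0.5 * values[0] + values[1:-1].sum() + 0.5 * values[-1])


def trapezoid_error(n_samples):
    """Absolute error when integrating exp(t) over [0, 1] with n_samples points."""
    t = np.linspace(0.0, 1.0, n_samples)
    approx = trapezoid(np.exp(t), t[1] - t[0])
    exact = np.e - 1.0
    return abs(approx - exact)
```

Halving the sample spacing (for instance going from 11 to 21 samples) should reduce `trapezoid_error` by roughly a factor of four, which is the quadratic rate named in the abstract.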
-
Analyzing the Impact of Computation in Adaptive Dynamic Programming for Stochastic LQR Problem
Authors:
Wenhan Cao,
Alexandre Capone,
Sandra Hirche,
Wei Pan
Abstract:
Adaptive dynamic programming (ADP) for stochastic linear quadratic regulation (LQR) demands the precise computation of stochastic integrals during policy iteration (PI). In a fully model-free problem setting, this computation can only be approximated by state samples collected at discrete time points using computational methods such as the canonical Euler-Maruyama method. Our research reveals a critical phenomenon: the sampling period can significantly impact control performance. This impact is due to the fact that computational errors introduced in each step of PI can significantly affect the algorithm's convergence behavior, which in turn influences the resulting control policy. We draw a parallel between PI and Newton's method applied to the Riccati equation to elucidate how the computation impacts control. In this light, the computational error in each PI step manifests itself as an extra error term in each step of Newton's method, with its upper bound proportional to the computational error. Furthermore, we demonstrate that the convergence rate for ADP in stochastic LQR problems using the Euler-Maruyama method is O(h), with h being the sampling period. A sensorimotor control task finally validates these theoretical findings.
Submitted 14 February, 2024;
originally announced February 2024.
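For the stochastic LQR entry above, the Euler-Maruyama method and the role of the sampling period h can be made concrete on a toy problem. This is a minimal sketch, assuming a scalar linear SDE $dx = a\,x\,dt + \sigma\,dW$ chosen here for illustration; the function name and parameters are hypothetical, and the sketch shows only the discretization scheme, not the paper's ADP algorithm.

```python
import numpy as np


def euler_maruyama(a, sigma, x0, h, n_steps, rng):
    """Simulate dx = a*x dt + sigma dW with sampling period h.

    Returns an array of n_steps + 1 state samples, starting at x0.
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(h))  # Brownian increment over one period
        x[k + 1] = x[k] + a * x[k] * h + sigma * dw
    return x
```

A path can be generated with `euler_maruyama(-1.0, 0.5, 1.0, 0.01, 1000, np.random.default_rng(0))`; shrinking `h` while keeping the total horizon `h * n_steps` fixed gives a finer approximation of the underlying stochastic integral.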
-
Lightweight Speaker Verification Using Transformation Module with Feature Partition and Fusion
Authors:
Yanxiong Li,
Zhongjie Jiang,
Qisheng Huang,
Wenchang Cao,
Jialong Li
Abstract:
Although many efforts have been made to decrease the model complexity for speaker verification, it is still challenging to deploy speaker verification systems with satisfactory results on low-resource terminals. We design a transformation module that performs feature partition and fusion to implement lightweight speaker verification. The transformation module consists of multiple simple but effective operations, such as convolution, pooling, mean, concatenation, normalization, and element-wise summation. It works in a plug-and-play way, and can be easily implanted into a wide variety of models to reduce the model complexity while maintaining the model error. First, the input feature is split into several low-dimensional feature subsets for decreasing the model complexity. Then, each feature subset is updated by fusing it with the inter-feature-subsets correlational information to enhance its representational capability. Finally, the updated feature subsets are independently fed into the block (one or several layers) of the model for further processing. The features that are output from the current block of the model are processed according to the steps above before they are fed into the next block of the model. Experimental data are selected from two public speech corpora (namely VoxCeleb1 and VoxCeleb2). Results show that implanting the transformation module into three models (namely AMCRN, ResNet34, and ECAPA-TDNN) for speaker verification slightly increases the model error and significantly decreases the model complexity. Our proposed method outperforms baseline methods on the whole in memory requirement and computational complexity with lower equal error rate. It also generalizes well across truncated segments with various lengths.
Submitted 6 December, 2023;
originally announced December 2023.
-
Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet
Authors:
Yanxiong Li,
Wenchang Cao,
Wei Xie,
Qisheng Huang,
Wenfeng Pang,
Qianhua He
Abstract:
We present a work on low-complexity acoustic scene classification (ASC) with multiple devices, namely the subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples of multiple devices with a low-complexity model, where two main difficulties need to be overcome. First, the audio samples are recorded by different devices, and there is mismatch of recording devices in audio samples. We reduce the negative impact of the mismatch of recording devices by using some effective strategies, including data augmentation (e.g., mix-up, spectrum correction, pitch shift), usages of multi-patch network structure and channel attention. Second, the model size should be smaller than a threshold (e.g., 128 KB required by the DCASE2021 challenge). To meet this condition, we adopt a ResNet with both depthwise separable convolution and channel attention as the backbone network, and perform model compression. In summary, we propose a low-complexity ASC method using data augmentation and a lightweight ResNet. Evaluated on the official development and evaluation datasets, our method obtains classification accuracy scores of 71.6% and 66.7%, respectively; and obtains Log-loss scores of 1.038 and 1.136, respectively. Our final model size is 110.3 KB which is smaller than the maximum of 128 KB.
Submitted 3 June, 2023;
originally announced June 2023.
-
Few-shot Class-incremental Audio Classification Using Stochastic Classifier
Authors:
Yanxiong Li,
Wenchang Cao,
Jialong Li,
Wei Xie,
Qianhua He
Abstract:
It is generally assumed that the number of classes is fixed in current audio classification methods, and the model can recognize pregiven classes only. When new classes emerge, the model needs to be retrained with adequate samples of all classes. If new classes continually emerge, these methods will not work well and may even be infeasible. In this study, we propose a method for few-shot class-incremental audio classification, which continually recognizes new classes and remembers old ones. The proposed model consists of an embedding extractor and a stochastic classifier. The former is trained in the base session and frozen in incremental sessions, while the latter is incrementally expanded in all sessions. Two datasets (NS-100 and LS-100) are built by choosing samples from audio corpora of NSynth and LibriSpeech, respectively. Results show that our method exceeds four baseline ones in average accuracy and performance dropping rate. Code is at https://github.com/vinceasvp/meta-sc.
Submitted 3 June, 2023;
originally announced June 2023.
-
Speaker verification using attentive multi-scale convolutional recurrent network
Authors:
Yanxiong Li,
Zhongjie Jiang,
Wenchang Cao,
Qisheng Huang
Abstract:
In this paper, we propose a speaker verification method by an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In the proposed method, logarithm Mel spectrum is extracted from each speech recording and then fed to the proposed AMCRN for learning speaker embedding. Afterwards, the learned speaker embedding is fed to the back-end classifier (such as cosine similarity metric) for scoring in the testing stage. The proposed method is compared with state-of-the-art methods for speaker verification. Experimental data are three public datasets that are selected from two large-scale speech corpora (VoxCeleb1 and VoxCeleb2). Experimental results show that our method exceeds baseline methods in terms of equal error rate and minimal detection cost function, and has advantages over most of baseline methods in terms of computational complexity and memory requirement. In addition, our method generalizes well across truncated speech segments with different durations, and the speaker embedding learned by the proposed AMCRN has stronger generalization ability across two back-end classifiers.
Submitted 1 June, 2023;
originally announced June 2023.
-
Few-Shot Speaker Identification Using Lightweight Prototypical Network with Feature Grouping and Interaction
Authors:
Yanxiong Li,
Hao Chen,
Wenchang Cao,
Qisheng Huang,
Qianhua He
Abstract:
Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their computational complexities and model sizes need to be reduced for lightweight applications. In this work, we propose an FSSI method using a lightweight prototypical network with the final goal of implementing FSSI on intelligent terminals with limited resources, such as smart watches and smart speakers. In the proposed prototypical network, an embedding module is designed to perform feature grouping for reducing the memory requirement and computational complexity, and feature interaction for enhancing the representational ability of the learned speaker embedding. In the proposed embedding module, the audio feature of each speech sample is split into several low-dimensional feature subsets that are transformed by a recurrent convolutional block in parallel. Then, the operations of averaging, addition, concatenation, element-wise summation and statistics pooling are sequentially executed to learn a speaker embedding for each speech sample. The recurrent convolutional block consists of a block of bidirectional long short-term memory, and a block of de-redundancy convolution in which feature grouping and interaction are conducted too. Our method is compared to baseline methods on three datasets that are selected from three public speech corpora (VoxCeleb1, VoxCeleb2, and LibriSpeech). The results show that our method obtains higher accuracy under several conditions, and has advantages over all baseline methods in computational complexity and model size.
Submitted 31 May, 2023;
originally announced May 2023.
-
Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes
Authors:
Yanxiong Li,
Wenchang Cao,
Wei Xie,
Jialong Li,
Emmanouil Benetos
Abstract:
Most existing methods for audio classification assume that the vocabulary of audio classes to be classified is fixed. When novel (unseen) audio classes appear, audio classification systems need to be retrained with abundant labeled samples of all audio classes for recognizing base (initial) and novel audio classes. If novel audio classes continue to appear, the existing methods for audio classification will be inefficient and even infeasible. In this work, we propose a method for few-shot class-incremental audio classification, which can continually recognize novel audio classes without forgetting old ones. The framework of our method mainly consists of two parts: an embedding extractor and a classifier, and their constructions are decoupled. The embedding extractor is the backbone of a ResNet-based network, which is frozen after construction by a training strategy using only samples of base audio classes. However, the classifier consisting of prototypes is expanded by a prototype adaptation network with few samples of novel audio classes in incremental sessions. Labeled support samples and unlabeled query samples are used to train the prototype adaptation network and update the classifier, since they are informative for audio classification. Three audio datasets, named NSynth-100, FSC-89 and LS-100, are built by choosing samples from audio corpora of NSynth, FSD-MIX-CLIP and LibriSpeech, respectively. Results show that our method exceeds baseline methods in average accuracy and performance dropping rate. In addition, it is competitive compared to baseline methods in computational complexity and memory requirement. The code for our method is given at https://github.com/vinceasvp/FCAC.
Submitted 30 May, 2023;
originally announced May 2023.
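For the entry above, the idea of a classifier made of prototypes that grows as novel classes arrive can be sketched in a few lines. This is a minimal illustration under stated assumptions: prototypes are taken to be plain class means of embeddings and prediction is nearest-prototype by Euclidean distance; the function names are hypothetical, and the paper's prototype adaptation network with self-attention is not modeled here.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Return (classes, prototypes), one mean embedding per class."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def expand(classes, protos, new_embeddings, new_labels):
    """Append prototypes of novel classes without touching the old ones."""
    new_classes, new_protos = class_prototypes(new_embeddings, new_labels)
    return np.concatenate([classes, new_classes]), np.vstack([protos, new_protos])


def predict(query, classes, protos):
    """Label of the prototype closest to the query embedding."""
    distances = np.linalg.norm(protos - query, axis=1)
    return classes[np.argmin(distances)]
```

Because `expand` only appends rows, the embedding extractor and the prototypes of base classes stay untouched in each incremental session, which mirrors the decoupled construction described in the abstract.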
-
Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes
Authors:
Wei Xie,
Yanxiong Li,
Qianhua He,
Wenchang Cao,
Tuomas Virtanen
Abstract:
New classes of sounds constantly emerge with a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds with a few training samples of new classes while remembering the learned ones. To this end, we propose a method to generate discriminative prototypes and use them to expand the model's classifier for recognizing sounds of new and learned classes. The model is first trained with a random episodic training strategy, and then its backbone is used to generate the prototypes. A dynamic relation projection module refines the prototypes to enhance their discriminability. Results on two datasets (derived from the corpora of NSynth and FSD-MIX-CLIPS) show that the proposed method exceeds three state-of-the-art methods in average accuracy and performance dropping rate.
Submitted 29 May, 2023;
originally announced May 2023.
-
Dealing with Collinearity in Large-Scale Linear System Identification Using Gaussian Regression
Authors:
Wenqi Cao,
Gianluigi Pillonetto
Abstract:
Many problems arising in control require the determination of a mathematical model of the application. This has often to be performed starting from input-output data, leading to a task known as system identification in the engineering literature. One emerging topic in this field is estimation of networks consisting of several interconnected dynamic systems. We consider the linear setting assuming…
▽ More
Many problems arising in control require the determination of a mathematical model of the application. This has often to be performed starting from input-output data, leading to a task known as system identification in the engineering literature. One emerging topic in this field is estimation of networks consisting of several interconnected dynamic systems. We consider the linear setting assuming that system outputs are the result of many correlated inputs, hence making system identification severely ill-conditioned. This is a scenario often encountered when modeling complex cybernetics systems composed by many sub-units with feedback and algebraic loops. We develop a strategy cast in a Bayesian regularization framework where any impulse response is seen as realization of a zero-mean Gaussian process. Any covariance is defined by the so called stable spline kernel which includes information on smooth exponential decay. We design a novel Markov chain Monte Carlo scheme able to reconstruct the impulse responses posterior by efficiently dealing with collinearity. Our scheme relies on a variation of the Gibbs sampling technique: beyond considering blocks forming a partition of the parameter space, some other (overlap**) blocks are also updated on the basis of the level of collinearity of the system inputs. Theoretical properties of the algorithm are studied obtaining its convergence rate. Numerical experiments are included using systems containing hundreds of impulse responses and highly correlated inputs.
Submitted 28 February, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Robust Bayesian Inference for Moving Horizon Estimation
Authors:
Wenhan Cao,
Chang Liu,
Zhiqian Lan,
Shengbo Eben Li,
Wei Pan,
Angelo Alessandri
Abstract:
The accuracy of moving horizon estimation (MHE) suffers significantly in the presence of measurement outliers. Existing methods address this issue by treating measurements leading to large MHE cost function values as outliers, which are subsequently discarded. This strategy, achieved through solving combinatorial optimization problems, is confined to linear systems to guarantee computational tractability and stability. Contrasting these heuristic solutions, our work reexamines MHE from a Bayesian perspective, unveils the fundamental issue of its lack of robustness: MHE's sensitivity to outliers results from its reliance on the Kullback-Leibler (KL) divergence, where both outliers and inliers are equally considered. To tackle this problem, we propose a robust Bayesian inference framework for MHE, integrating a robust divergence measure to reduce the impact of outliers. In particular, the proposed approach prioritizes the fitting of uncontaminated data and lowers the weight of contaminated ones, instead of directly discarding all potentially contaminated measurements, which may lead to undesirable removal of uncontaminated data. A tuning parameter is incorporated into the framework to adjust the robustness degree to outliers. Notably, the classical MHE can be interpreted as a special case of the proposed approach as the parameter converges to zero. In addition, our method involves only minor modification to the classical MHE stage cost, thus avoiding the high computational complexity associated with previous outlier-robust methods and inherently suitable for nonlinear systems. Most importantly, our method provides robustness and stability guarantees, which are often missing in other outlier-robust Bayes filters. The effectiveness of the proposed method is demonstrated on simulations subject to outliers following different distributions, as well as on physical experiment data.
Submitted 2 October, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator
Authors:
Jingliang Duan,
Wenhan Cao,
Yang Zheng,
Lin Zhao
Abstract:
The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. At the core of our results is the uniqueness of the stationary point of dLQR when it is observable, which is in a concise form of an observer-based controller with the optimal similarity transformation. These results shed light on designing efficient algorithms for general decision-making problems with partially observed information.
Submitted 29 October, 2023; v1 submitted 12 September, 2022;
originally announced September 2022.
-
Using Large Context for Kidney Multi-Structure Segmentation from CTA Images
Authors:
Weiwei Cao,
Yuzhu Cao
Abstract:
Accurate and automated segmentation of multi-structure (i.e., kidneys, renal tumors, arteries, and veins) from 3D CTA is one of the most important tasks for surgery-based renal cancer treatment (e.g., laparoscopic partial nephrectomy). This paper briefly presents the main technique details of the multi-structure segmentation method in MICCAI 2022 KIPA challenge. The main contribution of this paper is that we design the 3D UNet with the large context information capturing capability. Our method ranked eighth on the MICCAI 2022 KIPA challenge open testing dataset with a mean position of 8.2. Our code and trained models are publicly available at https://github.com/fengjiejiejiejie/kipa22_nnunet.
Submitted 28 February, 2024; v1 submitted 8 August, 2022;
originally announced August 2022.
-
Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network
Authors:
Yanxiong Li,
Wenchang Cao,
Konstantinos Drossos,
Tuomas Virtanen
Abstract:
Automatic estimation of domestic activities from audio can be used to solve many problems, such as reducing the labor cost for nursing elderly people. This study focuses on solving the problem of domestic activity clustering from audio. The target of domestic activity clustering is to cluster audio clips which belong to the same category of domestic activity into one cluster in an unsupervised way. In this paper, we propose a method of domestic activity clustering using a depthwise separable convolutional autoencoder network. In the proposed method, initial embeddings are learned by the depthwise separable convolutional autoencoder, and a clustering-oriented loss is designed to jointly optimize embedding refinement and cluster assignment. Different methods are evaluated on a public dataset (a derivative of the SINS dataset) used in the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) in 2018. Our method obtains a normalized mutual information (NMI) score of 54.46% and a clustering accuracy (CA) score of 63.64%, and outperforms state-of-the-art methods in terms of NMI and CA. In addition, both the computational complexity and the memory requirement of our method are lower than those of previous deep-model-based methods. Codes: https://github.com/vinceasvp/domestic-activity-clustering-from-audio
Submitted 3 August, 2022;
originally announced August 2022.
-
Strategic Asset Allocation with Illiquid Alternatives
Authors:
Eric Luxenberg,
Stephen Boyd,
Mykel Kochenderfer,
Misha van Beek,
Wen Cao,
Steven Diamond,
Alex Ulitsky,
Kunal Menda,
Vidy Vairavamurthy
Abstract:
We address the problem of strategic asset allocation (SAA) with portfolios that include illiquid alternative asset classes. The main challenge in portfolio construction with illiquid asset classes is that we do not have direct control over our positions, as we do in liquid asset classes. Instead we can only make commitments; the position builds up over time as capital calls come in, and reduces over time as distributions occur, neither of which the investor has direct control over. The effect on positions of our commitments is subject to a delay, typically of a few years, and is also unknown or stochastic. A further challenge is the requirement that we can meet the capital calls, with very high probability, with our liquid assets.
We formulate the illiquid dynamics as a random linear system, and propose a convex optimization based model predictive control (MPC) policy for allocating liquid assets and making new illiquid commitments in each period. Despite the challenges of time delay and uncertainty, we show that this policy attains performance surprisingly close to a fictional setting where we pretend the illiquid asset classes are completely liquid, and we can arbitrarily and immediately adjust our positions. In this paper we focus on the growth problem, with no external liabilities or income, but the method is readily extended to handle this case.
Submitted 15 July, 2022;
originally announced July 2022.
-
Lesion classification by model-based feature extraction: A differential affine invariant model of soft tissue elasticity
Authors:
Weiguo Cao,
Marc J. Pomeroy,
Zhengrong Liang,
Yongfeng Gao,
Yongyi Shi,
Jiaxing Tan,
Fangfang Han,
Jing Wang,
Jianhua Ma,
Hongbin Lu,
Almas F. Abbasi,
Perry J. Pickhardt
Abstract:
The elasticity of soft tissues has been widely considered as a characteristic property to differentiate between healthy and vicious tissues and, therefore, motivated several elasticity imaging modalities, such as Ultrasound Elastography, Magnetic Resonance Elastography, and Optical Coherence Elastography. This paper proposes an alternative approach of modeling the elasticity using Computed Tomography (CT) imaging modality for model-based feature extraction machine learning (ML) differentiation of lesions. The model describes a dynamic non-rigid (or elastic) deformation in differential manifold to mimic the soft tissues elasticity under wave fluctuation in vivo. Based on the model, three local deformation invariants are constructed by two tensors defined by the first and second order derivatives from the CT images and used to generate elastic feature maps after normalization via a novel signal suppression method. The model-based elastic image features are extracted from the feature maps and fed to machine learning to perform lesion classifications. Two pathologically proven image datasets of colon polyps (44 malignant and 43 benign) and lung nodules (46 malignant and 20 benign) were used to evaluate the proposed model-based lesion classification. The outcomes of this modeling approach reached the score of area under the curve of the receiver operating characteristics of 94.2 % for the polyps and 87.4 % for the nodules, resulting in an average gain of 5 % to 30 % over ten existing state-of-the-art lesion classification methods. The gains by modeling tissue elasticity for ML differentiation of lesions are striking, indicating the great potential of exploring the modeling strategy to other tissue properties for ML differentiation of lesions.
Submitted 27 May, 2022;
originally announced May 2022.
-
Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction
Authors:
Abdul Rehman,
Zhen-Tao Liu,
Min Wu,
Wei-Hua Cao,
Cheng-Shan Jiang
Abstract:
Speech emotion recognition systems have high prediction latency because of the high computational requirements for deep learning models and low generalizability mainly because of the poor reliability of emotional measurements across multiple corpora. To solve these problems, we present a speech emotion recognition system based on a reductionist approach of decomposing and analyzing syllable-level features. The Mel-spectrogram of an audio stream is decomposed into syllable-level components, which are then analyzed to extract statistical features. The proposed method uses formant attention, noise-gate filtering, and rolling normalization contexts to increase feature processing speed and tolerance to adversity. A set of syllable-level formant features is extracted and fed into a single hidden layer neural network that makes predictions for each syllable as opposed to the conventional approach of using a sophisticated deep learner to make sentence-wide predictions. The syllable-level predictions help to achieve real-time latency and lower the aggregated error in utterance-level cross-corpus predictions. The experiments on IEMOCAP (IE), MSP-Improv (MI), and RAVDESS (RA) databases show that the method achieves real-time latency while predicting with state-of-the-art cross-corpus unweighted accuracy of 47.6% for IE to MI and 56.2% for MI to IE.
Submitted 22 February, 2023; v1 submitted 24 April, 2022;
originally announced April 2022.
-
Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention
Authors:
Yanxiong Li,
Wucheng Wang,
Hao Chen,
Wenchang Cao,
Wei Li,
Qianhua He
Abstract:
Although few-shot learning has attracted much attention from the fields of image and audio classification, few efforts have been made on few-shot speaker identification. In the task of few-shot learning, overfitting is a tough problem mainly due to the mismatch between training and testing conditions. In this paper, we propose a few-shot speaker identification method which can alleviate the overfitting problem. In the proposed method, the model of a depthwise separable convolutional network with channel attention is trained with a prototypical loss function. Experimental datasets are extracted from three public speech corpora: Aishell-2, VoxCeleb1 and TORGO. Experimental results show that the proposed method exceeds state-of-the-art methods for few-shot speaker identification in terms of accuracy and F-score.
Submitted 23 April, 2022;
originally announced April 2022.
-
Primal-dual Estimator Learning: an Offline Constrained Moving Horizon Estimation Method with Feasibility and Near-optimality Guarantees
Authors:
Wenhan Cao,
Jingliang Duan,
Shengbo Eben Li,
Chen Chen,
Chang Liu,
Yu Wang
Abstract:
This paper proposes a primal-dual framework to learn a stable estimator for linear constrained estimation problems leveraging the moving horizon approach. To avoid the online computational burden in most existing methods, we learn a parameterized function offline to approximate the primal estimate. Meanwhile, a dual estimator is trained to check the suboptimality of the primal estimator during execution time. Both the primal and dual estimators are learned from data using supervised learning techniques, and the explicit sample size is provided, which enables us to guarantee the quality of each learned estimator in terms of feasibility and optimality. This in turn allows us to bound the probability of the learned estimator being infeasible or suboptimal. Furthermore, we analyze the stability of the resulting estimator with a bounded error in the minimization of the cost function. Since our algorithm does not require the solution of an optimization problem during runtime, state estimates can be generated online almost instantly. Simulation results are presented to show the accuracy and time efficiency of the proposed framework compared to online optimization of moving horizon estimation and Kalman filter. To the best of our knowledge, this is the first learning-based state estimator with feasibility and near-optimality guarantees for linear constrained systems.
Submitted 6 April, 2022;
originally announced April 2022.
-
Dealing with collinearity in large-scale linear system identification using Bayesian regularization
Authors:
Wenqi Cao,
Gianluigi Pillonetto
Abstract:
We consider the identification of large-scale linear and stable dynamic systems whose outputs may be the result of many correlated inputs. Hence, severe ill-conditioning may affect the estimation problem. This is a scenario often arising when modeling complex physical systems given by the interconnection of many sub-units where feedback and algebraic loops can be encountered. We develop a strategy based on Bayesian regularization where any impulse response is modeled as the realization of a zero-mean Gaussian process. The stable spline covariance is used to include information on smooth exponential decay of the impulse responses. We then design a new Markov chain Monte Carlo scheme that deals with collinearity and is able to efficiently reconstruct the posterior of the impulse responses. It is based on a variation of Gibbs sampling which updates possibly overlapping blocks of the parameter space on the basis of the level of collinearity affecting the different inputs. Numerical experiments are included to test the goodness of the approach where hundreds of impulse responses form the system and input correlation may be very high.
Submitted 2 September, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Proximal PanNet: A Model-Based Deep Network for Pansharpening
Authors:
Xiangyong Cao,
Yang Chen,
Wenfei Cao
Abstract:
Recently, deep learning techniques have been extensively studied for pansharpening, which aims to generate a high resolution multispectral (HRMS) image by fusing a low resolution multispectral (LRMS) image with a high resolution panchromatic (PAN) image. However, existing deep learning-based pansharpening methods directly learn the mapping from LRMS and PAN to HRMS. These network architectures always lack sufficient interpretability, which limits further performance improvements. To alleviate this issue, we propose a novel deep network for pansharpening by combining the model-based methodology with the deep learning method. Firstly, we build an observation model for pansharpening using the convolutional sparse coding (CSC) technique and design a proximal gradient algorithm to solve this model. Secondly, we unfold the iterative algorithm into a deep network, dubbed as Proximal PanNet, by learning the proximal operators using convolutional neural networks. Finally, all the learnable modules can be automatically learned in an end-to-end manner. Experimental results on some benchmark datasets show that our network performs better than other advanced methods both quantitatively and qualitatively.
Submitted 12 February, 2022;
originally announced March 2022.
-
On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control
Authors:
Jingliang Duan,
Wenhan Cao,
Yang Zheng,
Lin Zhao
Abstract:
The convergence of policy gradient algorithms hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. One of our core results is the uniqueness of the stationary point of dLQR when it is observable, which provides an optimality certificate for solving dynamic controllers using policy gradient methods. Moreover, we establish conditions under which dLQR and linear quadratic Gaussian control are equivalent, thus providing a unified viewpoint of optimal control of both deterministic and stochastic linear systems. These results further shed light on designing policy gradient algorithms for more general decision-making problems with partially observed information.
Submitted 29 October, 2023; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Identification of Low Rank Vector Processes
Authors:
Wenqi Cao,
Giorgio Picci,
Anders Lindquist
Abstract:
We study modeling and identification of stationary processes with a spectral density matrix of low rank. Equivalently, we consider processes having an innovation of reduced dimension for which Prediction Error Methods (PEM) algorithms are not directly applicable. We show that these processes admit a special feedback structure with a deterministic feedback channel which can be used to split the identification in two steps, one of which can be based on standard algorithms while the other is based on a deterministic least squares fit. Identifiability of the feedback system is analyzed and a unique identifiable structure is characterized. Simulations show that the proposed procedure works well in some simple examples.
Submitted 13 January, 2023; v1 submitted 21 November, 2021;
originally announced November 2021.
-
Modeling of Low Rank Time Series
Authors:
Wenqi Cao,
Anders Lindquist,
Giorgio Picci
Abstract:
Rank-deficient stationary stochastic vector processes are present in many problems in network theory and dynamic factor analysis. In this paper we study hidden dynamical relations between the components of a discrete-time stochastic vector process and investigate their properties with respect to stability and causality. More specifically, we construct transfer functions with a full-rank input process formed from selected components of the given vector process and having a vector process of the remaining components as output. An important question, which we answer in the negative, is whether it is always possible to find such a deterministic relation that is stable. If it is unstable, there must be feedback from output to input ensuring that stationarity is maintained. This leads to connections to robust control. We also show how our results could be used to investigate the structure of dynamic network models and the latent low-rank stochastic process in a dynamic factor model.
Submitted 13 April, 2023; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Capture Uncertainties in Deep Neural Networks for Safe Operation of Autonomous Driving Vehicles
Authors:
Liuhui Ding,
Dachuan Li,
Bowen Liu,
Wenxing Lan,
Bing Bai,
Qi Hao,
Weipeng Cao,
Ke Pei
Abstract:
Uncertainties in Deep Neural Network (DNN)-based perception and vehicle's motion pose challenges to the development of safe autonomous driving vehicles. In this paper, we propose a safe motion planning framework featuring the quantification and propagation of DNN-based perception uncertainties and motion uncertainties. Contributions of this work are twofold: (1) A Bayesian Deep Neural network model which detects 3D objects and quantitatively captures the associated aleatoric and epistemic uncertainties of DNNs; (2) An uncertainty-aware motion planning algorithm (PU-RRT) that accounts for uncertainties in object detection and ego-vehicle's motion. The proposed approaches are validated via simulated complex scenarios built in CARLA. Experimental results show that the proposed motion planning scheme can cope with uncertainties of DNN-based perception and vehicle motion, and improve the operational safety of autonomous vehicles while still achieving desirable efficiency.
Submitted 11 August, 2021;
originally announced August 2021.
-
Dual-Attention Enhanced BDense-UNet for Liver Lesion Segmentation
Authors:
Wenming Cao,
Philip L. H. Yu,
Gilbert C. S. Lui,
Keith W. H. Chiu,
Ho-Ming Cheng,
Yanwen Fang,
Man-Fung Yuen,
Wai-Kay Seto
Abstract:
In this work, we propose a new segmentation network by integrating DenseUNet and bidirectional LSTM together with attention mechanism, termed as DA-BDense-UNet. DenseUNet allows learning enough diverse features and enhancing the representative power of networks by regulating the information flow. Bidirectional LSTM is responsible to explore the relationships between the encoded features and the up-sampled features in the encoding and decoding paths. Meanwhile, we introduce attention gates (AG) into DenseUNet to diminish responses of unrelated background regions and magnify responses of salient regions progressively. Besides, the attention in bidirectional LSTM takes into account the contribution differences of the encoded features and the up-sampled features in segmentation improvement, which can in turn adjust proper weights for these two kinds of features. We conduct experiments on liver CT image data sets collected from multiple hospitals by comparing them with state-of-the-art segmentation models. Experimental results indicate that our proposed method DA-BDense-UNet has achieved comparative performance in terms of dice coefficient, which demonstrates its effectiveness.
Submitted 24 July, 2021;
originally announced July 2021.
-
High-Sensitivity Iodine Imaging by Combining Spectral CT Technologies
Authors:
Matthew Tivnan,
Grace Gang,
Wenchao Cao,
Nadav Shapira,
Peter B. Noel,
J. Webster Stayman
Abstract:
Spectral CT offers enhanced material discrimination over single-energy systems and enables quantitative estimation of basis material density images. Water/iodine decomposition in contrast-enhanced CT is one of the most widespread applications of this technology in the clinic. However, low concentrations of iodine can be difficult to estimate accurately, limiting potential clinical applications and/or raising injected contrast agent requirements. We seek high-sensitivity spectral CT system designs which minimize noise in water/iodine density estimates. In this work, we present a model-driven framework for spectral CT system design optimization to maximize material separability. We apply this tool to optimize the sensitivity spectra on a spectral CT test bench using a hybrid design which combines source kVp control and k-edge filtration. Following design optimization, we scanned a water/iodine phantom with the hybrid spectral CT system and performed dose-normalized comparisons to two single-technique designs which use only kVp control or only k-edge filtration. The material decomposition results show that the hybrid system reduces both standard deviation and cross-material noise correlations compared to the designs where the constituent technologies are used individually.
Submitted 29 March, 2021;
originally announced March 2021.
-
Approximate Optimal Filter for Linear Gaussian Time-invariant Systems
Authors:
Kaiming Tang,
Shengbo Eben Li,
Yuming Yin,
Yang Guan,
Jingliang Duan,
Wenhan Cao,
Jie Li
Abstract:
State estimation is critical to control systems, especially when the states cannot be directly measured. This paper presents an approximate optimal filter, which enables the use of the policy iteration technique to obtain the steady-state gain in linear Gaussian time-invariant systems. This design transforms the optimal filtering problem with minimum mean square error into an optimal control problem, called the Approximate Optimal Filtering (AOF) problem. The equivalence holds given certain conditions on initial state distributions and policy formats, in which the system state is the estimation error, the control input is the filter gain, and the control objective function is the accumulated estimation error. We present a policy iteration algorithm to solve the AOF problem in steady state. The approximate filter is finally evaluated on a classic vehicle state estimation problem. The results show that the policy converges to the steady-state Kalman gain, and its accuracy is within 2%.
Submitted 9 March, 2021;
originally announced March 2021.
-
Modeling and Identification of Low Rank Vector Processes
Authors:
Giorgio Picci,
Wenqi Cao,
Anders Lindquist
Abstract:
We study modeling and identification of processes with a spectral density matrix of low rank. Equivalently, we consider processes having an innovation of reduced dimension for which Prediction Error Methods (PEM) algorithms are not directly applicable. We show that these processes admit a special feedback structure with a deterministic feedback channel which can be used to split the identification into two steps, one of which can be based on standard algorithms while the other is based on a deterministic least squares fit.
Submitted 10 May, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Machine Learning of Partial Differential Equations from Noise Data
Authors:
Wenbo Cao,
Weiwei Zhang
Abstract:
Machine learning of partial differential equations from data is a potential breakthrough for addressing the lack of physical equations in complex dynamic systems. However, because numerical differentiation is ill-posed for noisy data, noise has become the biggest obstacle to applying partial differential equation identification methods. To overcome this problem, we propose the Frequency Domain Identification method based on Fourier transforms, which effectively eliminates the influence of noise by using the low-frequency component of frequency domain data to identify partial differential equations in the frequency domain. We also propose a new sparse identification criterion, which can accurately identify the terms in the equation from low signal-to-noise ratio data. Through identifying a variety of canonical equations spanning a number of scientific domains, the proposed method is proved to have high accuracy and robustness for equation structure and parameter identification for low signal-to-noise ratio data. The method provides a promising technique to discover potential partial differential equations from noisy experimental data.
Submitted 17 November, 2021; v1 submitted 28 September, 2020;
originally announced October 2020.
-
Reinforcement Solver for H-infinity Filter with Bounded Noise
Authors:
Jie Li,
Shengbo Eben Li,
Kaiming Tang,
Yao Lv,
Wenhan Cao
Abstract:
H-infinity filter has been widely applied in engineering field, but coping with bounded noise is still an open problem and difficult to solve. This paper considers the H-infinity filtering problem for linear system with bounded process and measurement noise. The problem is first formulated as a zero-sum game where the dynamic of estimation error is non-affine with respect to filter gain and measurement noise. A nonquadratic Hamilton-Jacobi-Isaacs (HJI) equation is then derived by employing a nonquadratic cost to characterize bounded noise, which is extremely difficult to solve due to its non-affine and nonlinear properties. Next, a reinforcement learning algorithm based on gradient descent method which can handle nonlinearity is proposed to update the gain of reinforcement filter, where measurement noise is fixed to tackle non-affine property and increase the convexity of Hamiltonian. Two examples demonstrate the convergence and effectiveness of the proposed algorithm.
Submitted 3 August, 2020;
originally announced August 2020.
-
Nonparametric Estimation of the Fisher Information and Its Applications
Authors:
Wei Cao,
Alex Dytso,
Michael Fauß,
H. Vincent Poor,
Gang Feng
Abstract:
This paper considers the problem of estimation of the Fisher information for location from a random sample of size $n$. First, an estimator proposed by Bhattacharya is revisited and improved convergence rates are derived. Second, a new estimator, termed a clipped estimator, is proposed. Superior upper bounds on the rates of convergence can be shown for the new estimator compared to the Bhattacharya estimator, albeit with different regularity conditions. Third, both of the estimators are evaluated for the practically relevant case of a random variable contaminated by Gaussian noise. Moreover, using Brown's identity, which relates the Fisher information and the minimum mean squared error (MMSE) in Gaussian noise, two corresponding consistent estimators for the MMSE are proposed. Simulation examples for the Bhattacharya estimator and the clipped estimator as well as the MMSE estimators are presented. The examples demonstrate that the clipped estimator can significantly reduce the required sample size to guarantee a specific confidence interval compared to the Bhattacharya estimator.
Submitted 7 May, 2020;
originally announced May 2020.
-
A 6G White Paper on Connectivity for Remote Areas
Authors:
Harri Saarnisaari,
Sudhir Dixit,
Mohamed-Slim Alouini,
Abdelaali Chaoub,
Marco Giordani,
Adrian Kliks,
Marja Matinmikko-Blue,
Nan Zhang,
Anuj Agrawal,
Mats Andersson,
Vimal Bhatia,
Wei Cao,
Yunfei Chen,
Wei Feng,
Marjo Heikkilä,
Josep M. Jornet,
Luciano Mendes,
Heikki Karvonen,
Brejesh Lall,
Matti Latva-aho,
Xiangling Li,
Kalle Lähetkangas,
Moshe T. Masonta,
Alok Pandey,
Pekka Pirinen, et al. (9 additional authors not shown)
Abstract:
In many places all over the world rural and remote areas lack proper connectivity that has led to increasing digital divide. These areas might have low population density, low incomes, etc., making them less attractive places to invest and operate connectivity networks. 6G could be the first mobile radio generation truly aiming to close the digital divide. However, in order to do so, special requirements and challenges have to be considered since the beginning of the design process. The aim of this white paper is to discuss requirements and challenges and point out related, identified research topics that have to be solved in 6G. This white paper first provides a generic discussion, shows some facts and discusses targets set in international bodies related to rural and remote connectivity and digital divide. Then the paper digs into technical details, i.e., into a solutions space. Each technical section ends with a discussion and then highlights identified 6G challenges and research ideas as a list.
Submitted 30 April, 2020;
originally announced April 2020.
-
Spectral Rank, Feedback, Causality and the Indirect Method for CARMA Identification
Authors:
Wenqi Cao,
Anders Lindquist,
Giorgio Picci
Abstract:
Building on a recent paper by Georgiou and Lindquist [1] on the problem of rank deficiency of spectral densities and hidden dynamical relations after sampling of continuous-time stochastic processes, this paper is devoted to understanding related questions of feedback and Granger causality that affect stability properties. This then naturally connects to CARMA identification, where we remark on certain oversights in the literature.
Submitted 7 June, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
An efficient surrogate-aided importance sampling framework for reliability analysis
Authors:
Wang-Sheng Liu,
Sai Hung Cheung,
Wen-Jun Cao
Abstract:
Surrogates in lieu of expensive-to-evaluate performance functions can accelerate the reliability analysis greatly. This paper proposes a new two-stage framework for surrogate-aided reliability analysis named Surrogates for Importance Sampling (S4IS). In the first stage, a coarse surrogate is built to gain information about failure regions; the second stage zooms into the important regions and improves the accuracy of the failure probability estimator by adaptively selecting support points therein. Learning functions are proposed to guide the selection of support points such that exploration and exploitation can be dynamically balanced. As a generic framework, S4IS has the potential to incorporate different types of surrogates (Gaussian Processes, Support Vector Machines, Neural Networks, etc.). The effectiveness and efficiency of S4IS are validated by five illustrative examples, which involve system reliability, highly nonlinear limit-state functions, small failure probability and moderately high dimensionality. The implementation of S4IS is available for download at https://github.com/RobinSeaside/S4IS.
Submitted 22 January, 2020;
originally announced January 2020.
-
Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices
Authors:
Weidong Cao,
Liu Ke,
Ayan Chakrabarti,
Xuan Zhang
Abstract:
Recent works propose neural network- (NN-) inspired analog-to-digital converters (NNADCs) and demonstrate their great potentials in many emerging applications. These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require high-precision RRAM cells (6~12-bit) to achieve a moderate quantization resolution (4~8-bit). Such optimistic assumption of RRAM resolution, however, is not supported by fabrication data of RRAM arrays in large-scale production process. In this paper, we propose an NN-inspired super-resolution ADC based on low-precision RRAM devices by taking the advantage of a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework. Results obtained from SPICE simulations demonstrate that our method leads to robust design of a 14-bit super-resolution ADC using 3-bit RRAM devices with improved power and speed performance and competitive figure-of-merits (FoMs). In addition to the linear uniform quantization, the proposed ADC can also support configurable high-resolution nonlinear quantization with high conversion speed and low conversion energy, enabling future intelligent analog-to-information interfaces for near-sensor analytics and processing.
Submitted 28 November, 2019;
originally announced November 2019.