Search | arXiv e-print repository

arXiv:2403.12230 [pdf]

Motion and temporal B0 shift corrections for quantitative susceptibility mapping (QSM) and R2* mapping using dual-echo spiral navigators and conjugate-phase reconstruction

Authors: Yuguang Meng, Jason W. Allen, Vahid Khalilzad Sharghi, Deqiang Qiu

Abstract: Purpose: To develop an efficient navigator-based motion and temporal B0 shift correction technique for 3D multi-echo gradient-echo (ME-GRE) MRI for quantitative susceptibility mapping (QSM) and R2* mapping. Theory and Methods: A dual-echo 3D spiral navigator was designed to interleave with the Cartesian ME-GRE acquisitions, allowing the acquisition of both low- and high-echo time signals. We additionally designed a novel conjugate-phase based reconstruction method for the joint correction of motion and temporal B0 shifts. We performed both numerical simulation and in vivo human scans to assess the performance of the methods. Results: Numerical simulation and human brain scans demonstrated that the proposed technique successfully corrected artifacts induced by both head motions and temporal B0 changes. Efficient B0-change correction with conjugate-phase reconstruction can be performed on less than 10 clustered k-space segments. In vivo scans showed that combining temporal B0 correction with motion correction further reduced artifacts and improved image quality in both R2* and QSM images. Conclusion: Our proposed approach of using 3D spiral navigators and a novel conjugate-phase reconstruction method can improve susceptibility-related measurements using MR.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 7 figures

arXiv:2312.08553 [pdf, other]

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

Authors: Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal

Abstract: End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios. In this study, we propose a USM fine-tuning approach for ASR, with a low-bit quantization and N:M structured sparsity aware paradigm on the model weights, reducing the model complexity from parameter precision and matrix topology perspectives. We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method. A series of ablation studies validate the effectiveness of up to int4 quantization and 2:4 sparsity. However, a single compression technique fails to recover the performance well under extreme setups including int2 quantization and 1:4 sparsity. By contrast, our proposed method can compress the model to have 9.4% of the size, at the cost of only 7.3% relative word error rate (WER) regressions. We also provided in-depth analyses on the results and discussions on the limitations and potential solutions, which would be valuable for future studies.

Submitted 16 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024. Preprint

arXiv:2307.03870 [pdf, other]

Opacity of Parametric Discrete Event Systems: Models, Decidability, and Algorithms

Authors: Weilin Deng, Daowen Qiu, Jingkai Yang

Abstract: Finite automata (FAs) model is a popular tool to characterize discrete event systems (DESs) due to its succinctness. However, for some complex systems, it is difficult to describe the necessary details by means of FAs model. In this paper, we consider a kind of extended finite automata (EFAs) in which each transition carries a predicate over state and event parameters. We also consider a type of simplified EFAs, called Event-Parameters EFAs (EP-EFAs), where the state parameters are removed. Based upon these two parametric models, we investigate the problem of opacity analysis for parametric DESs. First of all, it is shown that EFAs model is more expressive than EP-EFAs model. Secondly, it is proved that the opacity properties for EFAs are undecidable in general. Moreover, the decidable opacity properties for EP-EFAs are investigated. We present the verification algorithms for current-state opacity, initial-state opacity and infinite-step opacity, and then discuss the complexity. This paper establishes a preliminary theory for the opacity of parametric DESs, which lays a foundation for the opacity analysis of complex systems.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 13 pages, 9 figures

arXiv:2307.02015 [pdf, other]

Model-based T1, T2* and Proton Density Mapping Using a Bayesian Approach with Parameter Estimation and Complementary Undersampling Patterns

Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

Abstract: Purpose: To achieve automatic hyperparameter estimation for the joint recovery of quantitative MR images, we propose a Bayesian formulation of the reconstruction problem that incorporates the signal model. Additionally, we investigate the use of complementary undersampling patterns to determine optimal undersampling schemes for quantitative MRI. Theory: We introduce a novel nonlinear approximate message passing framework, referred to as ``AMP-PE'', that enables the simultaneous recovery of distribution parameters and quantitative maps. Methods: We employed the variable flip angle multi-echo (VFA-ME) method to acquire measurements. Both retrospective and prospective undersampling approaches were utilized to obtain Fourier measurements using variable-density and Poisson-disk patterns. Furthermore, we extensively explored various undersampling schemes, incorporating complementary patterns across different flip angles and/or echo times. Results: AMP-PE adopts a model-based joint recovery strategy, it outperforms the $l_1$-norm minimization approach that follows a decoupled recovery strategy. A comparison with an existing joint-recovery approach further demonstrates the advantageous outcomes of AMP-PE. For quantitative $T_1$ mapping using VFA-ME, employing identical k-space sampling patterns across different echo times produced the best performance. Whereas for $T_2^*$ and proton density mappings, using complementary sampling patterns across different flip angles yielded the best performance. Conclusion: AMP-PE is equipped with built-in parameter estimation, and works naturally in clinical settings with varying acquisition protocols and scanners. It also achieves improved performance by combining information from the MR signal model and the sparse prior on images.

Submitted 10 September, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2305.16619 [pdf, other]

2-bit Conformer quantization for automatic speech recognition

Authors: Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

Abstract: Large speech models are rapidly gaining traction in research community. As a result, model compression has become an important topic, so that these models can fit in memory and be served with reduced cost. Practical approaches for compressing automatic speech recognition (ASR) model use int8 or int4 weight quantization. In this study, we propose to develop 2-bit ASR models. We explore the impact of symmetric and asymmetric quantization combined with sub-channel quantization and clipping on both LibriSpeech dataset and large-scale training data. We obtain a lossless 2-bit Conformer model with 32% model size reduction when compared to state of the art 4-bit Conformer model for LibriSpeech. With the large-scale training data, we obtain a 2-bit Conformer model with over 40% model size reduction against the 4-bit version at the cost of 17% relative word error rate degradation.

Submitted 26 May, 2023; originally announced May 2023.

Comments: submitted to Interspeech

arXiv:2305.15536 [pdf, other]

RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

Authors: David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

Abstract: With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) technique, most papers present evaluations that are focused on computer vision tasks, which have different training dynamics compared to sequence tasks. In this paper, we first benchmark the impact of popular techniques such as straight through estimator, pseudo-quantization noise, learnable scale parameter, clipping, etc. on 4-bit seq2seq models across a suite of speech recognition datasets ranging from 1,000 hours to 1 million hours, as well as one machine translation dataset to illustrate its applicability outside of speech. Through the experiments, we report that noise based QAT suffers when there is insufficient regularization signal flowing back to the quantization scale. We propose low complexity changes to the QAT process to improve model accuracy (outperforming popular learnable scale and clipping methods). With the improved accuracy, it opens up the possibility to exploit some of the other benefits of noise based QAT: 1) training a single model that performs well in mixed precision mode and 2) improved generalization on long form speech recognition.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2207.14709 [pdf, other]

Robust Quantitative Susceptibility Mapping via Approximate Message Passing with Parameter Estimation

Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

Abstract: Purpose: For quantitative susceptibility mapping (QSM), the lack of ground-truth in clinical settings makes it challenging to determine suitable parameters for the dipole inversion. We propose a probabilistic Bayesian approach for QSM with built-in parameter estimation, and incorporate the nonlinear formulation of the dipole inversion to achieve a robust recovery of the susceptibility maps. Theory: From a Bayesian perspective, the image wavelet coefficients are approximately sparse and modelled by the Laplace distribution. The measurement noise is modelled by a Gaussian-mixture distribution with two components, where the second component is used to model the noise outliers. Through probabilistic inference, the susceptibility map and distribution parameters can be jointly recovered using approximate message passing (AMP). Methods: We compare our proposed AMP with built-in parameter estimation (AMP-PE) to the state-of-the-art L1-QSM, FANSI and MEDI approaches on the simulated and in vivo datasets, and perform experiments to explore the optimal settings of AMP-PE. Reproducible code is available at https://github.com/EmoryCN2L/QSM_AMP_PE Results: On the simulated Sim2Snr1 dataset, AMP-PE achieved the lowest NRMSE, DFCM and the highest SSIM, while MEDI achieved the lowest HFEN. On the in vivo datasets, AMP-PE is robust and successfully recovers the susceptibility maps using the estimated parameters, whereas L1-QSM, FANSI and MEDI typically require additional visual fine-tuning to select or double-check working parameters. Conclusion: AMP-PE provides automatic and adaptive parameter estimation for QSM and avoids the subjectivity from the visual fine-tuning step, making it an excellent choice for the clinical setting.

Submitted 30 May, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: Keywords: Approximate message passing, Compressive sensing, Outlier modelling, Parameter estimation, Quantitative susceptibility mapping

arXiv:2205.10448 [pdf, other]

doi 10.1109/TSP.2022.3167516

Approximate Message Passing with Parameter Estimation for Heavily Quantized Measurements

Authors: Shuai Huang, Deqiang Qiu, Trac D. Tran

Abstract: Designing efficient sparse recovery algorithms that could handle noisy quantized measurements is important in a variety of applications -- from radar to source localization, spectrum sensing and wireless networking. We take advantage of the approximate message passing (AMP) framework to achieve this goal given its high computational efficiency and state-of-the-art performance. In AMP, the signal of interest is assumed to follow certain prior distribution with unknown parameters. Previous works focused on finding the parameters that maximize the measurement likelihood via expectation maximization -- an increasingly difficult problem to solve in cases involving complicated probability models. In this paper, we treat the parameters as unknown variables and compute their posteriors via AMP. The parameters and signal of interest can then be jointly recovered. Compared to previous methods, the proposed approach leads to a simple and elegant parameter estimation scheme, allowing us to directly work with 1-bit quantization noise model. We then further extend our approach to general multi-bit quantization noise model. Experimental results show that the proposed framework provides significant improvement over state-of-the-art methods across a wide range of sparsity and noise levels.

Submitted 20 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2007.07679

Journal ref: IEEE Transactions on Signal Processing, Vol. 70, pp. 2062-2077, Apr. 2022

arXiv:2110.03327 [pdf, other]

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

Authors: Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

Abstract: As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may exhibit severe degradation. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection.

Submitted 2 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper at ICASSP 2022

arXiv:2110.00165 [pdf, other]

Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

Abstract: Self- and semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance the model performance. However, the approach mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve unseen domain adaptation problem in a large-scale production setting for online ASR model. This approach demonstrates that using the source domain data with a small fraction of the target domain data (3%) can recover the performance gap compared to a full data baseline: relative 13.5% WER improvement for target domain data.

Submitted 15 February, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables

arXiv:2104.12870 [pdf, other]

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

Authors: David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw

Abstract: Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems. Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR. In those studies, word confidence by itself does not model deletions, and utterance confidence does not take advantage of word-level training signals. This paper proposes to jointly learn word confidence, word deletion, and utterance confidence. Empirical results show that multi-task learning with all three objectives improves confidence metrics (NCE, AUC, RMSE) without the need for increasing the model size of the confidence estimation module. Using the utterance-level confidence for rescoring also decreases the word error rates on Google's Voice Search and Long-tail Maps datasets by 3-5% relative, without needing a dedicated neural rescorer.

Submitted 26 April, 2021; originally announced April 2021.

Comments: Submitted to Interspeech 2021

arXiv:2104.09753 [pdf, other]

Supervisory Control of Quantum Discrete Event Systems

Authors: Daowen Qiu

Abstract: Discrete event systems (DES) have been deeply developed and applied in practice, but state complexity in DES still is an important problem to be better solved with innovative methods. With the development of quantum computing and quantum control, a natural problem is to simulate DES by means of quantum computing models and to establish {\it quantum DES} (QDES). The motivation is twofold: on the one hand, QDES have potential applications when DES are simulated and processed by quantum computers, where quantum systems are employed to simulate the evolution of states driven by discrete events, and on the other hand, QDES may have essential advantages over DES concerning state complexity for imitating some practical problems. So, the goal of this paper is to establish a basic framework of QDES by using {\it quantum finite automata} (QFA) as the modelling formalisms, and the supervisory control theorems of QDES are established and proved. Then we present a polynomial-time algorithm to decide whether or not the controllability condition holds. In particular, we construct a number of new examples of QFA to illustrate the supervisory control of QDES and to verify the essential advantages of QDES over classical DES in state complexity.

Submitted 3 May, 2023; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 35 pages, 5 figures; comments are welcome

arXiv:2103.06716 [pdf, other]

Learning Word-Level Confidence For Subword End-to-End ASR

Authors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw

Abstract: We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR). Although prior works have proposed training auxiliary confidence models for ASR systems, they do not extend naturally to systems that operate on word-pieces (WP) as their vocabulary. In particular, ground truth WP correctness labels are needed for training confidence models, but the non-unique tokenization from word to WP causes inaccurate labels to be generated. This paper proposes and studies two confidence models of increasing complexity to solve this problem. The final model uses self-attention to directly learn word-level confidence without needing subword tokenization, and exploits full context features from multiple hypotheses to improve confidence accuracy. Experiments on Voice Search and long-tail test sets show standard metrics (e.g., NCE, AUC, RMSE) improving substantially. The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model.

Submitted 11 March, 2021; originally announced March 2021.

Comments: To appear in ICASSP 2021

arXiv:2103.05535 [pdf, other]

doi 10.1002/mrm.29303

A Probabilistic Bayesian Approach to Recover $R_2^*$ map and Phase Images for Quantitative Susceptibility Mapping

Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

Abstract: Purpose: Undersampling is used to reduce the scan time for high-resolution 3D magnetic resonance imaging. In order to achieve better image quality and avoid manual parameter tuning, we propose a probabilistic Bayesian approach to recover $R_2^*$ map and phase images for quantitative susceptibility mapping (QSM), while allowing automatic parameter estimation from undersampled data. Theory: Sparse prior on the wavelet coefficients of images is interpreted from a Bayesian perspective as sparsity-promoting distribution. A novel nonlinear approximate message passing (AMP) framework that incorporates a mono-exponential decay model is proposed. The parameters are treated as unknown variables and jointly estimated with image wavelet coefficients. Results: The proposed AMP with parameter estimation (AMP-PE) approach successfully recovers $R_2^*$ maps and phase images for QSM across various undersampling rates. It is more computationally efficient, and performs better than the state-of-the-art $l_1$-norm regularization (L1) approach in general, except a few cases where the L1 approach performs as well as AMP-PE. Conclusion: AMP-PE achieves better performance by drawing information from both the sparse prior and the mono-exponential decay model. It does not require parameter tuning, and works with a clinical, prospective undersampling scheme where parameter tuning is often impossible or difficult due to the lack of ground-truth image.

Submitted 9 July, 2022; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: Keywords: Approximate Message Passing, Compressive Sensing, Parameter Estimation, Quantitative Susceptibility Mapping, R2* mapping, T2* mapping, Undersampling

Journal ref: Magnetic Resonance in Medicine, Vol. 88 (4), pp. 1624-1642, Jun. 2022

arXiv:2010.11428 [pdf, other]

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

Authors: Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman

Abstract: For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.

Submitted 23 October, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: Submitted to ICASSP 2021

arXiv:2008.05258 [pdf, other]

Guided Collaborative Training for Pixel-wise Semi-Supervised Learning

Authors: Zhanghan Ke, Di Qiu, Kaican Li, Qiong Yan, Rynson W. H. Lau

Abstract: We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory due to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use task-specific properties. In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions. First, GCT addresses the issues caused by the dense outputs through a novel flaw detector. Second, the modules in GCT learn from unlabeled data collaboratively through two newly proposed constraints that are independent of task-specific properties. As a result, GCT can be applied to a wide range of pixel-wise tasks without structural adaptation. Our extensive experiments on four challenging vision tasks, including semantic segmentation, real image denoising, portrait image matting, and night image enhancement, show that GCT outperforms state-of-the-art SSL methods by a large margin. Our code available at: https://github.com/ZHKKKe/PixelSSL.

Submitted 12 August, 2020; originally announced August 2020.

Comments: 16th European Conference on Computer Vision (ECCV 2020)

arXiv:2008.01806 [pdf, other]

Fast Nonconvex $T_2^*$ Mapping Using ADMM

Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

Abstract: Magnetic resonance (MR)-$T_2^*$ mapping is widely used to study hemorrhage, calcification and iron deposition in various clinical applications; it provides a direct and precise mapping of desired contrast in the tissue. However, the long acquisition time required by conventional 3D high-resolution $T_2^*$ mapping methods causes discomfort to patients and introduces motion artifacts to reconstructed images, which limits its wider applicability. In this paper we address this issue by performing $T_2^*$ mapping from undersampled data using compressive sensing (CS). We formulate the reconstruction as a nonconvex problem that can be decomposed into two subproblems. They can be solved either separately via the standard approach or jointly via the alternating direction method of multipliers (ADMM). Compared to previous CS-based approaches that only apply sparse regularization on the spin density $\boldsymbol X_0$ and the relaxation rate $\boldsymbol R_2^*$, our formulation enforces additional sparse priors on the $T_2^*$-weighted images at multiple echoes to improve the reconstruction performance. We performed convergence analysis of the proposed algorithm, evaluated its performance on in vivo data, and studied the effects of different sampling schemes. Experimental results showed that the proposed joint-recovery approach generally outperforms the state-of-the-art method, especially in the low-sampling rate regime, making it a preferred choice to perform fast 3D $T_2^*$ mapping in practice. The framework adopted in this work can be easily extended to other problems arising from MR or other imaging modalities with non-linearly coupled variables.

Submitted 4 August, 2020; originally announced August 2020.

arXiv:2007.14564 [pdf, other]

Bayesian Massive MIMO Channel Estimation with Parameter Estimation Using Low-Resolution ADCs

Authors: Shuai Huang, Deqiang Qiu, Trac D. Tran

Abstract: In order to reduce hardware complexity and power consumption, massive multiple-input multiple-output (MIMO) systems employ low-resolution analog-to-digital converters (ADCs) to acquire quantized measurements $\boldsymbol y$. This poses new challenges to the channel estimation problem, and the sparse prior on the channel coefficient vector $\boldsymbol x$ in the angle domain is often used to compensate for the information lost during quantization. By interpreting the sparse prior from a probabilistic perspective, we can assume $\boldsymbol x$ follows a certain sparse prior distribution and recover it using approximate message passing (AMP). However, the distribution parameters are unknown in practice and need to be estimated. Due to the increased computational complexity in the quantization noise model, previous works either use an approximated noise model or manually tune the noise distribution parameters. In this paper, we treat both signals and parameters as random variables and recover them jointly within the AMP framework. The proposed approach leads to a much simpler parameter estimation method, allowing us to work with the quantization noise model directly. Experimental results show that the proposed approach achieves state-of-the-art performance under various noise levels and does not require parameter tuning, making it a practical and maintenance-free approach for channel estimation.

Submitted 11 February, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

arXiv:1805.07196 [pdf, ps, other]

Supervisory Control of Probabilistic Discrete Event Systems under Partial Observation

Authors: Weilin Deng, Jingkai Yang, Daowen Qiu

Abstract: The supervisory control of probabilistic discrete event systems (PDESs) is investigated under the assumptions that the supervisory controller (supervisor) is probabilistic and has a partial observation. The probabilistic P-supervisor is defined, which specifies a probability distribution on the control patterns for each observation. The notions of the probabilistic controllability and observability are proposed and demonstrated to be necessary and sufficient conditions for the existence of the probabilistic P-supervisors. Moreover, the polynomial verification algorithms for the probabilistic controllability and observability are put forward. In addition, the infimal probabilistic controllable and observable superlanguage is introduced and computed as the solution of the optimal control problem of PDESs. Several examples are presented to illustrate the results obtained.

Submitted 16 May, 2018; originally announced May 2018.

Comments: 36 pages, comments are welcome

arXiv:1610.02470 [pdf, ps, other]

doi 10.1109/TFUZZ.2015.2403866

Bi-Fuzzy Discrete Event Systems and Their Supervisory Control Theory

Authors: Weilin Deng, Daowen Qiu

Abstract: It is well known that type-1 fuzzy sets (T1 FSs) have limited capabilities to handle some data uncertainties directly, and type-2 fuzzy sets (T2 FSs) can cover the shortcoming of T1 FSs to a certain extent. Fuzzy discrete event systems (FDESs) were proposed based on T1 FSs theory. Hence, FDES may not be a satisfactory model to characterize some high-uncertainty systems. In this paper, we propose a new model, called bi-fuzzy discrete event systems (BFDESs), by combining classical DESs theory and T2 FSs theory. Then, we consider the supervisory control problem of BFDESs. The bi-fuzzy controllability theorem and nonblocking bi-fuzzy controllability theorem are demonstrated. Also, an algorithm for checking the bi-fuzzy controllability condition is presented. In addition, two controllable approximations to an uncontrollable language are investigated in detail. An illustrative example is provided to show the applicability and the advantages of the BFDES model.

Submitted 7 October, 2016; originally announced October 2016.

Comments: 14 pages

arXiv:1610.02465 [pdf, ps, other]

doi 10.1109/TFUZZ.2014.2310466

Supervisory Control of Fuzzy Discrete Event Systems for Simulation Equivalence

Authors: Weilin Deng, Daowen Qiu

Abstract: The supervisory control theory of fuzzy discrete event systems (FDESs) for fuzzy language equivalence has been developed. However, in a way, language equivalence has limited expressiveness. So if the given specification cannot be expressed by language equivalence, then the control for language equivalence does not work. In this paper, we further establish the supervisory control theory of FDESs for fuzzy simulation equivalence whose expressiveness is stronger than that of fuzzy language equivalence. First, we formalize the notions of fuzzy simulation and fuzzy simulation equivalence between two FDESs. Then we present a method for deciding whether there is a fuzzy simulation or not. In addition, we also show several basic properties of fuzzy simulation relations. Afterwards, we put forward the notion of fuzzy simulation-based controllability, and particularly show that it serves as a necessary and sufficient condition for the existence of the fuzzy supervisors of FDESs. Moreover, we study the "range" control problem of FDESs. Some examples are given to illustrate the main results obtained.

Submitted 7 October, 2016; originally announced October 2016.

Journal ref: IEEE Transactions on Fuzzy Systems, Vol. 23, No. 1, pp. 178-192, February 2015

Showing 1–21 of 21 results for author: Qiu, D