Search | arXiv e-print repository

doi 10.1016/j.xcrp.2024.101941

Taking Second-life Batteries from Exhausted to Empowered using Experiments, Data Analysis, and Health Estimation

Authors: Xiaofan Cui, Muhammad Aadil Khan, Gabriele Pozzato, Surinder Singh, Ratnesh Sharma, Simona Onori

Abstract: The reuse of retired electric vehicle batteries in grid energy storage offers environmental and economic benefits. This study concentrates on health monitoring algorithms for retired batteries deployed in grid storage. Over 15 months of testing, we collect, analyze, and publicize a dataset of second-life batteries, implementing a cycling protocol simulating grid energy storage load profiles within a 3-4 V voltage window. Four machine-learning-based health estimation models, relying on online-accessible features and initial capacity, are compared, with the selected model achieving a mean absolute percentage error below 2.3% on test data. Additionally, an adaptive online health estimation algorithm is proposed by integrating a clustering-based method, thus limiting estimation errors during online deployment. These results showcase the feasibility of repurposing retired batteries for second-life applications. Based on obtained data and power demand, these second-life batteries exhibit potential for over a decade of grid energy storage use.

Submitted 8 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 16 pages, 8 figures
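
The abstract above reports accuracy as a mean absolute percentage error (MAPE). As a minimal sketch of how that metric is conventionally computed, the snippet below may help; the function name `mape` and the capacity values are illustrative assumptions and are not taken from the paper or its dataset.

```python
def mape(actual, predicted):
    """Return the mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that
    no actual value is zero (the metric is undefined otherwise).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured vs. estimated capacities in Ah (made-up numbers).
measured = [10.0, 8.0, 5.0]
estimated = [9.8, 8.2, 5.1]
error_percent = mape(measured, estimated)
print(f"MAPE = {error_percent:.2f}%")
```

Under these made-up values the per-sample relative errors are 2%, 2.5%, and 2%, so the average is roughly 2.17%.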

arXiv:2401.04734 [pdf, other]

Online Adaptive Data-driven State-of-health Estimation for Second-life Batteries with BIBO Stability Guarantees

Authors: Xiaofan Cui, Muhammad Aadil Khan, Simona Onori

Abstract: A key challenge that is currently hindering the widespread deployment and use of retired electric vehicle (EV) batteries for second-life (SL) applications is the ability to accurately estimate and monitor their state of health (SOH). Second-life battery systems can be sourced from different battery packs with a lack of knowledge of their historical usage. To facilitate the on-the-field use of SL batteries, this paper introduces an online adaptive health estimation strategy with guaranteed stability. This method relies exclusively on operational data that can be accessed in real-time from SL batteries. The adaptation algorithm is designed to ensure bounded-input-bounded-output (BIBO) stability. The effectiveness of the proposed approach is shown on a laboratory-aged experimental data set of retired EV batteries. The estimator gains are dynamically adapted to accommodate the distinct characteristics of each individual cell, making it a promising candidate for future SL battery management systems (BMS2).

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2311.09839 [pdf, other]

Load Data Valuation in Multi-Energy Systems: An End-to-End Approach

Authors: Yangze Zhou, Qingsong Wen, Jie Song, Xueyuan Cui, Yi Wang

Abstract: Accurate load forecasting serves as the foundation for the flexible operation of multi-energy systems (MES). Multi-energy loads are tightly coupled and exhibit significant uncertainties. Many works focus on enhancing forecasting accuracy by leveraging cross-sector information. However, data owners may not be motivated to share their data unless it leads to substantial benefits. Ensuring a reasonable data valuation can encourage them to share their data willingly. This paper presents an end-to-end framework to quantify multi-energy load data value by integrating forecasting and decision processes. To address optimization problems with integer variables, a two-stage end-to-end model solution is proposed. Moreover, a profit allocation strategy based on contribution to cost savings is investigated to encourage data sharing in MES. The experimental results demonstrate a significant decrease in operation costs, suggesting that the proposed valuation approach more effectively extracts the inherent data value than traditional methods. According to the proposed incentive mechanism, all sectors can benefit from data sharing by improving forecasting accuracy or receiving economic compensation.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 10 pages

arXiv:2308.06786 [pdf, other]

Challenges and Opportunities for Second-life Batteries: A Review of Key Technologies and Economy

Authors: Xubo Gu, Hanyu Bai, Xiaofan Cui, Juner Zhu, Weichao Zhuang, Zhaojian Li, Xiaosong Hu, Ziyou Song

Abstract: Due to the increasing volume of Electric Vehicles in automotive markets and the limited lifetime of onboard lithium-ion batteries (LIBs), the large-scale retirement of LIBs is imminent. The battery packs retired from Electric Vehicles still retain 70%-80% of the initial capacity, thus having the potential to be utilized in scenarios with lower energy and power requirements to maximize the value of LIBs. However, spent batteries are commonly less reliable than fresh batteries due to their degraded performance, thereby necessitating a comprehensive assessment from safety and economic perspectives before further utilization. To this end, this paper reviews the key technological and economic aspects of second-life batteries (SLBs). Firstly, we introduce various degradation models for first-life batteries and identify an opportunity to combine physics-based theories with data-driven methods to establish explainable models with physical laws that can be generalized. However, degradation models specifically tailored to SLBs are currently absent. Therefore, we analyze the applicability of existing battery degradation models developed for first-life batteries in SLB applications. Secondly, we investigate fast screening and regrouping techniques and discuss the regrouping standards for the first time to guide the classification procedure and enhance the performance and safety of SLBs. Thirdly, we scrutinize the economic analysis of SLBs and summarize the potentially profitable applications. Finally, we comprehensively examine and compare power electronics technologies that can substantially improve the performance of SLBs, including high-efficiency energy transformation technologies, active equalization technologies, and technologies to improve reliability and safety.

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2302.14120 [pdf, other]

Diagonal State Space Augmented Transformers for Speech Recognition

Authors: George Saon, Ankit Gupta, Xiaodong Cui

Abstract: We improve on the popular conformer architecture by replacing the depthwise temporal convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant of linear RNNs obtained by discretizing a linear dynamical system with a diagonal state transition matrix. DSS layers project the input sequence onto a space of orthogonal polynomials where the choice of basis functions, metric and support is controlled by the eigenvalues of the transition matrix. We compare neural transducers with either conformer or our proposed DSS-augmented transformer (DSSformer) encoders on three public corpora: Switchboard English conversational telephone speech 300 hours, Switchboard+Fisher 2000 hours, and a spoken archive of Holocaust survivor testimonials called MALACH 176 hours. On Switchboard 300/2000 hours, we reach a single model performance of 8.9%/6.7% WER on the combined test set of the Hub5 2000 evaluation, respectively, and on MALACH we improve the WER by 7% relative over the previous best published result. In addition, we present empirical evidence suggesting that DSS layers learn damped Fourier basis functions where the attenuation coefficients are layer specific whereas the frequency coefficients converge to almost identical linearly-spaced values across all layers.

Submitted 27 February, 2023; originally announced February 2023.

Comments: to be presented at ICASSP 2023

arXiv:2211.13092 [pdf, other]

doi 10.1109/TVT.2022.3213179

Efficient Rigid Body Localization based on Euclidean Distance Matrix Completion for AGV Positioning under Harsh Environment

Authors: Xinyuan An, Xiaowei Cui, Sihao Zhao, Gang Liu, Mingquan Lu

Abstract: In real-world applications for automatic guided vehicle (AGV) navigation, the positioning system based on the time-of-flight (TOF) measurements between anchors and tags is confronted with the problem of insufficient measurements caused by blockages to radio signals or lasers, etc. Mounting multiple tags at different positions of the AGV to collect more TOFs is a feasible solution to tackle this difficulty. Vehicle localization by exploiting the measurements between multiple tags and anchors is a rigid body localization (RBL) problem, which estimates both the position and attitude of the vehicle. However, the state-of-the-art solutions to the RBL problem do not deal with missing measurements, and thus will result in degraded localization availability and accuracy in harsh environments. In this paper, different from these existing solutions for RBL, we model this problem as a sensor network localization problem with missing TOFs. To solve this problem, we propose a new efficient RBL solution based on Euclidean distance matrix (EDM) completion, abbreviated as ERBL-EDMC. Firstly, we develop a method to determine the upper and lower bounds of the missing measurements to complete the EDM reliably, using the known relative positions between tags and the statistics of the TOF measurements. Then, based on the completed EDM, the global tag positions are obtained from a coarse estimation followed by a refinement step assisted with inter-tag distances. Finally, the optimal vehicle position and attitude are obtained iteratively based on the estimated tag positions from the previous step. Theoretical analysis and simulation results show that the proposed ERBL-EDMC method effectively solves the RBL problem with incomplete measurements. It obtains the optimal positioning results while maintaining low computational complexity compared with the existing RBL methods based on semi-definite relaxation.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.12621 [pdf]

doi 10.1007/s10291-016-0560-y

A priori knowledge-free fast positioning approach for BeiDou receivers

Authors: Sihao Zhao, Xiaowei Cui, Mingquan Lu

Abstract: A Global Navigation Satellite System (GNSS) receiver usually needs a sufficient number of full pseudorange measurements to obtain a position solution. However, it is time-consuming to acquire full pseudorange information from only the satellite broadcast signals due to the navigation data features of GNSS. In order to realize fast positioning during a cold or warm start in a GNSS receiver, the existing approaches require an initial estimation of position and time or require a number of computational steps to recover the full pseudorange information from fractional pseudoranges and then compute the position solution. The BeiDou Navigation Satellite System (BDS) has a unique constellation distribution and a fast navigation data rate for geostationary earth orbit (GEO) satellites. Taking advantage of these features, we propose a fast positioning technique for BDS receivers. It simultaneously processes the full and fractional pseudorange measurements from the BDS GEOs and non-GEOs, respectively, which is faster than processing all full measurements. This method resolves the position solution and recovers the full pseudoranges for non-GEOs simultaneously within 1 s theoretically and does not need an estimate of the initial position. Simulation and real data experiments confirm that the proposed technique completes fast positioning without a priori position and time estimation, and the positioning accuracy is identical with the conventional single-point positioning approach using full pseudorange measurements from all available satellites.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2209.09413 [pdf, other]

A Unified Analytical Method to Quantify Three Types of Fast Frequency Response from Inverter-based Resources

Authors: Shuan Dong, Xin Fang, ** Tan, Ningchao Gao, Xiaofan Cui, Anderson Hoke

Abstract: With more inverter-based resources (IBRs), our power systems have lower frequency nadirs following N-1 contingencies, and undesired under-frequency load shedding (UFLS) can occur. To address this challenge, IBRs can be programmed to provide at least three types of fast frequency response (FFR), e.g., step response, proportional response (P/f droop response), and derivative response (synthetic inertia). However, these heterogeneous FFR challenge the study of power system frequency dynamics. Thus, this paper develops an analytical frequency nadir prediction method that allows for the consideration of all three potential forms of FFR provided by IBRs. The proposed method provides fast and accurate frequency nadir estimation after N-1 generation tripping contingencies. Our method is grounded on the closed-form solution for the frequency nadir, which is solved from the second-order system frequency response model considering the governor dynamics and three types of FFR. The simulation results in the IEEE 39-bus system with different types of FFR demonstrate that the proposed method provides an accurate and fast prediction of the frequency nadir under various disturbances.

Submitted 25 August, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

arXiv:2209.05272 [pdf, other]

Fast-Response Variable Frequency DC-DC Converters Using Switching Cycle Event-Driven Digital Control

Authors: Xiaofan Cui, Al-Thaddeus Avestruz

Abstract: This paper investigates a new method to model and control variable-frequency power converters in a switching-synchronized sampled-state space for cycle-by-cycle digital control. There are a number of significant benefits in comparison to other methods including fast dynamic performance together with ease of design and implementation. Theoretical results are presented and verified through hardware, and simulations of a current-mode buck converter with constant on-time and a current-mode boost converter with constant off-time. Dynamic voltage scaling for microprocessors and LiDAR are among the applications that can benefit.

Submitted 9 September, 2022; originally announced September 2022.

arXiv:2207.08049 [pdf, other]

doi 10.1109/TITS.2022.3190023

Robust Vehicle Positioning based on Multi-Epoch and Multi-Antenna TOAs in Harsh Environments

Authors: Xinyuan An, Sihao Zhao, Xiaowei Cui, Gang Liu, Mingquan Lu

Abstract: For radio-based time-of-arrival (TOA) positioning systems applied in harsh environments, obstacles in the surroundings and on the vehicle itself will block the signals from the anchors, reduce the number of available TOA measurements and thus degrade the localization performance. Conventional multi-antenna positioning technique requires a good initialization to avoid local minima, and suffers from location ambiguity due to insufficient number of TOA measurements and/or poor geometry of anchors at a single epoch. A new initialization method based on semidefinite programming (SDP), namely MEMA-SDP, is first designed to address the initialization problem of the MEMA-TOA method. Then, an iterative refinement step is developed to obtain the optimal positioning result based on the MEMA-SDP initialization. We derive the Cramer-Rao lower bound (CRLB) to analyze the accuracy of the new MEMA-TOA method theoretically, and show its superior positioning performance over the conventional single-epoch and multi-antenna (SEMA) localization method. Simulation results in harsh environments demonstrate that i) the new MEMA-SDP provides an initial estimation that is close to the real location, and empirically guarantees the global optimality of the final refined positioning solution, and ii) compared with the conventional SEMA method, the new MEMA-TOA method has higher positioning accuracy without location ambiguity, consistent with the theoretical analysis.

Submitted 16 July, 2022; originally announced July 2022.

arXiv:2206.10523 [pdf, other]

Overcoming High Frequency Limitations of Current-Mode Control Using a Control Conditioning Approach -- Part II: Implementation and Hardware

Authors: Xiaofan Cui, Al-Thaddeus Avestruz

Abstract: This article is the second part of a paper series about interference in extremum (i.e., peak or valley) current-mode control, which applies to both fixed and variable switching frequency power converters. Specifically, this part presents three control conditioning methods that mitigate the adverse effect of interference. These methods are new ways to use: (i) slope compensation; (ii) low-pass filtering; and (iii) the phenomenon of comparator-overdrive-delay, for control conditioning. The stability criterion, closed-loop dynamics, and transient performance are derived with mathematical rigor for each method. The design tradeoffs are illustrated, discussed, and compared. The effectiveness of all three methods is demonstrated and validated in hardware using a power converter operating at multi-MHz switching frequencies.

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.10518 [pdf, other]

Overcoming High Frequency Limitations of Current-Mode Control Using a Control Conditioning Approach -- Part I: Modeling and Analysis

Authors: Xiaofan Cui, Al-Thaddeus Avestruz

Abstract: Current-mode control is one of the most popular controller strategies for power converters. With the advent of wide bandgap devices including GaN and SiC, higher switching frequencies have become more viable at higher power because of lower switching losses. However, the advantage of higher switching frequency for faster, higher bandwidth control is squandered because of current sensor interference. We present a framework for characterizing and analyzing this interference as uncertainties to the controller model. These uncertainties introduce additional dynamics and nonlinearity that can result in instability and poor transient performance of the current control loop. In this paper, we provide a model framework based on a new control conditioning approach that guarantees global stability and a strategy for optimizing transient performance. In Part II of this paper series, we present the analysis, design, and hardware validation of three effective solutions.

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.09340 [pdf, ps, other]

A Note on Comparator-Overdrive-Delay Conditioning for Current-Mode Control

Authors: Xiaofan Cui, Al-Thaddeus Avestruz

Abstract: Comparator-overdrive-delay conditioning is a new control conditioning approach for high-frequency current-mode control. No existing literature rigorously studies the effect of the comparator overdrive delay on the current-mode control. The results in this paper provide insights into the mechanism of comparator-overdrive-delay conditioning.

Submitted 19 June, 2022; originally announced June 2022.

arXiv:2206.07882 [pdf, other]

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

Authors: Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan

Abstract: We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T). We use a 4 bit integer representation for both weights and activations and apply Quantization Aware Training (QAT) to retrain the full model (acoustic encoder and language model) and achieve near-iso-accuracy. We show that customized quantization schemes that are tailored to the local properties of the network are essential to achieve good performance while limiting the computational overhead of QAT. Density ratio Language Model fusion has shown remarkable accuracy gains on RNN-T workloads but it severely increases the computational cost of inference. We show that our quantization strategies enable using large beam widths for hypothesis search while achieving streaming-compatible runtimes and a full model compression ratio of 7.6$\times$ compared to the full precision model. Via hardware simulations, we estimate a 3.4$\times$ acceleration from FP16 to INT4 for the end-to-end quantized RNN-T inclusive of LM fusion, resulting in a Real Time Factor (RTF) of 0.06. On the NIST Hub5 2000, Hub5 2001, and RT-03 test sets, we retain most of the gains associated with LM fusion, improving the average WER by $>$1.5%.

Submitted 15 June, 2022; originally announced June 2022.

Comments: 5 pages, 2 figures, 1 table. Paper accepted to Interspeech 2022

ACM Class: I.2.6

arXiv:2206.06997 [pdf, other]

A Note on Low-Pass Filter Conditioning for Current-Mode Control

Authors: Xiaofan Cui, Al-Thaddeus Avestruz

Abstract: The low-pass filter is a classic control conditioning approach for high-frequency current-mode control. However, no existing literature discusses the large-signal stability criterion for the current-mode control with low-pass filters. This paper provides a mathematically rigorous large-signal stability criterion. The result can directly benefit the practical engineering implementation of the low-pass filter in high-frequency current-mode control.

Submitted 11 June, 2022; originally announced June 2022.

arXiv:2205.10155 [pdf, other]

Large-Signal Stability Guarantees for Cycle-by-Cycle Controlled DC-DC Converters

Authors: Xiaofan Cui, Al-Thaddeus Avestruz

Abstract: Stability guarantees are critical for cycle-by-cycle controlled dc-dc converters in consumer electronics and energy storage systems. Traditional stability analysis on cycle-by-cycle dc-dc converters is incomplete because the inductor current ramps are considered fixed; but instead, inductor ramps are not fixed because they are dependent on the output voltage in large-signal transients. We demonstrate a new large-signal stability theory that treats cycle-by-cycle controlled dc-dc converters as a particular type of feedback interconnection system. An analytical and practical stability criterion is provided based on this system. The criterion indicates that the L/R and RC time constants are the design parameters that determine the amount of coupling between the current ramp and the output voltage.

Submitted 16 October, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: typos corrected, references added, title, abstract, and introduction revised, results unchanged

arXiv:2205.09048 [pdf, other]

Global Contrast Masked Autoencoders Are Powerful Pathological Representation Learners

Authors: Hao Quan, Xingyu Li, Weixing Chen, Qun Bai, Mingchen Zou, Ruijie Yang, Tingting Zheng, Ruiqun Qi, Xinghua Gao, Xiaoyu Cui

Abstract: Based on digital pathology slice scanning technology, artificial intelligence algorithms represented by deep learning have achieved remarkable results in the field of computational pathology. Compared to other medical images, pathology images are more difficult to annotate, and thus, there is an extreme lack of available datasets for conducting supervised learning to train robust deep learning models. In this paper, we propose a self-supervised learning (SSL) model, the global contrast-masked autoencoder (GCMAE), which can train the encoder to represent local-global features of pathological images and can significantly improve the performance of transfer learning across data sets. In this study, the ability of the GCMAE to learn migratable representations was demonstrated through extensive experiments using a total of three different disease-specific hematoxylin and eosin (HE)-stained pathology datasets: Camelyon16, NCTCRC and BreakHis. In addition, this study designed an effective automated pathology diagnosis process based on the GCMAE for clinical applications. The source code of this paper is publicly available at https://github.com/StarUniversus/gcmae.

Submitted 15 November, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

arXiv:2205.02850 [pdf]

A Deep Reinforcement Learning Framework for Rapid Diagnosis of Whole Slide Pathological Images

Authors: Tingting Zheng, Weixing Chen, Shuqin Li, Hao Quan, Qun Bai, Tianhang Nan, Song Zheng, Xinghua Gao, Yue Zhao, Xiaoyu Cui

Abstract: The deep neural network is a research hotspot for histopathological image analysis, which can improve the efficiency and accuracy of diagnosis for pathologists or be used for disease screening. The whole slide pathological image can reach one gigapixel and contains abundant tissue feature information, which needs to be divided into a lot of patches in the training and inference stages. This will lead to a long convergence time and large memory consumption. Furthermore, well-annotated data sets are also in short supply in the field of digital pathology. Inspired by the pathologist's clinical diagnosis process, we propose a weakly supervised deep reinforcement learning framework, which can greatly reduce the time required for network inference. We use neural networks to construct the search model and the decision model of the reinforcement learning agent, respectively. The search model predicts the next action through the image features of different magnifications in the current field of view, and the decision model is used to return the predicted probability of the current field of view image. In addition, an expert-guided model is constructed by multi-instance learning, which not only provides rewards for the search model, but also guides decision model learning by the knowledge distillation method. Experimental results show that our proposed method can achieve fast inference and accurate prediction of whole slide images without any pixel-level annotations.

Submitted 5 May, 2022; originally announced May 2022.

arXiv:2203.15176 [pdf, other]

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

Authors: Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata

Abstract: We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise into ground truth labels during training in order to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate these two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that both techniques improve the generalization of RNNT models individually and that they can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to Interspeech 2022
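
The length perturbation described in the abstract above (randomly dropping and inserting frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `length_perturb`, the per-frame probabilities, and the choice to insert a copy of the preceding frame are all assumptions made for illustration.

```python
import numpy as np

def length_perturb(frames, drop_prob=0.1, insert_prob=0.1, rng=None):
    """Randomly drop and insert frames to change the sequence length.

    frames: array of shape (T, D) with T >= 1.
    Inserted frames are copies of the frame just kept; that fill choice
    is an assumption of this sketch, not something the abstract states.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for frame in frames:
        if rng.random() < drop_prob:
            continue  # drop this frame
        out.append(frame)
        if rng.random() < insert_prob:
            out.append(frame.copy())  # insert an extra frame after it
    if not out:
        out.append(frames[0])  # never return an empty utterance
    return np.stack(out)

# Hypothetical usage: 100 frames of 40-dimensional features.
feats = np.ones((100, 40))
perturbed = length_perturb(feats, rng=np.random.default_rng(0))
```

The feature dimension is unchanged; only the number of frames varies from call to call.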

arXiv:2203.10035 [pdf, other]

doi 10.2312/3dor.20211307

SHREC 2021: Classification in cryo-electron tomograms

Authors: Ilja Gubins, Marten L. Chaillet, Gijs van der Schot, M. Cristina Trueba, Remco C. Veltkamp, Friedrich Förster, Xiao Wang, Daisuke Kihara, Emmanuel Moebel, Nguyen P. Nguyen, Tommi White, Filiz Bunyak, Giorgos Papoulias, Stavros Gerolymatos, Evangelia I. Zacharaki, Konstantinos Moustakas, Xiangrui Zeng, Sinuo Liu, Min Xu, Yaoyu Wang, Cheng Chen, Xuefeng Cui, Fa Zhang

Abstract: Cryo-electron tomography (cryo-ET) is an imaging technique that allows three-dimensional visualization of macro-molecular assemblies under near-native conditions. Cryo-ET comes with a number of challenges, mainly a low signal-to-noise ratio and the inability to obtain images from all angles. Computational methods are key to analyzing cryo-electron tomograms. To promote innovation in computational methods, we generate a novel simulated dataset to benchmark different methods of localization and classification of biological macromolecules in tomograms. Our publicly available dataset contains ten tomographic reconstructions of simulated cell-like volumes. Each volume contains twelve different types of complexes, varying in size, function and structure. In this paper, we have evaluated seven different methods of finding and classifying proteins. Seven research groups present results obtained with learning-based methods trained on the simulated dataset, as well as a baseline template matching (TM), a traditional method widely used in cryo-ET research. We show that learning-based approaches can achieve notably better localization and classification performance than TM. We also experimentally confirm that there is a negative relationship between particle size and performance for all methods.

Submitted 18 March, 2022; originally announced March 2022.

Comments: Workshop version of the paper can be found here: https://diglib.eg.org/handle/10.2312/3dor20211307

arXiv:2202.06807 [pdf, other]

Sequential Doppler Shift based Optimal Localization and Synchronization with TOA

Authors: Sihao Zhao, Ningyan Guo, Xiao-** Zhang, Xiaowei Cui, Mingquan Lu

Abstract: Doppler shift is an important measurement for localization and synchronization (LAS), and is available in various practical systems. Existing studies on LAS techniques in a time division broadcast LAS system (TDBS) only use sequential time-of-arrival (TOA) measurements from the broadcast signals. In this paper, we develop a new optimal LAS method in the TDBS, namely LAS-SDT, by taking advantage of the sequential Doppler shift and TOA measurements. It achieves higher accuracy compared with the conventional TOA-only method for user devices (UDs) with motion and clock drift. Two variant methods are also developed: LAS-SDT-v for the case with UD velocity aiding, and LAS-SDT-k for the case with UD clock drift aiding. We derive the Cramer-Rao lower bound (CRLB) for these different cases. We show analytically that the accuracies of the estimated UD position, clock offset, velocity and clock drift are all significantly higher than those of the conventional LAS method using TOAs only. Numerical results corroborate the theoretical analysis and show the optimal estimation performance of the LAS-SDT.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2111.07019 [pdf, other]

doi 10.1109/LSP.2021.3127486

Closed-form Two-way TOA Localization and Synchronization for User Devices with Motion and Clock Drift

Authors: Sihao Zhao, Ningyan Guo, Xiao-** Zhang, Xiaowei Cui, Mingquan Lu

Abstract: A two-way time-of-arrival (TOA) system is composed of anchor nodes (ANs) and user devices (UDs). Two-way TOA measurements between AN-UD pairs are obtained via round-trip communications to achieve localization and synchronization (LAS) for a UD. The existing LAS method for a moving UD with clock drift adopts an iterative algorithm, which requires accurate initialization and has high computational complexity. In this paper, we propose a new closed-form two-way TOA LAS approach, namely CFTWLAS, which does not require initialization, has low complexity and empirically achieves optimal LAS accuracy. We first linearize the LAS problem by squaring and differencing the two-way TOA equations. We employ two auxiliary variables to simplify the problem to finding the analytical solution of quadratic equations. Due to the measurement noise, we can only obtain a raw LAS estimation from the solution of the auxiliary variables. Then, a weighted least squares step is applied to further refine the raw estimation. We analyze the theoretical error of the new CFTWLAS and show that it empirically reaches the Cramer-Rao lower bound (CRLB) with sufficient ANs under the condition with proper geometry and small noise. Numerical results in a 3D scenario verify the theoretical analysis that the estimation accuracy of the new CFTWLAS method reaches the CRLB in the presented experiments when the number of ANs is large, the geometry is appropriate, and the noise is small. Unlike the iterative method, whose complexity increases with the iteration count, the new CFTWLAS has constant low complexity.

Submitted 12 November, 2021; originally announced November 2021.

arXiv:2109.12027 [pdf, other]

doi 10.1109/LCOMM.2020.2993894

Sequential TOA-Based Moving Target Localization in Multi-Agent Networks

Authors: Qin Shi, Xiaowei Cui, Sihao Zhao, Mingquan Lu

Abstract: Localizing moving targets in unknown harsh environments has always been a severe challenge. This letter investigates a novel localization system based on multi-agent networks, where multiple agents serve as mobile anchors broadcasting their time-space information to the targets. We study how the moving target can localize itself using the sequential time of arrival (TOA) of the one-way broadcast signals. An extended two-step weighted least squares (TSWLS) method is proposed to jointly estimate the position and velocity of the target in the presence of agent information uncertainties. We also address the large target clock offset (LTCO) problem for numerical stability. Analytical results reveal that our method reaches the Cramer-Rao lower bound (CRLB) under small noises. Numerical results show that the proposed method performs better than the existing algorithms.

Submitted 24 September, 2021; originally announced September 2021.

arXiv:2108.12074 [pdf, other]

4-bit Quantization of LSTM-based Speech Recognition Models

Authors: Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan

Abstract: We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts). Using a 4-bit integer representation, a naïve quantization approach applied to the LSTM portion of these models results in significant Word Error Rate (WER) degradation. On the other hand, we show that minimal accuracy loss is achievable with an appropriate choice of quantizers and initializations. In particular, we customize quantization schemes depending on the local properties of the network, improving recognition performance while limiting computational time. We demonstrate our solution on the Switchboard (SWB) and CallHome (CH) test sets of the NIST Hub5-2000 evaluation. DBLSTM-HMMs trained with 300 or 2000 hours of SWB data achieve <0.5% and <1% average WER degradation, respectively. On the more challenging RNN-T models, our quantization strategy limits degradation in 4-bit inference to 1.3%.

Submitted 26 August, 2021; originally announced August 2021.

Comments: 5 pages, 3 figures, Andrea Fasoli and Chia-Yu Chen equally contributed to this work. Paper accepted to Interspeech 2021

ACM Class: I.2.6
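
For orientation, the kind of naïve 4-bit integer quantization that the abstract above contrasts with its customized quantizers could look like the following sketch. It is a generic symmetric per-tensor quantizer, not the scheme from the paper; the function names and the max-based scale are assumptions made for illustration.

```python
import numpy as np

def quantize_int4(w):
    """Naive symmetric per-tensor quantization to signed 4-bit codes in [-8, 7].

    The scale maps the largest magnitude in w to the code 7 (an assumed,
    simple choice; the paper's quantizers are not reproduced here).
    """
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # int8 is used as the container type; values stay within the 4-bit range.
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map 4-bit codes back to approximate floating-point values."""
    return q.astype(np.float32) * scale

# Hypothetical usage on a small weight vector.
weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
codes, scale = quantize_int4(weights)
approx = dequantize(codes, scale)
```

With only 16 levels shared across the whole tensor, a few large weights stretch the scale and coarsen every other value, which is one plausible reason a naïve scheme of this kind loses accuracy.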

arXiv:2108.10803 [pdf, ps, other]

Reducing Exposure Bias in Training Recurrent Neural Network Transducers

Authors: Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske

Abstract: When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground truth label sequences. This leads to a mismatch during inference, known as exposure bias, when the model must deal with label sequences containing errors. In this paper we investigate approaches to reducing exposure bias in training to improve the generalization of RNNT models for automatic speech recognition (ASR). A label-preserving input perturbation to the prediction network is introduced. The input token sequences are perturbed using SwitchOut and scheduled sampling based on an additional token language model. Experiments conducted on the 300-hour Switchboard dataset demonstrate their effectiveness. By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.

Submitted 24 August, 2021; originally announced August 2021.

Comments: accepted to Interspeech 2021

arXiv:2107.13740 [pdf, other]

doi 10.1016/j.ymssp.2022.109211

Three-dimensional instantaneous orbit map for rotor-bearing system based on a novel multivariate complex variational mode decomposition algorithm

Authors: Xiaolong Cui, Jie Huang, Chaoshun Li, Yujie Zhao

Abstract: Full spectrum and holospectrum are homogeneous information fusion technologies developed for the fault diagnosis of rotating machinery, which are extensively exploited in the analysis of the orbits of rotor-bearing systems. However, they are not adapted for non-stationary signals, nor can they be used for fusion analysis of vibrations of multiple bearing sections. Drawing inspiration from the multivariate variational mode decomposition (MVMD) and complex-valued signal decomposition, we propose a method called multivariate complex variational mode decomposition (MCVMD). It can simultaneously extract the forward and backward components of multiple bearing sections and realize non-stationary complex signal decomposition of multiple bearing sections of the rotor. To achieve the visualization goal of condition monitoring, we propose the three-dimensional instantaneous orbit map (3D-IOM). It enables more features of shaft vibration of a rotor system to be displayed and offers a new way for the fusion analysis of vibration signals of multiple bearing sections of rotating machinery. Furthermore, making the most of the joint information, we also provide a high-resolution time-full spectrum (Time-FS) to display the forward and backward frequency components of multiple bearing sections. The effectiveness of the proposed method is demonstrated through both a simulated experiment and real-life complex-valued signals.

Submitted 24 April, 2022; v1 submitted 29 July, 2021; originally announced July 2021.

Comments: 29 pages, 32 figures, 41 references

arXiv:2106.11749 [pdf, other]

doi 10.1109/ACCESS.2022.3201132

Lite-Sparse Hierarchical Partial Power Processing for Second-Use Battery Energy Storage Systems

Authors: Xiaofan Cui, Alireza Ramyar, Peyman Mohtat, Veronica Contreras, Jason Siegel, Anna Stefanopoulou, Al-Thaddeus Avestruz

Abstract: The explosive growth of electric vehicles (EVs) is leading to a surge in retired EV batteries, which are typically recycled despite having nearly 80% available capacity. Repurposing automotive batteries for second-use battery energy storage systems (2-BESS) has both economic and environmental benefits. The challenge with second-use batteries is the heterogeneity in their state of health. This paper introduces a new strategy to optimize 2-BESS performance despite the heterogeneity of individual batteries while reducing the cost of power conversion. In this paper, the statistical distribution of the power heterogeneity in the supply of batteries is used to optimize the choice of power converters and design the power flow within the battery energy storage system (BESS) to optimize power capability. By leveraging a new lite-sparse hierarchical partial power processing (LS-HiPPP) approach, we study how a hierarchy in partial power processing (PPP) partitions power converters to significantly reduce converter ratings, process less power to achieve high system efficiency with lower cost (lower efficiency) converters, and take advantage of economies of scale by requiring only a minimal number of sets of identical converters. Our results demonstrate that LS-HiPPP architectures offer the best tradeoff between battery utilization and converter cost and have higher system efficiency than conventional partial power processing (C-PPP) in all cases.

Submitted 9 September, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: typos corrected, references added, title, abstract and introduction revised, DOI updated, results unchanged

Journal ref: IEEE Access 2022

arXiv:2104.14976 [pdf, other]

doi 10.1016/j.est.2022.104017

Comparing Power Processing System Approaches in Second-Use Battery Energy Buffering for Electric Vehicle Charging

Authors: Xiaofan Cui, Alireza Ramyar, Jason Siegel, Peyman Mohtat, Anna Stefanopoulou, Al-Thaddeus Avestruz

Abstract: The heterogeneity in pack voltages and capacity of aged packs limits the performance and economic viability of second-use battery energy storage systems (2-BESS) due to issues of reliability and available energy. Overcoming these limitations could enable extended use of batteries and improve the environmental impacts of electric vehicles by reducing the number of batteries produced. This paper compares Lite-Sparse Hierarchical Partial Power Processing (LS-HiPPP), a new method for power processing in 2-BESS, to conventional power processing architectures using a stochastic EV charging plaza model. This method for performance evaluation allows a fair comparison among power processing architectures for 2-BESS. Results show that LS-HiPPP increases the battery energy utilization to 94% as compared to 78% for conventional partial power processing (C-PPP) and 23% for full power processing. These results were obtained with 25% heterogeneity in individual battery capacities and 20% power processing within the 2-BESS. Derating and captured value are two derived performance metrics for comparing LS-HiPPP and C-PPP in this work. The derating for LS-HiPPP is 84.3% in comparison to 63.1% for C-PPP. The captured value for LS-HiPPP is 79.8% versus 51% for C-PPP.

Submitted 24 February, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: typos corrected, references added, several sections from the previous version were omitted because it is out of scope for the journal submission, title, abstract and introduction revised, results unchanged

Journal ref: Journal of Energy Storage (2022) 1-14

arXiv:2103.09399 [pdf, other]

doi 10.1109/JIOT.2021.3055677

A New TOA Localization and Synchronization System with Virtually Synchronized Periodic Asymmetric Ranging Network

Authors: Sihao Zhao, Xiao-** Zhang, Xiaowei Cui, Mingquan Lu

Abstract: In this article, we design a new time-of-arrival (TOA) system for simultaneous user device (UD) localization and synchronization with a periodic asymmetric ranging network, namely PARN. The PARN includes one primary anchor node (PAN) transmitting and receiving signals, and many secondary ANs (SANs) only receiving signals. All the UDs can transmit and receive signals. The PAN periodically transmits a sync signal and the UD transmits a response signal after reception of the sync signal. Using TOA measurements from the periodic sync signal at SANs, we develop a Kalman filtering method to virtually synchronize ANs with high accuracy estimation of clock parameters. Employing the virtual synchronization, and TOA measurements from the response signal and sync signal, we then develop a maximum likelihood (ML) approach, namely ML-LAS, to simultaneously localize and synchronize a moving UD. We analyze the UD localization and synchronization error, and derive the Cramer-Rao lower bound (CRLB). Different from existing asymmetric ranging network-based TOA systems, the new PARN i) uses the periodic sync signals at the SAN to exploit the temporally correlated clock information for high accuracy virtual synchronization, and ii) compensates for the UD movement and clock drift using various TOA measurements to achieve consistent and simultaneous localization and synchronization performance. Numerical results verify the theoretical analysis that the new system has high accuracy in AN clock offset estimation and simultaneous localization and synchronization for a moving UD. We implement a prototype hardware system and demonstrate the feasibility and superiority of the PARN in real-world applications by experiments.

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2103.02635 [pdf, other]

doi 10.1109/LSP.2021.3064755

Semidefinite Programming Two-way TOA Localization for User Devices with Motion and Clock Drift

Authors: Sihao Zhao, Xiao-** Zhang, Xiaowei Cui, Mingquan Lu

Abstract: In two-way time-of-arrival (TOA) systems, a user device (UD) obtains its position by round-trip communications to a number of anchor nodes (ANs) at known locations. The objective function of the maximum likelihood (ML) method for two-way TOA localization is nonconvex. Thus, the widely-adopted Gauss-Newton iterative method to solve the ML estimator usually suffers from the local minima problem. In this paper, we convert the original estimator into a convex problem by relaxation, and develop a new semidefinite programming (SDP) based localization method for moving UDs, namely SDP-M. Numerical results demonstrate that, compared with the iterative method, which often falls into local minima, the SDP-M always converges to the global optimal solution and reduces the localization error by more than 40%. It also has stable localization accuracy regardless of the UD movement, and outperforms the conventional method for stationary UDs, which has larger error with growing UD velocity.

Submitted 3 March, 2021; originally announced March 2021.

arXiv:2102.04429 [pdf, other]

Federated Acoustic Modeling For Automatic Speech Recognition

Authors: Xiaodong Cui, Songtao Lu, Brian Kingsbury

Abstract: Data privacy and protection is a crucial issue for any automatic speech recognition (ASR) service provider when dealing with clients. In this paper, we investigate federated acoustic modeling using data from multiple clients. A client's data is stored on a local data server and the clients communicate only model parameters with a central server, and not their data. The communication happens infrequently to reduce the communication cost. To mitigate the non-iid issue, client adaptive federated training (CAFT) is proposed to canonicalize data across clients. The experiments are carried out on 1,150 hours of speech data from multiple domains. Hybrid LSTM acoustic models are trained via federated learning and their performance is compared to traditional centralized acoustic model training. The experimental results demonstrate the effectiveness of the proposed federated acoustic modeling strategy. We also show that CAFT can further improve the performance of the federated acoustic model.

Submitted 8 February, 2021; originally announced February 2021.

Comments: Accepted by ICASSP 2021
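
The abstract above describes clients that send only model parameters to a central server. One common way a server can combine such parameters is a data-size-weighted average (FedAvg-style), sketched below. This is an assumed, generic aggregation step, not the paper's CAFT method; the function name and the dictionary layout of the parameters are hypothetical.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Combine per-client parameters by a data-size-weighted average.

    client_params: list of dicts mapping parameter name -> NumPy array.
    client_sizes:  amount of local training data per client (the weights).
    Only parameters are exchanged; raw client data never reaches the server.
    """
    total = float(sum(client_sizes))
    names = client_params[0].keys()
    return {
        name: sum(p[name] * (n / total) for p, n in zip(client_params, client_sizes))
        for name in names
    }

# Hypothetical usage: two clients, the second holding three times more data.
client_a = {"w": np.array([1.0, 2.0])}
client_b = {"w": np.array([3.0, 4.0])}
global_params = federated_average([client_a, client_b], [1, 3])
```

Weighting by data size keeps a small client from pulling the global model as strongly as a large one; infrequent communication would correspond to running many local training steps between calls to this function.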

arXiv:2102.01813 [pdf, other]

Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

Authors: Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang

Abstract: In Speech Emotion Recognition (SER), emotional characteristics often appear in diverse forms of energy patterns in spectrograms. Typical attention neural network classifiers of SER are usually optimized on a fixed attention granularity. In this paper, we apply multiscale area attention in a deep convolutional neural network to attend to emotional characteristics with varied granularities, so that the classifier can benefit from an ensemble of attentions with different scales. To deal with data sparsity, we conduct data augmentation with vocal tract length perturbation (VTLP) to improve the generalization capability of the classifier. Experiments are carried out on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. We achieved 79.34% weighted accuracy (WA) and 77.54% unweighted accuracy (UA), which, to the best of our knowledge, is the state of the art on this dataset.

Submitted 2 February, 2021; originally announced February 2021.

Comments: Accepted by ICASSP 2021

arXiv:2102.00131 [pdf, other]

doi 10.1109/TSP.2022.3168363

New Closed-form Joint Localization and Synchronization using Sequential One-way TOAs

Authors: Ningyan Guo, Sihao Zhao, Xiao-** Zhang, Zheng Yao, Xiaowei Cui, Mingquan Lu

Abstract: Resolving the joint localization and synchronization (JLAS) problem is an essential technique for moving user nodes (UNs) with clock offset and clock skew. Existing iterative maximum likelihood methods using sequential one-way time-of-arrival (TOA) measurements from the anchor nodes' (AN) broadcast signals require a good initial guess and have a computational complexity that grows with the number of iterations, given the size of the problem. In this paper, we propose a new closed-form JLAS approach, namely CFJLAS, which achieves the asymptotically optimal solution in one shot without initialization when the noise is small, and has a low computational complexity. After squaring and differencing the sequential TOA measurement equations, we devise two intermediate variables to reparameterize the non-linear problem. In this way, we convert the problem to a simpler one of solving two simultaneous quadratic equations. We then solve the equations analytically to obtain a raw closed-form JLAS estimation. Finally, we apply a weighted least squares (WLS) step to optimize the estimation. We derive the Cramer-Rao lower bound (CRLB), analyze the estimation error, and show that the estimation accuracy of the CFJLAS reaches the CRLB under the small noise condition. The complexity of the new CFJLAS is only determined by the size of the problem, unlike the conventional iterative method, whose complexity is additionally multiplied by the number of iterations. Simulations in a 2D scene verify that the estimation accuracies of the new CFJLAS method in position, velocity, clock offset, and clock skew all reach the CRLB under the small noise condition. Compared with the conventional iterative method, the proposed new CFJLAS method does not require initialization, obtains the optimal solution under the small noise condition, and has a low computational complexity.

Submitted 14 April, 2022; v1 submitted 29 January, 2021; originally announced February 2021.

arXiv:2101.01636 [pdf, other]

doi 10.1109/JIOT.2021.3055680

Optimal Localization with Sequential Pseudorange Measurements for Moving Users in a Time Division Broadcast Positioning System

Authors: Sihao Zhao, Xiao-** Zhang, Xiaowei Cui, Mingquan Lu

Abstract: In a time division broadcast positioning system (TDBPS), a user device (UD) determines its position by obtaining sequential time-of-arrival (TOA) or pseudorange measurements from signals broadcast by multiple synchronized base stations (BSs). The existing localization method using sequential pseudorange measurements and a linear clock drift model for the TDBPS, namely LSPM-D, does not compensate for the position displacement caused by the UD movement and will result in position error. In this paper, depending on the knowledge of the UD velocity, we develop a set of optimal localization methods for different cases. First, for known UD velocity, we develop the optimal localization method, namely LSPM-KVD, to compensate for the movement-caused position error. We show that the LSPM-D is a special case of the LSPM-KVD when the UD is stationary with zero velocity. Second, for the case with unknown UD velocity, we develop a maximum likelihood (ML) method to jointly estimate the UD position and velocity, namely LSPM-UVD. Third, in the case that we have prior distribution information of the UD velocity, we present a maximum a posteriori (MAP) estimator for localization, namely LSPM-PVD. We derive the Cramer-Rao lower bound (CRLB) for all three estimators and analyze their localization error performance. We show that the position error of the LSPM-KVD increases as the assumed known velocity deviates from the true value. As expected, the LSPM-KVD has the smallest position error while the LSPM-PVD and the LSPM-UVD are more robust when the prior knowledge of the UD velocity is limited. Numerical results verify the theoretical analysis on the optimality and the positioning accuracy of the proposed methods.

Submitted 1 February, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

arXiv:2011.12892 [pdf]

doi 10.1080/00396265.2019.1683327

Single point positioning using full and fractional pseudorange measurements from GPS and BDS

Authors: Sihao Zhao, Xiaowei Cui, Mingquan Lu

Abstract: In conventional global navigation satellite system (GNSS) receivers, full pseudorange measurements are usually required to complete a single point position fix. However, obtaining full pseudorange measurements takes longer than obtaining fractional ones. To shorten the time to first fix and improve the position accuracy during cold or warm start of a dual-constellation GNSS receiver, we propose a positioning algorithm using full and fractional pseudorange measurements from the two navigational constellations. This method uses four full pseudorange measurements from one constellation along with fractional ones from either or both constellations to obtain a potentially rapid position result with an identical accuracy to that of the conventional positioning method using full measurements. Tests with simulated and real Global Positioning System (GPS) and BeiDou Navigation Satellite System (BDS) data demonstrate that the proposed method can generate correct single point position solutions and that the position error is identical to the result from the conventional approach using the full pseudorange measurements.

Submitted 25 November, 2020; originally announced November 2020.

arXiv:2011.12864 [pdf, other]

doi 10.1109/JIOT.2020.3010479

A Closed-form Localization Method Utilizing Pseudorange Measurements from Two Non-synchronized Positioning Systems

Authors: Sihao Zhao, Xiao-Ping Zhang, Xiaowei Cui, Mingquan Lu

Abstract: In a time-of-arrival (TOA) or pseudorange based positioning system, user location is obtained by observing multiple anchor nodes (ANs) at known positions. Utilizing more than one positioning system, e.g., combining Global Positioning System (GPS) and BeiDou Navigation Satellite System (BDS), brings better positioning accuracy. However, ANs from two systems are usually synchronized to two different clock sources. Different from single-system localization, an extra user-to-system clock offset needs to be handled. Existing dual-system methods either have high computational complexity or sub-optimal positioning accuracy. In this paper, we propose a new closed-form dual-system localization (CDL) approach that has low complexity and optimal localization accuracy. We first convert the nonlinear problem into a linear one by squaring the distance equations and employing intermediate variables. Then, a weighted least squares (WLS) method is used to optimize the positioning accuracy. We prove that the positioning error of the new method reaches the Cramer-Rao lower bound (CRLB) in far field conditions with small measurement noise. Simulations on 2D and 3D positioning scenes are conducted. Results show that, compared with the iterative approach, which has high complexity and requires a good initialization, the new CDL method does not require initialization and has lower computational complexity with comparable positioning accuracy. Numerical results verify the theoretical analysis on positioning accuracy, and show that the new CDL method has superior performance over the state-of-the-art closed-form method. Experiments using real GPS and BDS data verify the applicability of the new CDL method and the superiority of its performance in the real world.

Submitted 25 November, 2020; originally announced November 2020.
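
Illustrative note (not taken from the paper): the linearization idea named in the abstract above, squaring the distance equations and introducing an intermediate variable before a weighted least squares solve, can be sketched for a much simpler setting. The sketch assumes a single system, 2D geometry, and true ranges with no clock offset, so it is not the CDL method itself. The function names `solve_linear` and `closed_form_position` are hypothetical, and the intermediate variable R = x^2 + y^2 is treated as a free unknown.

```python
import math


def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def closed_form_position(anchors, ranges, weights=None):
    """Estimate a 2D position from anchor ranges without iteration.

    Assumption: ranges are plain distances (no clock offset).
    Squaring d_i^2 = (x - ax)^2 + (y - ay)^2 gives the linear equation
        -2*ax*x - 2*ay*y + R = d_i^2 - ax^2 - ay^2,   R = x^2 + y^2,
    with unknowns theta = [x, y, R].
    """
    if weights is None:
        weights = [1.0] * len(anchors)
    H = [[-2.0 * ax, -2.0 * ay, 1.0] for ax, ay in anchors]
    z = [d * d - ax * ax - ay * ay for (ax, ay), d in zip(anchors, ranges)]
    # Weighted normal equations: (H^T W H) theta = H^T W z
    HtWH = [[sum(w * h[i] * h[j] for h, w in zip(H, weights)) for j in range(3)]
            for i in range(3)]
    HtWz = [sum(w * h[i] * zi for h, zi, w in zip(H, z, weights)) for i in range(3)]
    x, y, _ = solve_linear(HtWH, HtWz)
    return x, y


# Usage: four anchors at the corners of a square, noise-free ranges.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_pos = (3.0, 4.0)
ranges = [math.dist(true_pos, a) for a in anchors]
estimate = closed_form_position(anchors, ranges)
```

With noisy ranges, the weights would normally reflect the measurement noise of each squared equation; uniform weights are used here only to keep the sketch short.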

arXiv:2011.12272 [pdf, other]

doi 10.1109/TVT.2021.3092255

Optimal Two-way TOA Localization and Synchronization for Moving User Devices with Clock Drift

Authors: Sihao Zhao, Xiao-Ping Zhang, Xiaowei Cui, Mingquan Lu

Abstract: In two-way time-of-arrival (TOA) systems, a user device (UD) obtains its position and timing information by round-trip communications to a number of anchor nodes (ANs) at known locations. Compared with the one-way TOA technique, the two-way TOA scheme is easy to implement and has higher localization and synchronization accuracy. Existing two-way TOA methods assume a stationary UD. This will cause uncompensated position and timing errors. In this article, we propose an optimal maximum likelihood (ML) based two-way TOA localization and synchronization method, namely TWLAS. Different from the existing methods, it takes the UD mobility into account to compensate for the error caused by the UD motion. We analyze its estimation error and derive the Cramer-Rao lower bound (CRLB). We show that the conventional two-way TOA method is a special case of the TWLAS when the UD is stationary, and that the TWLAS has higher estimation accuracy than the conventional one-way TOA method. We also derive the estimation error in the case of deviated UD velocity information. Numerical results demonstrate that the estimation accuracy of the new TWLAS for a moving UD reaches the CRLB, better than that of the conventional one-way TOA method, and that the estimation error caused by the deviated UD velocity information is consistent with the theoretical analysis.

Submitted 22 June, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

arXiv:2011.00454 [pdf, other]

doi 10.1007/s10489-021-03053-3

Dynamic radiomics: a new methodology to extract quantitative time-related features from tomographic images

Authors: Fengying Che, Ruichuan Shi, Jian Wu, Haoran Li, Shuqin Li, Weixing Chen, Hao Zhang, Zhi Li, Xiaoyu Cui

Abstract: The feature extraction methods of radiomics are mainly based on static tomographic images at a certain moment, while the occurrence and development of disease is a dynamic process that cannot be fully reflected by only static characteristics. This study proposes a new dynamic radiomics feature extraction workflow that uses time-dependent tomographic images of the same patient, focuses on the changes in image features over time, and then quantifies them as new dynamic features for diagnostic or prognostic evaluation. We first define the mathematical paradigm of dynamic radiomics and introduce three specific methods that can describe the transformation process of features over time. Three different clinical problems are used to validate the performance of the proposed dynamic feature with conventional 2D and 3D static features.

Submitted 3 June, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: Appl Intell (2022)

arXiv:2007.14007 [pdf, other]

doi 10.1109/TGRS.2020.3006534

Coupled Convolutional Neural Network with Adaptive Response Function Learning for Unsupervised Hyperspectral Super-Resolution

Authors: Ke Zheng, Lianru Gao, Wenzhi Liao, Danfeng Hong, Bing Zhang, Ximin Cui, Jocelyn Chanussot

Abstract: Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and MSI to generate an image with both high spatial and high spectral resolutions. Recently, several new methods have been proposed to solve this fusion problem, and most of these methods assume that the prior information of the Point Spread Function (PSF) and Spectral Response Function (SRF) are known. However, in practice, this information is often limited or unavailable. In this work, an unsupervised deep learning-based fusion method - HyCoNet - that can solve the problems in HSI-MSI fusion without the prior PSF and SRF information is proposed. HyCoNet consists of three coupled autoencoder nets in which the HSI and MSI are unmixed into endmembers and abundances based on the linear unmixing model. Two special convolutional layers are designed to act as a bridge that coordinates with the three autoencoder nets, and the PSF and SRF parameters are learned adaptively in the two convolution layers during the training process. Furthermore, driven by the joint loss function, the proposed method is straightforward and easily implemented in an end-to-end training manner. The experiments performed in the study demonstrate that the proposed method performs well and produces robust results for different datasets and arbitrary PSFs and SRFs.

Submitted 28 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2020

arXiv:2005.12373 [pdf]

doi 10.1109/TSG.2020.2998041

Large-Signal Stability Criteria in DC Power Grids with Distributed-Controlled Converters and Constant Power Loads

Authors: Fangyuan Chang, Xiaofan Cui, Mengqi Wang, Wencong Su, Alex Q. Huang

Abstract: The increasing adoption of power electronic devices may lead to large disturbance and destabilization of future power systems. However, stability criteria are still an unsolved puzzle, since traditional small-signal stability analysis is not applicable to power electronics-enabled power systems when a large disturbance occurs, such as a fault, a pulse power load, or load switching. To address this issue, this paper presents for the first time the rigorous derivation of the sufficient criteria for large-signal stability in DC microgrids with distributed-controlled DC-DC power converters. A novel type of closed-loop converter controllers is designed and considered. Moreover, this paper is the first to prove that the well-known and frequently cited Brayton-Moser mixed potential theory (published in 1964) is incomplete. Case studies are carried out to illustrate the defects of Brayton-Moser mixed potential theory and verify the effectiveness of the proposed novel stability criteria.

Submitted 25 May, 2020; originally announced May 2020.

arXiv:2005.10053 [pdf, other]

Map Generation from Large Scale Incomplete and Inaccurate Data Labels

Authors: Rui Zhang, Conrad Albrecht, Wei Zhang, Xiaodong Cui, Ulrich Finkler, David Kung, Siyuan Lu

Abstract: Accurately and globally mapping human infrastructure is an important and challenging task with applications in routing, regulation compliance monitoring, natural disaster response management, etc. In this paper we present progress in developing an algorithmic pipeline and distributed compute system that automates the process of map creation using high resolution aerial images. Unlike previous studies, most of which use datasets that are available only in a few cities across the world, we utilize publicly available imagery and map data, both of which cover the contiguous United States (CONUS). We approach the technical challenge of inaccurate and incomplete training data by adopting state-of-the-art convolutional neural network architectures such as the U-Net and the CycleGAN to incrementally generate maps with increasingly more accurate and more complete labels of man-made infrastructure such as roads and houses. Since scaling the mapping task to CONUS calls for parallelization, we then adopted an asynchronous distributed stochastic parallel gradient descent training scheme to distribute the computational workload onto a cluster of GPUs with nearly linear speed-up.

Submitted 20 May, 2020; originally announced May 2020.

Comments: This paper is accepted by KDD 2020

ACM Class: I.2.10

arXiv:2002.10502 [pdf, other]

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition

Authors: Xiaodong Cui, Wei Zhang, Ulrich Finkler, George Saon, Michael Picheny, David Kung

Abstract: The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. Key to training such models is the employment of efficient distributed learning techniques. In this article, we provide an overview of distributed training techniques for deep neural network acoustic models for ASR. Starting with the fundamentals of data parallel stochastic gradient descent (SGD) and ASR acoustic modeling, we will investigate various distributed training strategies and their realizations in high performance computing (HPC) environments with an emphasis on striking the balance between communication and computation. Experiments are carried out on a popular public benchmark to study the convergence, speedup and recognition performance of the investigated strategies.

Submitted 24 February, 2020; originally announced February 2020.

Comments: Accepted to IEEE Signal Processing Magazine

arXiv:2001.04537 [pdf]

doi 10.1002/mp.14648

Deep convolutional neural networks for multi-planar lung nodule detection: improvement in small nodule identification

Authors: Sunyi Zheng, Ludo J. Cornelissen, Xiaonan Cui, Xueping Jing, Raymond N. J. Veldhuis, Matthijs Oudkerk, Peter M. A. van Ooijen

Abstract: Objective: In clinical practice, small lung nodules can be easily overlooked by radiologists. The paper aims to provide an efficient and accurate detection system for small lung nodules while keeping good performance for large nodules. Methods: We propose a multi-planar detection system using convolutional neural networks. The 2-D convolutional neural network model, U-net++, was trained by axial, coronal, and sagittal slices for the candidate detection task. All possible nodule candidates from the three different planes are combined. For false positive reduction, we apply 3-D multi-scale dense convolutional neural networks to efficiently remove false positive candidates. We use the public LIDC-IDRI dataset which includes 888 CT scans with 1186 nodules annotated by four radiologists. Results: After ten-fold cross-validation, our proposed system achieves a sensitivity of 94.2% with 1.0 false positive/scan and a sensitivity of 96.0% with 2.0 false positives/scan. Although it is difficult to detect small nodules (i.e. < 6 mm), our designed CAD system reaches a sensitivity of 93.4% (95.0%) of these small nodules at an overall false positive rate of 1.0 (2.0) false positives/scan. At the nodule candidate detection stage, results show that a multi-planar method is capable of detecting more nodules compared to using a single plane. Conclusion: Our approach achieves good performance not only for small nodules, but also for large lesions on this dataset. This demonstrates the effectiveness and efficiency of our developed CAD system for lung nodule detection. Significance: The proposed system could provide support for radiologists on early detection of lung cancer.

Submitted 9 December, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

arXiv:1912.01182 [pdf]

doi 10.33012/2019.16886

Range-only Collaborative Localization for Ground Vehicles

Authors: Qin Shi, Xiaowei Cui, Sihao Zhao, Jian Wen, Mingquan Lu

Abstract: High-accuracy absolute localization for a team of vehicles is essential when accomplishing various kinds of tasks. As a promising approach, collaborative localization fuses the individual motion measurements and the inter-vehicle measurements to collaboratively estimate the states. In this paper, we focus on the range-only collaborative localization, which specifies the inter-vehicle measurements as inter-vehicle ranging measurements. We first investigate the observability properties of the system and derive that to achieve bounded localization errors, two vehicles are required to remain static like external infrastructures. Under the guide of the observability analysis, we then propose our range-only collaborative localization system, which categorizes the ground vehicles into two static vehicles and dynamic vehicles. The vehicles are connected utilizing a UWB network that is capable of both producing inter-vehicle ranging measurements and communication. Simulation results validate the observability analysis and demonstrate that collaborative localization is capable of achieving higher accuracy when utilizing the inter-vehicle measurements. Extensive experiments are performed for teams of 3 and 5 vehicles. The real-world results illustrate that our proposed system enables accurate and real-time estimation of all vehicles' absolute poses.

Submitted 2 December, 2019; originally announced December 2019.

Comments: Proceedings of the 32nd International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2019)

arXiv:1911.11995 [pdf, other]

doi 10.1109/TAES.2020.2979640

BLAS: Broadcast Relative Localization and Clock Synchronization for Dynamic Dense Multi-Agent Systems

Authors: Qin Shi, Xiaowei Cui, Sihao Zhao, Shuang Xu, Mingquan Lu

Abstract: The spatiotemporal information plays crucial roles in a multi-agent system (MAS). However, for a highly dynamic and dense MAS in unknown environments, estimating its spatiotemporal states is a difficult problem. In this paper, we present BLAS: a wireless broadcast relative localization and clock synchronization system to address these challenges. Our BLAS system exploits a broadcast architecture, under which a MAS is categorized into parent agents that broadcast wireless packets and child agents that are passive receivers, to reduce the number of required packets among agents for relative localization and clock synchronization. We first propose an asynchronous broadcasting and passively receiving (ABPR) protocol. The protocol schedules the broadcast of parent agents using a distributed time division multiple access (D-TDMA) scheme and delivers inter-agent information used for joint relative localization and clock synchronization. We then present distributed state estimation approaches in parent and child agents that utilize the broadcast inter-agent information for joint estimation of spatiotemporal states. The simulations and real-world experiments based on ultra-wideband (UWB) illustrate that our proposed BLAS can not only enable accurate, high-frequency and real-time estimation of relative position and clock parameters but also theoretically support an unlimited number of agents.

Submitted 27 November, 2019; originally announced November 2019.

Journal ref: IEEE Transactions on Aerospace and Electronic Systems, 2020

arXiv:1908.03455 [pdf, other]

Challenging the Boundaries of Speech Recognition: The MALACH Corpus

Authors: Michael Picheny, Zóltan Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon

Abstract: There has been huge progress in speech recognition over the last several years. Tasks once thought extremely difficult, such as SWITCHBOARD, now approach levels of human performance. The MALACH corpus (LDC catalog LDC2012S05), a 375-Hour subset of a large archive of Holocaust testimonies collected by the Survivors of the Shoah Visual History Foundation, presents significant challenges to the speech community. The collection consists of unconstrained, natural speech filled with disfluencies, heavy accents, age-related coarticulations, un-cued speaker and language switching, and emotional speech - all still open problems for speech recognition systems. Transcription is challenging even for skilled human annotators. This paper proposes that the community place focus on the MALACH corpus to develop speech recognition systems that are more robust with respect to accents, disfluencies and emotional speech. To reduce the barrier for entry, a lexicon and training and testing setups have been created and baseline results using current deep learning technologies are presented. The metadata has just been released by LDC (LDC2019S11). It is hoped that this resource will enable the community to build on top of these baselines so that the extremely important information in these and related oral histories becomes accessible to a wider audience.

Submitted 9 August, 2019; originally announced August 2019.

Comments: Accepted for publication at INTERSPEECH 2019

arXiv:1907.05701 [pdf, other]

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

Authors: Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny

Abstract: Modern Automatic Speech Recognition (ASR) systems rely on distributed deep learning for quick training completion. To enable efficient distributed training, it is imperative that the training algorithms can converge with a large mini-batch size. In this work, we discovered that Asynchronous Decentralized Parallel Stochastic Gradient Descent (ADPSGD) can work with a much larger batch size than the commonly used Synchronous SGD (SSGD) algorithm. On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3X as large as the one used in SSGD, thus enabling training at a much larger scale. Further, we proposed a Hierarchical-ADPSGD (H-ADPSGD) system in which learners on the same computing node construct a super learner via a fast allreduce implementation, and super learners deploy the ADPSGD algorithm among themselves. On a 64 Nvidia V100 GPU cluster connected via a 100Gb/s Ethernet network, our system is able to train SWB-2000 to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test-set and a 13.2% WER on the Call-home (CH) test-set in 5.2 hours. To the best of our knowledge, this is the fastest ASR training system attaining this level of model accuracy for the SWB-2000 task ever reported in the literature.

Submitted 10 July, 2019; originally announced July 2019.

Journal ref: INTERSPEECH 2019

arXiv:1907.04887 [pdf, other]

Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition

Authors: Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny

Abstract: In automatic speech recognition (ASR), wideband (WB) and narrowband (NB) speech signals with different sampling rates typically use separate acoustic models. Therefore mixed-bandwidth (MB) acoustic modeling has important practical value for ASR system deployment. In this paper, we extensively investigate large-scale MB deep neural network acoustic modeling for ASR using 1,150 hours of WB data and 2,300 hours of NB data. We study various MB strategies including downsampling, upsampling and bandwidth extension for MB acoustic modeling and evaluate their performance on 8 diverse WB and NB test sets from various application domains. To deal with the large amounts of training data, distributed training is carried out on multiple GPUs using synchronous data parallelism.

Submitted 10 July, 2019; originally announced July 2019.

Comments: Interspeech 2019

arXiv:1907.04882 [pdf, ps, other]

Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition

Authors: Xiaodong Cui, Michael Picheny

Abstract: Evolutionary stochastic gradient descent (ESGD) was proposed as a population-based approach that combines the merits of gradient-aware and gradient-free optimization algorithms for superior overall optimization performance. In this paper we investigate a variant of ESGD for optimization of acoustic models for automatic speech recognition (ASR). In this variant, we assume the existence of a well-trained acoustic model and use it as an anchor in the parent population whose good "gene" will propagate in the evolution to the offspring. We propose an ESGD algorithm leveraging the anchor models such that it guarantees the best fitness of the population will never degrade from the anchor model. Experiments on 50-hour Broadcast News (BN50) and 300-hour Switchboard (SWB300) show that the ESGD with anchors can further improve the loss and ASR performance over the existing well-trained acoustic models.

Submitted 10 July, 2019; originally announced July 2019.

Comments: Interspeech 2019

arXiv:1904.04956 [pdf, other]

Distributed Deep Learning Strategies For Automatic Speech Recognition

Authors: Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny

Abstract: In this paper, we propose and investigate a variety of distributed deep learning strategies for automatic speech recognition (ASR) and evaluate them with a state-of-the-art long short-term memory (LSTM) acoustic model on the 2000-hour Switchboard (SWB2000), which is one of the most widely used datasets for ASR performance benchmarking. We first investigate which hyper-parameters (e.g., learning rate) enable training with a sufficiently large batch size without impairing the model accuracy. We then implement various distributed strategies, including Synchronous (SYNC), Asynchronous Decentralized Parallel SGD (ADPSGD) and a hybrid of the two (HYBRID), to study their runtime/accuracy trade-off. We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set. Furthermore, we can train the model using HYBRID in 11.5 hours with 32 NVIDIA V100 GPUs without loss in accuracy.

Submitted 9 April, 2019; originally announced April 2019.

Comments: Published in ICASSP'19

Showing 1–50 of 52 results for author: Cui, X