Search | arXiv e-print repository

arXiv:2405.20219 [pdf, other]

System Identification for Lithium-Ion Batteries with Nonlinear Coupled Electro-Thermal Dynamics via Bayesian Optimization

Authors: Hao Tu, Xinfan Lin, Yebin Wang, Huazhen Fang

Abstract: Essential to various practical applications of lithium-ion batteries is the availability of accurate equivalent circuit models. This paper presents a new coupled electro-thermal model for batteries and studies how to extract it from data. We consider the problem of maximum likelihood parameter estimation, which, however, is nontrivial to solve as the model is nonlinear in both its dynamics and measurement. We propose to leverage the Bayesian optimization approach, owing to its machine learning-driven capability in handling complex optimization problems and searching for global optima. To enhance the parameter search efficiency, we dynamically narrow and refine the search space in Bayesian optimization. The proposed system identification approach can efficiently determine the parameters of the coupled electro-thermal model. It is amenable to practical implementation, with few requirements on the experiment, data types, and optimization setups, and well applicable to many other battery models.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 2024 American Control Conference (ACC)

arXiv:2404.14767 [pdf, other]

Remaining Energy Prediction for Lithium-Ion Batteries: A Machine Learning Approach

Authors: Hao Tu, Manashita Borah, Scott Moura, Yebin Wang, Huazhen Fang

Abstract: Lithium-ion batteries have found their way into myriad sectors of industry to drive electrification, decarbonization, and sustainability. A crucial aspect in ensuring their safe and optimal performance is monitoring their energy state. In this paper, we present the first study on predicting the remaining energy of a battery cell undergoing discharge over wide current ranges from low to high C-rates. The complexity of the challenge arises from the cell's C-rate-dependent energy availability as well as its intricate electro-thermal dynamics. To address this, we introduce a new definition of remaining discharge energy and then undertake a systematic effort in harnessing the power of machine learning to enable its prediction. Our effort includes two parts in cascade. First, we develop an accurate dynamic model based on integration of physics with machine learning to capture a battery's voltage and temperature behaviors. Second, based on the model, we propose a machine learning approach to predict the remaining discharge energy under arbitrary C-rates and pre-specified cut-off limits in voltage and temperature. The results from our experiments show that the proposed approach offers high prediction accuracy and amenability to training and computation.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 12 pages, 12 figures, 3 tables

arXiv:2404.08326 [pdf, other]

Quaternion-Based Attitude Stabilization Using Synergistic Hybrid Feedback With Minimal Potential Functions

Authors: Xin Tong, Qingpeng Ding, Haiyang Fang, Shing Shin Cheng

Abstract: This paper investigates the robust global attitude stabilization problem for a rigid-body system using quaternion-based feedback. We propose a novel synergistic hybrid feedback with the following notable features: (1) It demonstrates central synergism by utilizing a minimal number of potential functions; (2) It ensures consistency with respect to the unit quaternion representation of rigid-body attitude; (3) Its state-feedback laws incorporate a shared action term that steers the system toward the desired attitude. We demonstrate that the proposed hybrid feedback method effectively solves the problem at hand and guarantees robust uniform global asymptotic stability.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures, extended version of a paper accepted for publication in Automatica

arXiv:2404.04358 [pdf, other]

Integrated Optimal Control for Fast Charging and Active Thermal Management of Lithium-Ion Batteries in Extreme Ambient Temperatures

Authors: Zehui Lu, Hao Tu, Huazhen Fang, Yebin Wang, Shaoshuai Mou

Abstract: This paper presents an integrated control strategy for fast charging and active thermal management of Lithium-ion batteries in extreme ambient temperatures. A control-oriented thermal-NDC (nonlinear double-capacitor) battery model is proposed to describe the electrical and thermal dynamics, accounting for the impact from both an active thermal source and ambient temperature. A state-feedback model predictive control algorithm is then developed for integrated fast charging and active thermal management. Numerical experiments validate the algorithm under extreme temperatures, showing that the proposed algorithm can energy-efficiently adjust the battery temperature to enhance fast charging. Additionally, an output-feedback model predictive control algorithm with an extended Kalman filter is proposed for battery charging when states are partially measurable. Numerical experiments validate the effectiveness under extreme temperatures.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.08948 [pdf, ps, other]

Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning

Authors: Jiajun Shen, Fengjun Li, Morteza Hashemi, Huazhen Fang

Abstract: In the swift evolution of Cyber-Physical Systems (CPSs) within intelligent environments, especially in the industrial domain shaped by Industry 4.0, the surge in development brings forth unprecedented security challenges. This paper explores the intricate security issues of Industrial CPSs (ICPSs), with a specific focus on the unique threats presented by intelligent attackers capable of directly compromising the controller, thereby posing a direct risk to physical security. Within the framework of hierarchical control and incentive feedback Stackelberg game, we design a resilient leading controller (leader) that is adaptive to a compromised following controller (follower) such that the compromised follower acts cooperatively with the leader, aligning its strategies with the leader's objective to achieve a team-optimal solution. First, we provide sufficient conditions for the existence of an incentive Stackelberg solution when system dynamics are known. Then, we propose a Q-learning-based Approximate Dynamic Programming (ADP) approach, and corresponding algorithms for the online resolution of the incentive Stackelberg solution without requiring prior knowledge of system dynamics. Last but not least, we prove the convergence of our approach to the optimum.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 8 pages

arXiv:2402.17247 [pdf, ps, other]

Inverse Optimal Control for Linear Quadratic Tracking with Unknown Target States

Authors: Yao Li, Chengpu Yu, Hao Fang, Jie Chen

Abstract: This paper addresses the inverse optimal control for the linear quadratic tracking problem with a fixed but unknown target state, which aims to estimate the possible triplets comprising the target state, the state weight matrix, and the input weight matrix from observed optimal control input and the corresponding state trajectories. Sufficient conditions have been provided for the unique determination of both the linear quadratic cost function as well as the target state. A computationally efficient and numerically reliable parameter identification algorithm is proposed by equating optimal control strategies with a system of linear equations, and the associated relative error upper bound is derived in terms of data volume and signal-to-noise ratio. Moreover, the proposed inverse optimal control algorithm is applied for the joint cluster coordination and intent identification of a multi-agent system. By incorporating the structural constraint of the Laplace matrix, the relative error upper bound can be reduced accordingly. Finally, the algorithm's efficiency and accuracy are validated by a vehicle-on-a-lever example and a multi-agent formation control example.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.05819 [pdf, other]

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

Authors: Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath

Abstract: Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models are predominantly centered on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in the real-world setting. To address this challenge, we propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process, where the targets are derived from a visually-grounded speech model, notably eliminating the need for speech-text paired data. Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024 workshop on Self-supervision in Audio, Speech, and Beyond (SASB)

arXiv:2402.01259 [pdf, other]

Position Aware 60 GHz mmWave Beamforming for V2V Communications Utilizing Deep Learning

Authors: Muhammad Baqer Mollah, Honggang Wang, Hua Fang

Abstract: Beamforming techniques are considered essential for compensating the severe path loss in millimeter-wave (mmWave) communications by adopting large antenna arrays and forming narrow beams to obtain satisfactory received powers. However, performing accurate beam alignment over such narrow beams for efficient link configuration with traditional beam selection approaches, which mainly rely on channel state information, typically imposes significant latency and computing overheads, which is often infeasible in highly dynamic scenarios such as vehicle-to-vehicle (V2V) communications. In contrast, utilizing out-of-band contextual information, such as vehicular position information, is a potential alternative to reduce such overheads. In this context, this paper presents a deep learning-based solution that utilizes the vehicular position information to predict the optimal beams having sufficient mmWave received powers so that the best V2V line-of-sight links can be ensured proactively. After experimental evaluation of the proposed solution on real-world measured mmWave sensing and communications datasets, the results show that the solution can achieve up to 84.58% of received power of link status on average, which confirms it as a promising solution for beamforming in mmWave at 60 GHz enabled V2V communications.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 2024 IEEE International Conference on Communications (ICC), Denver, CO, USA

arXiv:2310.16333 [pdf, other]

Scalable Optimal Power Management for Large-Scale Battery Energy Storage Systems

Authors: Amir Farakhor, Di Wu, Yebin Wang, Huazhen Fang

Abstract: Large-scale battery energy storage systems (BESS) are helping transition the world towards sustainability with their broad use, among others, in electrified transportation, power grid, and renewables. However, optimal power management for them is often computationally formidable. To overcome this challenge, we develop a scalable approach in the paper. The proposed approach partitions the constituting cells of a large-scale BESS into clusters based on their state-of-charge (SoC), temperature, and internal resistance. Each cluster is characterized by a representative model that approximately captures its collective SoC and temperature dynamics, as well as its overall power losses in charging/discharging. Based on the clusters, we then formulate a problem of receding-horizon optimal power control to minimize the power losses while promoting SoC and temperature balancing. The cluster-based power optimization will decide the power quota for each cluster, and then every cluster will split the quota among the constituent cells. Since the number of clusters is much fewer than the number of cells, the proposed approach significantly reduces the computational costs, allowing optimal power management to scale up to large-scale BESS. Extensive simulations are performed to evaluate the proposed approach. The obtained results highlight a significant computational overhead reduction by more than 60% for a small-scale and 98% for a large-scale BESS compared to the conventional cell-level optimization. Experimental validation based on a 20-cell prototype further demonstrates its effectiveness and utility.

Submitted 6 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: IEEE Transactions on Transportation Electrification

arXiv:2310.09700 [pdf, other]

doi 10.1109/MNET.2023.3321520

mmWave Enabled Connected Autonomous Vehicles: A Use Case with V2V Cooperative Perception

Authors: Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang

Abstract: Connected and autonomous vehicles (CAVs) will revolutionize tomorrow's intelligent transportation systems, being considered promising to improve transportation safety, traffic efficiency, and mobility. In fact, envisioned use cases of CAVs demand very high throughput, lower latency, highly reliable communications, and precise positioning capabilities. The availability of a large spectrum at millimeter-wave (mmWave) band potentially promotes new specifications in spectrum technologies capable of supporting such service requirements. In this article, we specifically focus on how mmWave communications are being approached in vehicular standardization activities, CAVs use cases and deployment challenges in realizing the future fully connected settings. Finally, we also present a detailed performance assessment on mmWave-enabled vehicle-to-vehicle (V2V) cooperative perception as an example case study to show the impact of different configurations.

Submitted 14 October, 2023; originally announced October 2023.

Comments: 8 Pages

Journal ref: IEEE Network, 2023

arXiv:2310.08045 [pdf, other]

Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning

Authors: Iman Askari, Xumein Tu, Shen Zeng, Huazhen Fang

Abstract: Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.

Submitted 19 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.03132 [pdf, ps, other]

Application-Oriented Co-Design of Motors and Motions for a 6DOF Robot Manipulator

Authors: Adrian Stein, Yebin Wang, Yusuke Sakamoto, Bingnan Wang, Huazhen Fang

Abstract: This work investigates an application-driven co-design problem where the motion and motors of a six degrees of freedom robotic manipulator are optimized simultaneously, and the application is characterized by a set of tasks. Unlike the state-of-the-art which selects motors from a product catalogue and performs co-design for a single task, this work designs the motor geometry as well as motion for a specific application. Contributions are made towards solving the proposed co-design problem in a computationally-efficient manner. First, a two-step process is proposed, where multiple motor designs are identified by optimizing motions and motors for multiple tasks one by one, and then are reconciled to determine the final motor design. Second, magnetic equivalent circuit modeling is exploited to establish the analytic mapping from motor design parameters to dynamic models and objective functions to facilitate the subsequent differentiable simulation. Third, a direct-collocation-based differentiable simulator of motor and robotic arm dynamics is developed to balance the computational complexity and numerical stability. Simulation verifies that higher performance for a specific application can be achieved with the multi-task method, compared to several benchmark co-design methods.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2305.08744 [pdf, other]

doi 10.1109/TASLP.2023.3265202

Integrating Uncertainty into Neural Network-based Speech Enhancement

Authors: Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann

Abstract: Supervised masking approaches in the time-frequency domain aim to employ deep neural networks to estimate a multiplicative mask to extract clean speech. This leads to a single estimate for each input without any guarantees or measures of reliability. In this paper, we study the benefits of modeling uncertainty in clean speech estimation. Prediction uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former refers to inherent randomness in data, while the latter describes uncertainty in the model parameters. In this work, we propose a framework to jointly model aleatoric and epistemic uncertainties in neural network-based speech enhancement. The proposed approach captures aleatoric uncertainty by estimating the statistical moments of the speech posterior distribution and explicitly incorporates the uncertainty estimate to further improve clean speech estimation. For epistemic uncertainty, we investigate two Bayesian deep learning approaches: Monte Carlo dropout and Deep ensembles to quantify the uncertainty of the neural network parameters. Our analyses show that the proposed framework promotes capturing practical and reliable uncertainty, while combining different sources of uncertainties yields more reliable predictive uncertainty estimates. Furthermore, we demonstrate the benefits of modeling uncertainty on speech enhancement performance by evaluating the framework on different datasets, exhibiting notable improvement over comparable models that fail to account for uncertainty.

Submitted 15 May, 2023; originally announced May 2023.

Comments: Accepted version

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1587-1600, 2023

arXiv:2305.07816 [pdf, other]

PALM: Open Fundus Photograph Dataset with Pathologic Myopia Recognition and Anatomical Structure Annotation

Authors: Huihui Fang, Fei Li, Junde Wu, Huazhu Fu, Xu Sun, José Ignacio Orlando, Hrvoje Bogunović, Xiulan Zhang, Yanwu Xu

Abstract: Pathologic myopia (PM) is a common blinding retinal degeneration suffered by the highly myopic population. Early screening of this condition can reduce the damage caused by the associated fundus lesions and therefore prevent vision loss. Automated diagnostic tools based on artificial intelligence methods can benefit this process by aiding clinicians to identify disease signs or to screen mass populations using color fundus photographs as inputs. This paper provides insights about PALM, our open fundus imaging dataset for pathological myopia recognition and anatomical structure annotation. Our database comprises 1200 images with associated labels for the pathologic myopia category and manual annotations of the optic disc, the position of the fovea and delineations of lesions such as patchy retinal atrophy (including peripapillary atrophy) and retinal detachment. In addition, this paper elaborates on other details such as the labeling process used to construct the database, the quality and characteristics of the samples and provides other relevant usage notes.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 10 pages, 6 figures

arXiv:2303.15042 [pdf, other]

Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

Authors: Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann

Abstract: Human-robot interaction relies on a noise-robust audio processing module capable of estimating target speech from audio recordings impacted by environmental noise, as well as self-induced noise, so-called ego-noise. While external ambient noise sources vary from environment to environment, ego-noise is mainly caused by the internal motors and joints of a robot. Ego-noise and environmental noise reduction are often decoupled, i.e., ego-noise reduction is performed without considering environmental noise. Recently, a variational autoencoder (VAE)-based speech model has been combined with a fully adaptive non-negative matrix factorization (NMF) noise model to recover clean speech under different environmental noise disturbances. However, its enhancement performance is limited in adverse acoustic scenarios involving, e.g. ego-noise. In this paper, we propose a multichannel partially adaptive scheme to jointly model ego-noise and environmental noise utilizing the VAE-NMF framework, where we take advantage of spatially and spectrally structured characteristics of ego-noise by pre-training the ego-noise model, while retaining the ability to adapt to unknown environmental noise. Experimental results show that our proposed approach outperforms the methods based on a completely fixed scheme and a fully adaptive scheme when ego-noise and environmental noise are present simultaneously.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

arXiv:2301.05168 [pdf, other]

doi 10.1109/TTE.2022.3223993

A Novel Modular, Reconfigurable Battery Energy Storage System: Design, Control, and Experimentation

Authors: Amir Farakhor, Di Wu, Yebin Wang, Huazhen Fang

Abstract: This paper presents a novel modular, reconfigurable battery energy storage system. The proposed design is characterized by a tight integration of reconfigurable power switches and DC/DC converters. This characteristic enables isolation of faulty cells from the system and allows fine power control for individual cells toward optimal system-level performance. An optimal power management approach is developed to extensively exploit the merits of the proposed design. Based on receding-horizon convex optimization, this approach aims to minimize the total power losses in charging/discharging while allocating the power in line with each cell's condition to achieve state-of-charge (SoC) and temperature balancing. By appropriate design, the approach manages to regulate the power of a cell across its full SoC range and guarantees the feasibility of the optimization problem. We perform extensive simulations and further develop a lab-scale prototype to validate the proposed system design and power management approach.

Submitted 12 January, 2023; originally announced January 2023.

Comments: This work is published in the IEEE Transactions on Transportation Electrification

arXiv:2212.04831 [pdf, other]

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Authors: Huajian Fang, Timo Gerkmann

Abstract: Single-channel deep speech enhancement approaches often estimate a single multiplicative mask to extract clean speech without a measure of its accuracy. Instead, in this work, we propose to quantify the uncertainty associated with clean speech estimates in neural network-based speech enhancement. Predictive uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former accounts for the inherent uncertainty in data and the latter corresponds to the model uncertainty. Aiming for robust clean speech estimation and efficient predictive uncertainty quantification, we propose to integrate statistical complex Gaussian mixture models (CGMMs) into a deep speech enhancement framework. More specifically, we model the dependency between input and output stochastically by means of a conditional probability density and train a neural network to map the noisy input to the full posterior distribution of clean speech, modeled as a mixture of multiple complex Gaussian components. Experimental results on different datasets show that the proposed algorithm effectively captures predictive uncertainty and that combining powerful statistical models and deep learning also delivers a superior speech enhancement performance.

Submitted 15 May, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2212.02339 [pdf, other]

DeAR: A Deep-learning-based Audio Re-recording Resilient Watermarking

Authors: Chang Liu, Jie Zhang, Han Fang, Zehua Ma, Weiming Zhang, Nenghai Yu

Abstract: Audio watermarking is widely used for leaking source tracing. The robustness of the watermark determines the traceability of the algorithm. With the development of digital technology, audio re-recording (AR) has become an efficient and covert means to steal secrets. AR process could drastically destroy the watermark signal while preserving the original information. This puts forward a new requirement for audio watermarking at this stage, that is, to be robust to AR distortions. Unfortunately, none of the existing algorithms can effectively resist AR attacks due to the complexity of the AR process. To address this limitation, this paper proposes DeAR, a deep-learning-based audio re-recording resistant watermarking. Inspired by DNN-based image watermarking, we pioneer a deep learning framework for audio carriers, based on which the watermark signal can be effectively embedded and extracted. Meanwhile, in order to resist the AR attack, we delicately analyze the distortions that occurred in the AR process and design the corresponding distortion layer to cooperate with the proposed watermarking framework. Extensive experiments show that the proposed algorithm can resist not only common electronic channel distortions but also AR distortions. Under the premise of high-quality embedding (SNR=25.86dB), in the case of a common re-recording distance (20cm), the algorithm can effectively achieve an average bit recovery accuracy of 98.55%.

Submitted 3 April, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: Accepted by AAAI2023

arXiv:2212.00601 [pdf, other]

Multi-rater Prism: Learning self-calibrated medical image segmentation from multiple raters

Authors: Junde Wu, Huihui Fang, Yehui Yang, Yuanpei Liu, Jing Gao, Lixin Duan, Weihua Yang, Yanwu Xu

Abstract: In medical image segmentation, it is often necessary to collect opinions from multiple experts to make the final decision. This clinical routine helps to mitigate individual bias. But when data is multiply annotated, standard deep learning models are often not applicable. In this paper, we propose a novel neural network framework, called Multi-Rater Prism (MrPrism) to learn the medical image segmentation from multiple labels. Inspired by the iterative half-quadratic optimization, the proposed MrPrism will combine the multi-rater confidences assignment task and calibrated segmentation task in a recurrent manner. In this recurrent process, MrPrism can learn inter-observer variability taking into account the image semantic properties, and finally converges to a self-calibrated segmentation result reflecting the inter-observer agreement. Specifically, we propose Converging Prism (ConP) and Diverging Prism (DivP) to process the two tasks iteratively. ConP learns calibrated segmentation based on the multi-rater confidence maps estimated by DivP. DivP generates multi-rater confidence maps based on the segmentation masks estimated by ConP. The experimental results show that by recurrently running ConP and DivP, the two tasks can achieve mutual improvement. The final converged segmentation result of MrPrism outperforms state-of-the-art (SOTA) strategies on a wide range of medical image segmentation tasks.

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.05999 [pdf, ps, other]

BattX: An Equivalent Circuit Model for Lithium-Ion Batteries Over Broad Current Ranges

Authors: Nikhil Biju, Huazhen Fang

Abstract: Advanced battery management is to lithium-ion battery systems as the brain is to the human body. Its performance rests on the use of battery models that are both fast and accurate. However, mainstream equivalent circuit models and electrochemical models have yet to meet this need well, due to struggle with either predictive accuracy or computational complexity. This problem has acquired urgency as some emerging battery applications running across broad current ranges, e.g., electric vertical take-off and landing aircraft, can hardly find usable models from the literature. Motivated to address the problem, we develop an innovative model in this study. Called BattX, the model is an equivalent circuit model but draws comparisons to a single particle model with electrolyte and thermal dynamics, thus combining their respective merits to be computationally efficient, accurate, and physically interpretable. The model design pivots on leveraging multiple circuits to approximate major electrochemical and physical processes in charging/discharging. Given the model, we develop a multipronged approach to design experiments and identify its parameters in groups from experimental data. Experimental validation proves that the BattX model is capable of accurate voltage prediction for charging/discharging across low to high C-rates.

Submitted 10 November, 2022; originally announced November 2022.

Comments: 24 pages, 13 figures, 2 tables, and appendix

arXiv:2210.12723 [pdf]

A Faithful Deep Sensitivity Estimation for Accelerated Magnetic Resonance Imaging

Authors: Zi Wang, Haoming Fang, Chen Qian, Boxuan Shi, Lijun Bao, Liuhong Zhu, Jianjun Zhou, Wenping Wei, Jianzhong Lin, Di Guo, Xiaobo Qu

Abstract: Magnetic resonance imaging (MRI) is an essential diagnostic tool that suffers from prolonged scan time. To alleviate this limitation, advanced fast MRI technology attracts extensive research interests. Recent deep learning has shown its great potential in improving image quality and reconstruction speed. Faithful coil sensitivity estimation is vital for MRI reconstruction. However, most deep learning methods still rely on pre-estimated sensitivity maps and ignore their inaccuracy, resulting in the significant quality degradation of reconstructed images. In this work, we propose a Joint Deep Sensitivity estimation and Image reconstruction network, called JDSI. During the image artifacts removal, it gradually provides more faithful sensitivity maps with high-frequency information, leading to improved image reconstructions. To understand the behavior of the network, the mutual promotion of sensitivity estimation and image reconstruction is revealed through the visualization of network intermediate results. Results on in vivo datasets and radiologist reader study demonstrate that, for both calibration-based and calibrationless reconstruction, the proposed JDSI achieves the state-of-the-art performance visually and quantitatively, especially when the acceleration factor is high. Additionally, JDSI owns nice robustness to patients and autocalibration signals.

Submitted 24 December, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: 12 pages, 13 figures, 7 tables

arXiv:2209.11431 [pdf]

Learning to screen Glaucoma like the ophthalmologists

Authors: Junde Wu, Huihui Fang, Fei Li, Huazhu Fu, Yanwu Xu

Abstract: GAMMA Challenge is organized to encourage the AI models to screen the glaucoma from a combination of 2D fundus image and 3D optical coherence tomography volume, like the ophthalmologists.

Submitted 23 September, 2022; originally announced September 2022.

arXiv:2208.03016 [pdf, other]

Calibrate the inter-observer segmentation uncertainty via diagnosis-first principle

Authors: Junde Wu, Huihui Fang, Hoayi Xiong, Lixin Duan, Mingkui Tan, Weihua Yang, Huiying Liu, Yanwu Xu

Abstract: On the medical images, many of the tissues/lesions may be ambiguous. That is why the medical segmentation is typically annotated by a group of clinical experts to mitigate the personal bias. However, this clinical routine also brings new challenges to the application of machine learning algorithms. Without a definite ground-truth, it will be difficult to train and evaluate the deep learning models. When the annotations are collected from different graders, a common choice is majority vote. However, such a strategy ignores the difference between the grader expertness. In this paper, we consider the task of predicting the segmentation with the calibrated inter-observer uncertainty. We note that in clinical practice, the medical image segmentation is usually used to assist the disease diagnosis. Inspired by this observation, we propose diagnosis-first principle, which is to take disease diagnosis as the criterion to calibrate the inter-observer segmentation uncertainty. Following this idea, a framework named Diagnosis First segmentation Framework (DiFF) is proposed to estimate diagnosis-first segmentation from the raw images. Specifically, DiFF will first learn to fuse the multi-rater segmentation labels to a single ground-truth which could maximize the disease diagnosis performance. We dubbed the fused ground-truth as Diagnosis First Ground-truth (DF-GT). Then, we further propose Take and Give Model to segment DF-GT from the raw image.
We verify the effectiveness of DiFF on three different medical segmentation tasks: OD/OC segmentation on fundus images, thyroid nodule segmentation on ultrasound images, and skin lesion segmentation on dermoscopic images. Experimental results show that the proposed DiFF is able to significantly facilitate the corresponding disease diagnosis, which outperforms previous state-of-the-art multi-rater learning methods.

Submitted 5 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2202.06505
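
Illustration (not from the paper): the abstract above contrasts majority voting with fusion that accounts for grader expertness. The sketch below shows the generic difference on tiny binary masks. The function names and the fixed per-rater weights are assumptions made for illustration only; DiFF learns its fusion from diagnosis performance, which this sketch does not attempt to reproduce.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1


def average_fusion(masks: List[Mask]) -> List[List[float]]:
    """Pixel-wise mean of the raters' masks, giving a soft label in [0, 1]."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) / n for c in range(cols)]
            for r in range(rows)]


def majority_vote(masks: List[Mask]) -> Mask:
    """A pixel is foreground when more than half of the raters marked it."""
    return [[1 if v > 0.5 else 0 for v in row] for row in average_fusion(masks)]


def weighted_fusion(masks: List[Mask], weights: List[float]) -> Mask:
    """Hypothetical expertise-weighted vote: each rater counts by its weight."""
    total = sum(weights)
    rows, cols = len(masks[0]), len(masks[0][0])
    fused = []
    for r in range(rows):
        row = []
        for c in range(cols):
            score = sum(w * m[r][c] for w, m in zip(weights, masks)) / total
            row.append(1 if score > 0.5 else 0)
        fused.append(row)
    return fused


# Three raters annotate a 2x2 image; the third rater is assumed most reliable.
raters = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 0], [1, 0]],
]
equal_vote = majority_vote(raters)
expert_vote = weighted_fusion(raters, [0.2, 0.2, 0.6])
```

With equal votes the lower-left pixel is rejected because only one of three raters marked it; with the assumed weights the same pixel is accepted because the trusted rater outweighs the other two.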

arXiv:2207.13872 [pdf, other]

doi 10.1109/ICRA48506.2021.9561682

Model Predictive Control of Nonlinear Latent Force Models: A Scenario-Based Approach

Authors: Thomas Woodruff, Iman Askari, Guanghui Wang, Huazhen Fang

Abstract: Control of nonlinear uncertain systems is a common challenge in the robotics field. Nonlinear latent force models, which incorporate latent uncertainty characterized as Gaussian processes, carry the promise of representing such systems effectively, and we focus on the control design for them in this work. To enable the design, we adopt the state-space representation of a Gaussian process to recast the nonlinear latent force model and thus build the ability to predict the future state and uncertainty concurrently. Using this feature, a stochastic model predictive control problem is formulated. To derive a computational algorithm for the problem, we use the scenario-based approach to formulate a deterministic approximation of the stochastic optimization. We evaluate the resultant scenario-based model predictive control approach through a simulation study based on motion planning of an autonomous vehicle, which shows much effectiveness. The proposed approach can find prospective use in various other robotics applications.

Submitted 27 July, 2022; originally announced July 2022.

Journal ref: 2021 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2207.05284 [pdf, other]

High-Order Leader-Follower Tracking Control under Limited Information Availability

Authors: Chuan Yan, Tao Yang, Huazhen Fang

Abstract: Limited information availability represents a fundamental challenge for control of multi-agent systems, since an agent often lacks sensing capabilities to measure certain states of its own and can exchange data only with its neighbors. The challenge becomes even greater when agents are governed by high-order dynamics. The present work is motivated to conduct control design for linear and nonlinear high-order leader-follower multi-agent systems in a context where only the first state of an agent is measured. To address this open challenge, we develop novel distributed observers to enable followers to reconstruct unmeasured or unknown quantities about themselves and the leader and on such a basis, build observer-based tracking control approaches. We analyze the convergence properties of the proposed approaches and validate their performance through simulation.

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.05092 [pdf, other]

Learning self-calibrated optic disc and cup segmentation from multi-rater annotations

Authors: Junde Wu, Huihui Fang, Fangxin Shang, Zhaowei Wang, Dalu Yang, Wenshuo Zhou, Yehui Yang, Yanwu Xu

Abstract: The segmentation of optic disc (OD) and optic cup (OC) from fundus images is an important fundamental task for glaucoma diagnosis. In the clinical practice, it is often necessary to collect opinions from multiple experts to obtain the final OD/OC annotation. This clinical routine helps to mitigate the individual bias. But when data is multiply annotated, standard deep learning models will be inapplicable. In this paper, we propose a novel neural network framework to learn OD/OC segmentation from multi-rater annotations. The segmentation results are self-calibrated through the iterative optimization of multi-rater expertness estimation and calibrated OD/OC segmentation. In this way, the proposed method can realize a mutual improvement of both tasks and finally obtain a refined segmentation result. Specifically, we propose Diverging Model (DivM) and Converging Model (ConM) to process the two tasks respectively. ConM segments the raw image based on the multi-rater expertness map provided by DivM. DivM generates multi-rater expertness map from the segmentation mask provided by ConM. The experiment results show that by recurrently running ConM and DivM, the results can be self-calibrated so as to outperform a range of state-of-the-art (SOTA) multi-rater segmentation methods.

Submitted 14 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

arXiv:2205.04521 [pdf, other]

doi 10.1016/j.automatica.2022.110469

Implicit Particle Filtering via a Bank of Nonlinear Kalman Filters

Authors: Iman Askari, Mulugeta A. Haile, Xuemin Tu, Huazhen Fang

Abstract: The implicit particle filter seeks to mitigate particle degeneracy by identifying particles in the target distribution's high-probability regions. This study is motivated by the need to enhance computational tractability in implementing this approach. We investigate the connection of the particle update step in the implicit particle filter with that of the Kalman filter and then formulate a novel realization of the implicit particle filter based on a bank of nonlinear Kalman filters. This realization is more amenable and efficient computationally.

Submitted 6 June, 2023; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: To appear in Automatica

Journal ref: Automatica, 145, (2022), 110469

arXiv:2205.04506 [pdf, other]

doi 10.23919/ACC53348.2022.9867324

Sampling-Based Nonlinear MPC of Neural Network Dynamics with Application to Autonomous Vehicle Motion Planning

Authors: Iman Askari, Babak Badnava, Thomas Woodruff, Shen Zeng, Huazhen Fang

Abstract: Control of machine learning models has emerged as an important paradigm for a broad range of robotics applications. In this paper, we present a sampling-based nonlinear model predictive control (NMPC) approach for control of neural network dynamics. We show its design in two parts: 1) formulating conventional optimization-based NMPC as a Bayesian state estimation problem, and 2) using particle filtering/smoothing to achieve the estimation. Through a principled sampling-based implementation, this approach can potentially make effective searches in the control action space for optimal control and also facilitate computation toward overcoming the challenges caused by neural network dynamics. We apply the proposed NMPC approach to motion planning for autonomous vehicles. The specific problem considers nonlinear unknown vehicle dynamics modeled as neural networks as well as dynamic on-road driving scenarios. The approach shows significant effectiveness in successful motion planning in case studies.

Submitted 9 May, 2022; originally announced May 2022.

Comments: To appear in 2022 American Control Conference (ACC)

Journal ref: 2022 American Control Conference (ACC), 2022, pp. 2084-2090

arXiv:2205.04497 [pdf, ps, other]

doi 10.23919/ACC50511.2021.9482774

Nonlinear Model Predictive Control Based on Constraint-Aware Particle Filtering/Smoothing

Authors: Iman Askari, Shen Zeng, Huazhen Fang

Abstract: Nonlinear model predictive control (NMPC) has gained widespread use in many applications. Its formulation traditionally involves repetitively solving a nonlinear constrained optimization problem online. In this paper, we investigate NMPC through the lens of Bayesian estimation and highlight that the Monte Carlo sampling method can offer a favorable way to implement NMPC. We develop a constraint-aware particle filtering/smoothing method and exploit it to implement NMPC. The new sampling-based NMPC algorithm can be executed easily and efficiently even for complex nonlinear systems, while potentially mitigating the issues of computational complexity and local minima faced by numerical optimization in conventional studies. The effectiveness of the proposed algorithm is evaluated through a simulation study.

Submitted 9 May, 2022; originally announced May 2022.

Comments: Published in 2021 American Control Conference (ACC)

arXiv:2204.07850 [pdf, other]

Multi-organ Segmentation Network with Adversarial Performance Validator

Authors: Haoyu Fang, Yi Fang, Xiaofeng Yang

Abstract: CT organ segmentation on computed tomography (CT) images becomes a significant brick for modern medical image analysis, supporting clinic workflows in multiple domains. Previous segmentation methods include 2D convolution neural networks (CNN) based approaches, fed by CT image slices that lack the structural knowledge in axial view, and 3D CNN-based methods with the expensive computation cost in multi-organ segmentation applications. This paper introduces an adversarial performance validation network into a 2D-to-3D segmentation framework. The classifier and performance validator competition contribute to accurate segmentation results via back-propagation. The proposed network organically converts the 2D-coarse result to 3D high-quality segmentation masks in a coarse-to-fine manner, allowing joint optimization to improve segmentation accuracy. Besides, the structural information of one specific organ is depicted by a statistics-meaningful prior bounding box, which is transformed into a global feature leveraging the learning process in 3D fine segmentation. The experiments on the NIH pancreas segmentation dataset demonstrate the proposed network achieves state-of-the-art accuracy on small organ segmentation and outperforms the previous best. High accuracy is also reported on multi-organ segmentation in a dataset collected by ourselves.

Submitted 16 April, 2022; originally announced April 2022.

arXiv:2203.02288 [pdf, other]

doi 10.1109/ICASSP43922.2022.9747642

Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement

Authors: Huajian Fang, Tal Peer, Stefan Wermter, Timo Gerkmann

Abstract: Speech enhancement in the time-frequency domain is often performed by estimating a multiplicative mask to extract clean speech. However, most neural network-based methods perform point estimation, i.e., their output consists of a single mask. In this paper, we study the benefits of modeling uncertainty in neural network-based speech enhancement. For this, our neural network is trained to map a noisy spectrogram to the Wiener filter and its associated variance, which quantifies uncertainty, based on the maximum a posteriori (MAP) inference of spectral coefficients. By estimating the distribution instead of the point estimate, one can model the uncertainty associated with each estimate. We further propose to use the estimated Wiener filter and its uncertainty to build an approximate MAP (A-MAP) estimator of spectral magnitudes, which in turn is combined with the MAP inference of spectral coefficients to form a hybrid loss function to jointly reinforce the estimation. Experimental results on different datasets show that the proposed method can not only capture the uncertainty associated with the estimated filters, but also yield a higher enhancement performance over comparable models that do not take uncertainty into account.

Submitted 4 March, 2022; originally announced March 2022.

Comments: ©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
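
Illustration (not from the paper): the abstract above pairs a Wiener filter with an associated variance. For a single time-frequency bin under the textbook assumption of independent zero-mean complex Gaussian speech and noise, both quantities have closed forms, sketched below. The function name and the example variances are assumptions for illustration; in the paper these quantities are predicted by a neural network rather than computed from known variances.

```python
from typing import Tuple


def wiener_posterior(noisy_bin: complex,
                     speech_var: float,
                     noise_var: float) -> Tuple[float, complex, float]:
    """Closed-form Gaussian posterior of clean speech for one spectral bin.

    Assumes noisy_bin = speech + noise with independent zero-mean complex
    Gaussian speech and noise of the given variances.
    """
    gain = speech_var / (speech_var + noise_var)   # Wiener filter (mask) in [0, 1]
    posterior_mean = gain * noisy_bin              # point estimate of clean speech
    posterior_var = gain * noise_var               # uncertainty of that estimate
    return gain, posterior_mean, posterior_var


# Example bin with an assumed speech variance of 3 and noise variance of 1.
gain, mean, var = wiener_posterior(2 + 2j, speech_var=3.0, noise_var=1.0)
```

A point-estimate model would output only the gain; keeping the posterior variance alongside it is what lets the uncertainty of each bin be reported.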

arXiv:2202.08994 [pdf, other]

REFUGE2 Challenge: A Treasure Trove for Multi-Dimension Analysis and Evaluation in Glaucoma Screening

Authors: Huihui Fang, Fei Li, Junde Wu, Huazhu Fu, Xu Sun, Jaemin Son, Shuang Yu, Menglu Zhang, Chenglang Yuan, Cheng Bian, Baiying Lei, Benjian Zhao, Xinxing Xu, Shaohua Li, Francisco Fumero, José Sigut, Haidar Almubarak, Yakoub Bazi, Yuanhao Guo, Yating Zhou, Ujjwal Baid, Shubham Innani, Tianjiao Guo, Jie Yang, José Ignacio Orlando, et al. (3 additional authors not shown)

Abstract: With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets of CFPs in the ophthalmology community, large-scale datasets for screening only have labels of disease categories, and datasets with annotations of fundus structures are usually small in size. In addition, labeling standards are not uniform across datasets, and there is no clear information on the acquisition device. Here we release a multi-annotation, multi-quality, and multi-device color fundus image dataset for glaucoma analysis on an original challenge -- Retinal Fundus Glaucoma Challenge 2nd Edition (REFUGE2). The REFUGE2 dataset contains 2000 color fundus images with annotations of glaucoma classification, optic disc/cup segmentation, as well as fovea localization. Meanwhile, the REFUGE2 challenge sets three sub-tasks of automatic glaucoma diagnosis and fundus structure analysis and provides an online evaluation framework. Based on the characteristics of multi-device and multi-quality data, some methods with strong generalizations are provided in the challenge to make the predictions more robust. This shows that REFUGE2 brings attention to the characteristics of real-world multi-domain data, bridging the gap between scientific research and clinical application.

Submitted 29 December, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: 29 pages, 21 figures

arXiv:2202.07983 [pdf, other]

doi 10.1109/TMI.2022.3172773

ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Authors: Huihui Fang, Fei Li, Huazhu Fu, Xu Sun, Xingxing Cao, Fengbin Lin, Jaemin Son, Sunho Kim, Gwenole Quellec, Sarah Matta, Sharath M Shankaranarayana, Yi-Ting Chen, Chuen-heng Wang, Nisarg A. Shah, Chia-Yen Lee, Chih-Chung Hsu, Hai Xie, Baiying Lei, Ujjwal Baid, Shubham Innani, Kang Dang, Wenxiu Shi, Ravi Kamble, Nitin Singhal, Ching-Wei Wang, et al. (6 additional authors not shown)

Abstract: Age-related macular degeneration (AMD) is the leading cause of visual impairment among the elderly in the world. Early detection of AMD is of great importance, as the vision loss caused by this disease is irreversible and permanent. Color fundus photography is the most cost-effective imaging modality to screen for retinal disorders. Cutting edge deep learning based algorithms have been recently developed for automatically detecting AMD from fundus images. However, there is still a lack of a comprehensive annotated dataset and standard evaluation benchmarks. To deal with this issue, we set up the Automatic Detection challenge on Age-related Macular degeneration (ADAM), which was held as a satellite event of the ISBI 2020 conference. The ADAM challenge consisted of four tasks which cover the main aspects of detecting and characterizing AMD from fundus images, including detection of AMD, detection and segmentation of optic disc, localization of fovea, and detection and segmentation of lesions. As part of the challenge, we have released a comprehensive dataset of 1200 fundus images with AMD diagnostic labels, pixel-wise segmentation masks for both optic disc and AMD-related lesions (drusen, exudates, hemorrhages and scars, among others), as well as the coordinates corresponding to the location of the macular fovea. A uniform evaluation framework has been built to make a fair comparison of different models using this dataset. During the challenge, 610 results were submitted for online evaluation, with 11 teams finally participating in the onsite challenge.
This paper introduces the challenge, the dataset and the evaluation methods, as well as summarizes the participating methods and analyzes their results for each task. In particular, we observed that the ensembling strategy and the incorporation of clinical domain knowledge were the key to improve the performance of the deep learning models.

Submitted 6 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: 31 pages, 17 figures

arXiv:2202.06505 [pdf, other]

Opinions Vary? Diagnosis First!

Authors: Junde Wu, Huihui Fang, Dalu Yang, Zhaowei Wang, Wenshuo Zhou, Fangxin Shang, Yehui Yang, Yanwu Xu

Abstract: With the advancement of deep learning techniques, an increasing number of methods have been proposed for optic disc and cup (OD/OC) segmentation from the fundus images. Clinically, OD/OC segmentation is often annotated by multiple clinical experts to mitigate the personal bias. However, it is hard to train the automated deep learning models on multiple labels. A common practice to tackle the issue is majority vote, e.g., taking the average of multiple labels. However such a strategy ignores the different expertness of medical experts. Motivated by the observation that OD/OC segmentation is often used for the glaucoma diagnosis clinically, in this paper, we propose a novel strategy to fuse the multi-rater OD/OC segmentation labels via the glaucoma diagnosis performance. Specifically, we assess the expertness of each rater through an attentive glaucoma diagnosis network. For each rater, its contribution for the diagnosis will be reflected as an expertness map. To ensure the expertness maps are general for different glaucoma diagnosis models, we further propose an Expertness Generator (ExpG) to eliminate the high-frequency components in the optimization process. Based on the obtained expertness maps, the multi-rater labels can be fused as a single ground-truth which we dubbed as Diagnosis First Ground-truth (DiagFirstGT). Experimental results show that by using DiagFirstGT as ground-truth, OD/OC segmentation networks will predict the masks with superior glaucoma diagnosis performance.

Submitted 18 September, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: MICCAI 2022

arXiv:2201.04809 [pdf, other]

Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks

Authors: Yuchong Yao, Xiaohui Wangr, Yuanbang Ma, Han Fang, Jiaying Wei, Liyuan Chen, Ali Anaissi, Ali Braytee

Abstract: Class imbalance occurs in many real-world applications, including image classification, where the number of images in each class differs significantly. With imbalanced data, generative adversarial networks (GANs) lean toward majority-class samples. Two recent methods, Balancing GAN (BAGAN) and improved BAGAN (BAGAN-GP), were proposed as augmentation tools to handle this problem and restore the balance to the data. The former pre-trains the autoencoder weights in an unsupervised manner. However, it is unstable when the images from different categories have similar features. The latter improves on BAGAN by facilitating supervised autoencoder training, but the pre-training is biased towards the majority classes. In this work, we propose a novel Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks (CAPGAN) as an augmentation tool to generate realistic synthetic images. In particular, we utilize a conditional convolutional variational autoencoder with supervised and balanced pre-training for the GAN initialization and training with gradient penalty. Our proposed method outperforms other state-of-the-art methods on the highly imbalanced versions of MNIST, Fashion-MNIST, CIFAR-10, and two medical imaging datasets. Our method can synthesize high-quality minority samples in terms of Fréchet inception distance, structural similarity index measure and perceptual quality.

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2112.12979 [pdf, other]

Integrating Physics-Based Modeling with Machine Learning for Lithium-Ion Batteries

Authors: Hao Tu, Scott Moura, Yebin Wang, Huazhen Fang

Abstract: Mathematical modeling of lithium-ion batteries (LiBs) is a primary challenge in advanced battery management. This paper proposes two new frameworks to integrate physics-based models with machine learning to achieve high-precision modeling for LiBs. The frameworks are characterized by informing the machine learning model of the state information of the physical model, enabling a deep integration between physics and machine learning. Based on the frameworks, a series of hybrid models are constructed, through combining an electrochemical model and an equivalent circuit model, respectively, with a feedforward neural network. The hybrid models are relatively parsimonious in structure and can provide considerable voltage predictive accuracy under a broad range of C-rates, as shown by extensive simulations and experiments. The study further expands to conduct aging-aware hybrid modeling, leading to the design of a hybrid model conscious of the state-of-health to make prediction. The experiments show that the model has high voltage predictive accuracy throughout a LiB's cycle life.

Submitted 7 November, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

Comments: 15 pages, 10 figures, 2 tables. arXiv admin note: text overlap with arXiv:2103.11580

arXiv:2112.02536 [pdf, other]

Deep Open Set Identification for RF Devices

Authors: Qing Wang, Qing Liu, Zihao Zhang, Haoyu Fang, Xi Zheng

Abstract: Artificial intelligence (AI) based device identification improves the security of the internet of things (IoT), and accelerates the authentication process. However, existing approaches rely on the assumption that we can learn all the classes from the training set, namely, closed-set classification. To overcome the closed-set limitation, we propose a novel open set RF device identification method to classify unseen classes in the testing set. First, we design a specific convolution neural network (CNN) with a short-time Fourier transforming (STFT) pre-processing module, which efficiently recognizes the differences of feature maps learned from various RF device signals. Then to generate a representation of known class bounds, we estimate the probability map of the open-set via the OpenMax function. We conduct experiments on sampled data and voice signal sets, considering various pre-processing schemes, network structures, distance metrics, tail sizes, and openness degrees. The simulation results show the superiority of the proposed method in terms of robustness and accuracy.

Submitted 5 December, 2021; originally announced December 2021.

Comments: 10 pages, 9 figures

arXiv:2112.01496 [pdf, other]

doi 10.1088/1361-6579/ac5b4a

Analysis of an adaptive lead weighted ResNet for multiclass classification of 12-lead ECGs

Authors: Zhibin Zhao, Darcy Murphy, Hugh Gifford, Stefan Williams, Annie Darlington, Samuel D. Relton, Hui Fang, David C. Wong

Abstract: Background: Twelve lead ECGs are a core diagnostic tool for cardiovascular diseases. Here, we describe and analyse an ensemble deep neural network architecture to classify 24 cardiac abnormalities from 12-lead ECGs. Method: We proposed a squeeze and excite ResNet to automatically learn deep features from 12-lead ECGs, in order to identify 24 cardiac conditions. The deep features were augmented with age and gender features in the final fully connected layers. Output thresholds for each class were set using a constrained grid search. To determine why the model made incorrect predictions, two expert clinicians independently interpreted a random set of 100 misclassified ECGs concerning Left Axis Deviation. Results: Using the bespoke weighted accuracy metric, we achieved a 5-fold cross validation score of 0.684, and sensitivity and specificity of 0.758 and 0.969, respectively. We scored 0.520 on the full test data, and ranked 2nd out of 41 in the official challenge rankings. On a random set of misclassified ECGs, agreement between two clinicians and training labels was poor (clinician 1: kappa = -0.057, clinician 2: kappa = -0.159). In contrast, agreement between the clinicians was very high (kappa = 0.92). Discussion: The proposed prediction model performed well on the validation and hidden test data in comparison to models trained on the same data. We also discovered considerable inconsistency in training labels, which is likely to hinder development of more accurate models.

Submitted 1 December, 2021; originally announced December 2021.

Comments: 13 pages, 4 figures, 4 tables. To be submitted to Physiological Measurement (special issue for Physionet Challenge)

MSC Class: 68T07 ACM Class: J.3; I.2
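
The kappa values quoted in the abstract above are inter-rater agreement scores. As a rough illustration of how such a score is read, the following sketch computes Cohen's kappa for two raters in plain Python. It assumes the paper used the standard two-rater Cohen's kappa, which the abstract does not state; the function name and the toy labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: standard unweighted kappa; the abstract does not say
    which kappa variant the authors computed.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters match.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy labels (hypothetical): 1 = abnormality present, 0 = absent.
perfect = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0])
chance_level = cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0])
opposite = cohen_kappa([1, 0, 1, 0], [0, 1, 0, 1])
```

A value near 1 means near-perfect agreement, 0 means agreement no better than chance, and a negative value means the raters agree less often than chance would predict.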

arXiv:2111.12925 [pdf, other]

ContourletNet: A Generalized Rain Removal Architecture Using Multi-Direction Hierarchical Representation

Authors: Wei-Ting Chen, Cheng-Che Tsai, Hao-Yu Fang, I-Hsiang Chen, Jian-Jiun Ding, Sy-Yen Kuo

Abstract: Images acquired from rainy scenes usually suffer from bad visibility which may damage the performance of computer vision applications. The rainy scenarios can be categorized into two classes: moderate rain and heavy rain scenes. Moderate rain scene mainly consists of rain streaks while heavy rain scene contains both rain streaks and the veiling effect (similar to haze). Although existing methods have achieved excellent performance on these two cases individually, it still lacks a general architecture to address both heavy rain and moderate rain scenarios effectively. In this paper, we construct a hierarchical multi-direction representation network by using the contourlet transform (CT) to address both moderate rain and heavy rain scenarios. The CT divides the image into the multi-direction subbands (MS) and the semantic subband (SS). First, the rain streak information is retrieved to the MS based on the multi-orientation property of the CT. Second, a hierarchical architecture is proposed to reconstruct the background information including damaged semantic information and the veiling effect in the SS. Last, the multi-level subband discriminator with the feedback error map is proposed. By this module, all subbands can be well optimized. This is the first architecture that can address both of the two scenarios effectively. The code is available in https://github.com/cctakaet/ContourletNet-BMVC2021.

Submitted 25 November, 2021; originally announced November 2021.

Comments: This paper is accepted by BMVC 2021

arXiv:2110.09814 [pdf, other]

doi 10.1109/ICASSP43922.2022.9747044

Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition

Authors: Haozhe Chen, Weiming Zhang, Kunlin Liu, Kejiang Chen, Han Fang, Nenghai Yu

Abstract: As an effective method for intellectual property (IP) protection, model watermarking technology has been applied on a wide variety of deep neural networks (DNN), including speech classification models. However, how to design a black-box watermarking scheme for automatic speech recognition (ASR) models is still an unsolved problem, which is a significant demand for protecting remote ASR Application Programming Interface (API) deployed in cloud servers. Due to conditional independence assumption and label-detection-based evasion attack risk of ASR models, the black-box model watermarking scheme for speech classification models cannot apply to ASR models. In this paper, we propose the first black-box model watermarking framework for protecting the IP of ASR models. Specifically, we synthesize trigger audios by spreading the speech clips of model owners over the entire input audios and labeling the trigger audios with the stego texts, which hides the authorship information with linguistic steganography. Experiments on the state-of-the-art open-source ASR system DeepSpeech demonstrate the feasibility of the proposed watermarking scheme, which is robust against five kinds of attacks and has little impact on accuracy.

Submitted 2 May, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: 5 pages, 2 figures. Accepted by 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2105.14704 [pdf, other]

Parkinsonian Chinese Speech Analysis towards Automatic Classification of Parkinson's Disease

Authors: Hao Fang, Chen Gong, Chen Zhang, Yanan Sui, Luming Li

Abstract: Speech disorders often occur at the early stage of Parkinson's disease (PD). The speech impairments could be indicators of the disorder for early diagnosis, while motor symptoms are not obvious. In this study, we constructed a new speech corpus of Mandarin Chinese and addressed classification of patients with PD. We implemented classical machine learning methods with ranking algorithms for feature selection, convolutional and recurrent deep networks, and an end to end system. Our classification accuracy significantly surpassed state-of-the-art studies. The result suggests that free talk has stronger classification power than standard speech tasks, which could help the design of future speech tasks for efficient early diagnosis of the disease. Based on existing classification methods and our natural speech study, the automatic detection of PD from daily conversation could be accessible to the majority of the clinical population.

Submitted 31 May, 2021; originally announced May 2021.

Comments: 12 pages, 5 figures, proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 136:114-125, 2020

arXiv:2104.10553 [pdf]

Rethinking Annotation Granularity for Overcoming Shortcuts in Deep Learning-based Radiograph Diagnosis: A Multicenter Study

Authors: Luyang Luo, Hao Chen, Yongjie Xiao, Yanning Zhou, Xi Wang, Varut Vardhanabhuti, Mingxiang Wu, Chu Han, Zaiyi Liu, Xin Hao Benjamin Fang, Efstratios Tsougenis, Huangjing Lin, Pheng-Ann Heng

Abstract: Two DL models were developed using radiograph-level annotations (yes or no disease) and fine-grained lesion-level annotations (lesion bounding boxes), respectively named CheXNet and CheXDet. The models' internal classification performance and lesion localization performance were compared on a testing set (n=2,922), external classification performance was compared on NIH-Google (n=4,376) and PadChest (n=24,536) datasets, and external lesion localization performance was compared on NIH-ChestX-ray14 dataset (n=880). The models were also compared to radiologists on a subset of the internal testing set (n=496). Given sufficient training data, both models performed comparably to radiologists. CheXDet achieved significant improvement for external classification, such as in classifying fracture on NIH-Google (CheXDet area under the ROC curve [AUC]: 0.67, CheXNet AUC: 0.51; p<.001) and PadChest (CheXDet AUC: 0.78, CheXNet AUC: 0.55; p<.001). CheXDet achieved higher lesion detection performance than CheXNet for most abnormalities on all datasets, such as in detecting pneumothorax on the internal set (CheXDet jackknife alternative free-response ROC-figure of merit [JAFROC-FOM]: 0.87, CheXNet JAFROC-FOM: 0.13; p<.001) and NIH-ChestX-ray14 (CheXDet JAFROC-FOM: 0.55, CheXNet JAFROC-FOM: 0.04; p<.001). To summarize, fine-grained annotations overcame shortcut learning and enabled DL models to identify correct lesion patterns, improving the models' generalizability.

Submitted 8 November, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Radiology: Artificial Intelligence

arXiv:2104.04291 [pdf, other]

Brain Surface Reconstruction from MRI Images Based on Segmentation Networks Applying Signed Distance Maps

Authors: Heng Fang, Xi Yang, Taichi Kin, Takeo Igarashi

Abstract: Whole-brain surface extraction is an essential topic in medical imaging systems as it provides neurosurgeons with a broader view of surgical planning and abnormality detection. To solve the problem confronted in current deep learning skull stripping methods lacking prior shape information, we propose a new network architecture that incorporates knowledge of signed distance fields and introduce an additional Laplacian loss to ensure that the prediction results retain shape information. We validated our newly proposed method by conducting experiments on our brain magnetic resonance imaging dataset (111 patients). The evaluation results demonstrate that our approach achieves comparable dice scores and also reduces the Hausdorff distance and average symmetric surface distance, thus producing more stable and smooth brain isosurfaces.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted by IEEE ISBI 2021 (International Symposium on Biomedical Imaging)

arXiv:2103.11580 [pdf, other]

Integrating Electrochemical Modeling with Machine Learning for Lithium-Ion Batteries

Authors: Hao Tu, Scott Moura, Huazhen Fang

Abstract: Mathematical modeling of lithium-ion batteries (LiBs) is a central challenge in advanced battery management. This paper presents a new approach to integrate a physics-based model with machine learning to achieve high-precision modeling for LiBs. This approach uniquely proposes to inform the machine learning model of the dynamic state of the physical model, enabling a deep integration between physics and machine learning. We propose two hybrid physics-machine learning models based on the approach, which blend a single particle model with thermal dynamics (SPMT) with a feedforward neural network (FNN) to perform physics-informed learning of a LiB's dynamic behavior. The proposed models are relatively parsimonious in structure and can provide considerable predictive accuracy even at high C-rates, as shown by extensive simulations.

Submitted 23 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

Comments: 7 pages, 8 figures, 4 tables, 2021 American Control Conference(ACC)

arXiv:2102.09683 [pdf, other]

Community Structure Recovery and Interaction Probability Estimation for Gossip Opinion Dynamics

Authors: Yu Xing, Xingkang He, Haitao Fang, Karl H. Johansson

Abstract: We study how to jointly recover the community structure and estimate the interaction probabilities of gossip opinion dynamics. In this process, agents randomly interact pairwise, and there are stubborn agents never changing their states. Such a model illustrates how disagreement and opinion fluctuation arise in a social network. It is assumed that each agent is assigned with one of two community labels, and the agents interact with probabilities depending on their labels. The considered problem is to jointly recover the community labels of the agents and estimate interaction probabilities between the agents, based on a single trajectory of the model. We first study stability and limit theorems of the model, and then propose a joint recovery and estimation algorithm based on a trajectory. It is verified that the community recovery can be achieved in finite time, and the interaction estimator converges almost surely. We derive a sample-complexity result for the recovery, and analyze the estimator's convergence rate. Simulations are presented for illustration of the performance of the proposed algorithm.

Submitted 25 August, 2023; v1 submitted 18 February, 2021; originally announced February 2021.

arXiv:2102.08706 [pdf, other]

doi 10.1109/ICASSP39728.2021.9414060

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Authors: Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

Abstract: Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to noise presence, especially in low signal-to-noise ratios (SNRs). To increase the robustness of the VAE, we propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs. We evaluate our approach on real recordings of different noisy environments and acoustic conditions using two different noise datasets. We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters. At the same time, we demonstrate that our model is capable of generalizing to unseen noise conditions better than a supervised feedforward deep neural network (DNN). Furthermore, we demonstrate the robustness of the model performance to a reduction of the noisy-clean speech training data size.

Submitted 17 February, 2021; originally announced February 2021.

Comments: ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2010.10606 [pdf, ps, other]

doi 10.1002/rnc.5344

SSUE: Simultaneous State and Uncertainty Estimation for Dynamical Systems

Authors: Hang Geng, Mulugeta A. Haile, Huazhen Fang

Abstract: Parameters of the mathematical model describing many practical dynamical systems are prone to vary due to aging or renewal, wear and tear, as well as changes in environmental or service conditions. These variabilities will adversely affect the accuracy of state estimation. In this paper, we introduce SSUE: Simultaneous State and Uncertainty Estimation for quantifying parameter uncertainty while simultaneously estimating the internal state of a system. Our approach involves the development of a Bayesian framework that recursively updates the posterior joint density of the unknown state vector and parameter uncertainty. To execute the framework for practical implementation, we develop a computational algorithm based on maximum a posteriori estimation and the numerical Newton's method. Observability analysis is conducted for linear systems, and its relation with the consistency of the estimation of the uncertainty's location is unveiled. Additional simulation results are provided to demonstrate the effectiveness of the proposed SSUE approach.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2007.15883 [pdf, other]

doi 10.1007/978-3-030-87000-3_20

Robust Retinal Vessel Segmentation from a Data Augmentation Perspective

Authors: Xu Sun, Huihui Fang, Yehui Yang, Dongwei Zhu, Lei Wang, Junwei Liu, Yanwu Xu

Abstract: Retinal vessel segmentation is a fundamental step in screening, diagnosis, and treatment of various cardiovascular and ophthalmic diseases. Robustness is one of the most critical requirements for practical utilization, since the test images may be captured using different fundus cameras, or be affected by various pathological changes. We investigate this problem from a data augmentation perspective, with the merits of no additional training data or inference time. In this paper, we propose two new data augmentation modules, namely, channel-wise random Gamma correction and channel-wise random vessel augmentation. Given a training color fundus image, the former applies random gamma correction on each color channel of the entire image, while the latter intentionally enhances or decreases only the fine-grained blood vessel regions using morphological transformations. With the additional training samples generated by applying these two modules sequentially, a model could learn more invariant and discriminating features against both global and local disturbances. Experimental results on both real-world and synthetic datasets demonstrate that our method can improve the performance and robustness of a classic convolutional neural network architecture. The source code is available at https://github.com/PaddlePaddle/Research/tree/master/CV/robust_vessel_segmentation.

Submitted 28 September, 2021; v1 submitted 31 July, 2020; originally announced July 2020.
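
The first augmentation module named in the abstract above, channel-wise random gamma correction, can be pictured with a small dependency-free sketch. This is not the authors' implementation (their code is at the URL in the abstract): the sampling interval, the function name, and the nested-list image layout are all assumptions made for illustration, and a real pipeline would more likely use an array library.

```python
import random


def channelwise_random_gamma(image, gamma_range=(0.5, 2.0), rng=None):
    """Apply an independently sampled gamma to each color channel.

    image: nested lists indexed [row][column][channel], values in [0, 1].
    gamma_range: hypothetical sampling interval, not taken from the paper.
    Returns the corrected image and the gammas that were drawn.
    """
    rng = rng or random.Random()
    n_channels = len(image[0][0])
    # One gamma per channel, shared by every pixel of that channel.
    gammas = [rng.uniform(*gamma_range) for _ in range(n_channels)]
    corrected = [
        [
            # Clamp to [0, 1] first so the power stays in the same range.
            [min(max(value, 0.0), 1.0) ** gamma
             for value, gamma in zip(pixel, gammas)]
            for pixel in row
        ]
        for row in image
    ]
    return corrected, gammas


# Toy 1x2 RGB image (hypothetical values).
toy_image = [[[0.25, 0.5, 1.0], [0.0, 0.5, 0.75]]]
augmented, drawn_gammas = channelwise_random_gamma(toy_image, rng=random.Random(0))
```

Because each channel receives its own exponent, the augmentation changes the color balance as well as the overall brightness, which is presumably what lets it mimic differences between fundus cameras.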

arXiv:2004.14321 [pdf, other]

doi 10.1109/TII.2020.2983176

Real-Time Optimal Lithium-Ion Battery Charging Based on Explicit Model Predictive Control

Authors: Ning Tian, Huazhen Fang, Yebin Wang

Abstract: The rapidly growing use of lithium-ion batteries across various industries highlights the pressing issue of optimal charging control, as charging plays a crucial role in the health, safety and life of batteries. The literature increasingly adopts model predictive control (MPC) to address this issue, taking advantage of its capability of performing optimization under constraints. However, the computationally complex online constrained optimization intrinsic to MPC often hinders real-time implementation. This paper is thus proposed to develop a framework for real-time charging control based on explicit MPC (eMPC), exploiting its advantage in characterizing an explicit solution to an MPC problem, to enable real-time charging control. The study begins with the formulation of MPC charging based on a nonlinear equivalent circuit model. Then, multi-segment linearization is conducted to the original model, and applying the eMPC design to the obtained linear models leads to a charging control algorithm. The proposed algorithm shifts the constrained optimization to offline by precomputing explicit solutions to the charging problem and expressing the charging law as piecewise affine functions. This drastically reduces not only the online computational costs in the control run but also the difficulty of coding. Extensive numerical simulation and experimental results verify the effectiveness of the proposed eMPC charging control framework and algorithm. The research results can potentially meet the needs for real-time battery management running on embedded hardware.

Submitted 29 April, 2020; originally announced April 2020.
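
The abstract above says the explicit MPC law is precomputed offline and stored as piecewise affine functions, so the online step reduces to a region lookup plus one affine evaluation. The sketch below illustrates that online step only. It is a generic illustration, not the paper's algorithm: the `Region` type, the one-dimensional state, and every number in `REGIONS` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Region:
    """One polyhedral region {x : A x <= b} with the law u = F . x + g."""
    A: List[List[float]]
    b: List[float]
    F: List[float]
    g: float


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def pwa_control(regions: List[Region], x: Sequence[float], tol: float = 1e-9) -> float:
    """Return the control input for state x by locating its region."""
    for region in regions:
        inside = all(dot(row, x) <= bi + tol for row, bi in zip(region.A, region.b))
        if inside:
            # Online cost: a few comparisons and one affine evaluation.
            return dot(region.F, x) + region.g
    raise ValueError("state lies outside every stored region")


# Hypothetical 1-D example: x is the state of charge in [0, 1].
REGIONS = [
    # 0 <= x <= 0.8: constant charging current of 2.0
    Region(A=[[-1.0], [1.0]], b=[0.0, 0.8], F=[0.0], g=2.0),
    # 0.8 <= x <= 1: current tapers linearly to 0 at full charge
    Region(A=[[-1.0], [1.0]], b=[-0.8, 1.0], F=[-10.0], g=10.0),
]
current_mid = pwa_control(REGIONS, [0.5])
current_late = pwa_control(REGIONS, [0.9])
```

A linear scan over regions is the simplest lookup. With many regions, a search tree over the partition is a common alternative, though the abstract does not say which the authors use.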

arXiv:2003.14028 [pdf, ps, other]

Community Detection for Gossip Dynamics with Stubborn Agents

Authors: Yu Xing, Xingkang He, Haitao Fang, Karl Henrik Johansson

Abstract: We consider a community detection problem for gossip dynamics with stubborn agents in this paper. It is assumed that the communication probability matrix for agent pairs has a block structure. More specifically, we assume that the network can be divided into two communities, and the communication probability of two agents depends on whether they are in the same community. Stability of the model is investigated, and expectation of stationary distribution is characterized, indicating under the block assumption, the stationary behaviors of agents in the same community are similar. It is also shown that agents in different communities display distinct behaviors if and only if state averages of stubborn agents in different communities are not identical. A community detection algorithm is then proposed to recover community structure and to estimate communication probability parameters. It is verified that the community detection part converges in finite time, and the parameter estimation part converges almost surely. Simulations are given to illustrate algorithm performance.

Submitted 28 November, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

Showing 1–50 of 69 results for author: Fang, H