-
Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
**ming Guo,
Xiaolin Chen,
**gcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for retinal images were pre-trained on limited disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China, and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus diseases. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.
Submitted 30 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
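Top-k accuracy, the metric behind the Top5 and Top3 figures above, counts a prediction as correct when the true disease is among the k highest-scoring candidates. The minimal sketch below assumes each image already has one score per candidate disease (for example, image-text similarities in a CLIP-style zero-shot setup); the function name and toy scores are illustrative and are not taken from RetiZero.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class index is among the k highest scores.

    scores[i][c] is the score of candidate class c for sample i (hypothetical input,
    e.g. similarity between an image embedding and a text prompt embedding).
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("need one label per score row and at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank classes by descending score; sorted() is stable, so ties keep index order.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Toy example: 3 images, 3 candidate diseases.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct at k=1
print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 correct at k=2
```

Because a larger k can only add candidates to the accepted set, Top-k accuracy never decreases as k grows.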
-
Federated Transfer Learning with Task Personalization for Condition Monitoring in Ultrasonic Metal Welding
Authors:
Ahmadreza Eslaminia,
Yuquan Meng,
Klara Nahrstedt,
Chenhui Shao
Abstract:
Ultrasonic metal welding (UMW) is a key joining technology with widespread industrial applications. Condition monitoring (CM) capabilities are critically needed in UMW applications because process anomalies significantly deteriorate joining quality. Recently, machine learning models have emerged as a promising tool for CM in many manufacturing applications due to their ability to learn complex patterns. Yet, the successful deployment of these models requires substantial training data that may be expensive and time-consuming to collect. Additionally, many existing machine learning models lack generalizability and cannot be directly applied to new process configurations (i.e., domains). Such issues may be alleviated by pooling data across manufacturers, but data sharing raises critical data privacy concerns. To address these challenges, this paper presents a Federated Transfer Learning with Task Personalization (FTL-TP) framework that provides domain generalization capabilities in distributed learning while ensuring data privacy. By effectively learning a unified representation from the feature space, FTL-TP can adapt CM models for clients working on similar tasks, thereby jointly enhancing their overall adaptability and performance. To demonstrate the effectiveness of FTL-TP, we investigate two distinct UMW CM tasks: tool condition monitoring and workpiece surface condition classification. Compared with state-of-the-art federated learning (FL) algorithms, FTL-TP achieves a 5.35% to 8.08% improvement in CM accuracy in new target domains. FTL-TP is also shown to perform well in challenging scenarios involving unbalanced data distributions and limited client fractions. Furthermore, by implementing the FTL-TP method on an edge-cloud architecture, we show that this method is both viable and efficient in practice. The FTL-TP framework is readily extensible to various other manufacturing applications.
Submitted 20 April, 2024;
originally announced April 2024.
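The privacy argument rests on clients sharing model parameters rather than raw welding data. The sketch below shows plain federated averaging (FedAvg), a standard baseline in which the server averages client parameter vectors weighted by local sample counts. It is not the FTL-TP algorithm, whose task-personalization step is not detailed in the abstract; parameters are flattened into Python lists for simplicity.

```python
from typing import List, Sequence


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Server-side FedAvg step: sample-size-weighted mean of client parameter vectors."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    avg = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(params):
            avg[i] += w * n / total  # only parameters leave each client, never raw data
    return avg


# Two hypothetical clients: one with 1 local sample, one with 3.
global_params = federated_average([[1.0, 2.0], [3.0, 6.0]], [1, 3])
print(global_params)  # [2.5, 5.0]
```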
-
Motion and temporal B0 shift corrections for quantitative susceptibility mapping (QSM) and R2* mapping using dual-echo spiral navigators and conjugate-phase reconstruction
Authors:
Yuguang Meng,
Jason W. Allen,
Vahid Khalilzad Sharghi,
Deqiang Qiu
Abstract:
Purpose: To develop an efficient navigator-based motion and temporal B0 shift correction technique for 3D multi-echo gradient-echo (ME-GRE) MRI for quantitative susceptibility mapping (QSM) and R2* mapping. Theory and Methods: A dual-echo 3D spiral navigator was designed to interleave with the Cartesian ME-GRE acquisitions, allowing the acquisition of both low- and high-echo-time signals. We additionally designed a novel conjugate-phase-based reconstruction method for the joint correction of motion and temporal B0 shifts. We performed both numerical simulations and in vivo human scans to assess the performance of the methods. Results: Numerical simulations and human brain scans demonstrated that the proposed technique successfully corrected artifacts induced by both head motion and temporal B0 changes. Efficient B0-change correction with conjugate-phase reconstruction can be performed with fewer than 10 clustered k-space segments. In vivo scans showed that combining temporal B0 correction with motion correction further reduced artifacts and improved image quality in both R2* and QSM images. Conclusion: Our proposed approach of using 3D spiral navigators and a novel conjugate-phase reconstruction method can improve susceptibility-related MR measurements.
Submitted 18 March, 2024;
originally announced March 2024.
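R2* mapping from multi-echo GRE data fits a mono-exponential decay S(TE) = S0 * exp(-R2* * TE) to each voxel's magnitudes across echo times. A minimal sketch of the standard log-linear least-squares fit follows; it only illustrates what an R2* map computes and does not implement the paper's spiral-navigator or conjugate-phase corrections. Echo times and signal values are synthetic.

```python
import math
from typing import Sequence, Tuple


def fit_r2star(echo_times_s: Sequence[float], magnitudes: Sequence[float]) -> Tuple[float, float]:
    """Log-linear least-squares fit of S(TE) = S0 * exp(-R2* * TE) for one voxel.

    Returns (S0, R2*), with R2* in 1/s when echo times are in seconds.
    """
    n = len(echo_times_s)
    if n < 2 or n != len(magnitudes):
        raise ValueError("need at least two echoes and one magnitude per echo")
    logs = [math.log(s) for s in magnitudes]  # requires strictly positive magnitudes
    t_mean = sum(echo_times_s) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in echo_times_s)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(echo_times_s, logs))
    slope = sxy / sxx  # slope of ln S versus TE equals -R2*
    return math.exp(y_mean - slope * t_mean), -slope


# Synthetic voxel: S0 = 1000, R2* = 30 1/s, four echoes.
tes = [0.005, 0.010, 0.015, 0.020]
signal = [1000.0 * math.exp(-30.0 * te) for te in tes]
s0_hat, r2s_hat = fit_r2star(tes, signal)
print(f"S0 = {s0_hat:.2f}, R2* = {r2s_hat:.2f} 1/s")
```

Log-linear fits are simple but biased at low SNR; nonlinear least-squares fits of the exponential are a common alternative.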
-
LyZNet: A Lightweight Python Tool for Learning and Verifying Neural Lyapunov Functions and Regions of Attraction
Authors:
Jun Liu,
Yiming Meng,
Maxwell Fitzsimmons,
Ruikun Zhou
Abstract:
In this paper, we describe a lightweight Python framework that provides integrated learning and verification of neural Lyapunov functions for stability analysis. The proposed tool, named LyZNet, learns neural Lyapunov functions using physics-informed neural networks (PINNs) to solve Zubov's equation and verifies them using satisfiability modulo theories (SMT) solvers. What distinguishes this tool from others in the literature is its ability to provide verified regions of attraction close to the domain of attraction. This is achieved by encoding Zubov's partial differential equation (PDE) into the PINN approach. By embracing the non-convex nature of the underlying optimization problems, we demonstrate that in cases where convex optimization, such as semidefinite programming, fails to capture the domain of attraction, our neural network framework proves more successful. The tool also offers automatic decomposition of coupled nonlinear systems into a network of low-dimensional subsystems for compositional verification. We illustrate the tool's usage and effectiveness with several numerical examples, including both non-trivial low-dimensional nonlinear systems and high-dimensional systems. The repository of the tool can be found at https://git.uwaterloo.ca/hybrid-systems-lab/lyznet.
Submitted 15 March, 2024;
originally announced March 2024.
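Zubov's equation ties a Lyapunov function to the domain of attraction. In one common form it reads ∇V(x)·f(x) = -h(x)(1 - V(x)) with V(0) = 0 and h positive definite; its solution stays below 1 on the domain of attraction and approaches 1 at its boundary. A PINN minimizes this residual at sampled collocation points. The sketch below skips training and instead checks the residual for a hypothetical 1-D system f(x) = -x + x^3, whose domain of attraction is (-1, 1), using the closed-form solution V(x) = 1 - sqrt(1 - x^2) that results from the illustrative choice h(x) = x^2. LyZNet's exact PDE variant and network setup may differ.

```python
import math


# Hypothetical 1-D example: the origin is stable and its domain of attraction is (-1, 1).
def f(x: float) -> float:
    return -x + x**3


def h(x: float) -> float:
    return x * x  # positive-definite weight (illustrative choice)


def V(x: float) -> float:
    return 1.0 - math.sqrt(1.0 - x * x)  # closed-form Zubov solution for this f and h


def dV(x: float) -> float:
    return x / math.sqrt(1.0 - x * x)


def zubov_residual(x: float) -> float:
    """Residual dV/dx * f(x) + h(x) * (1 - V(x)); a PINN would drive this toward zero."""
    return dV(x) * f(x) + h(x) * (1.0 - V(x))


collocation = [-0.99 + 1.98 * i / 200 for i in range(201)]
max_residual = max(abs(zubov_residual(x)) for x in collocation)
print(f"max |residual| on collocation points: {max_residual:.1e}")
print(f"V(0.5) = {V(0.5):.4f}, V(0.99) = {V(0.99):.4f}")  # V rises toward 1 near the boundary
```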
-
Compositionally Verifiable Vector Neural Lyapunov Functions for Stability Analysis of Interconnected Nonlinear Systems
Authors:
Jun Liu,
Yiming Meng,
Maxwell Fitzsimmons,
Ruikun Zhou
Abstract:
While there has been increasing interest in using neural networks to compute Lyapunov functions, verifying that these functions satisfy the Lyapunov conditions and certifying stability regions remain challenging due to the curse of dimensionality. In this paper, we demonstrate that by leveraging the compositional structure of interconnected nonlinear systems, it is possible to verify neural Lyapunov functions for high-dimensional systems beyond the capabilities of current satisfiability modulo theories (SMT) solvers using a monolithic approach. Our numerical examples employ neural Lyapunov functions trained by solving Zubov's partial differential equation (PDE), which characterizes the domain of attraction for individual subsystems. These examples show a performance advantage over sums-of-squares (SOS) polynomial Lyapunov functions derived from semidefinite programming.
Submitted 15 March, 2024;
originally announced March 2024.
-
Physics-Informed Neural Network Policy Iteration: Algorithms, Convergence, and Verification
Authors:
Yiming Meng,
Ruikun Zhou,
Amartya Mukherjee,
Maxwell Fitzsimmons,
Christopher Song,
Jun Liu
Abstract:
Solving nonlinear optimal control problems is a challenging task, particularly for high-dimensional problems. We propose algorithms for model-based policy iteration to solve nonlinear optimal control problems with convergence guarantees. The main component of our approach is an iterative procedure that utilizes neural approximations to solve linear partial differential equations (PDEs), ensuring convergence. We present two variants of the algorithms. The first variant formulates the optimization problem as a linear least-squares problem, drawing inspiration from the extreme learning machine (ELM) for solving PDEs. This variant efficiently handles low-dimensional problems with high accuracy. The second variant is based on a physics-informed neural network (PINN) for solving PDEs and has the potential to address high-dimensional problems. We demonstrate that both algorithms outperform traditional approaches, such as Galerkin methods, by a significant margin. We provide a theoretical analysis of both algorithms in terms of the convergence of neural approximations toward the true optimal solutions in a general setting. Furthermore, we employ formal verification techniques to demonstrate the verifiable stability of the resulting controllers.
Submitted 15 February, 2024;
originally announced February 2024.
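For linear dynamics and quadratic cost, the linear PDE solved in each policy-evaluation step collapses to an algebraic equation, which makes the structure of policy iteration easy to see. The sketch below is this classical scalar special case (Kleinman-style iteration for dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2), not the paper's ELM or PINN variants. Evaluation solves 2p(a - bk) + q + rk^2 = 0 for V(x) = p*x^2 under u = -k*x, and improvement sets k = b*p/r; the iterates should approach the Riccati solution p* = 1 + sqrt(2) for a = b = q = r = 1.

```python
import math
from typing import Tuple


def policy_iteration_scalar_lqr(a: float, b: float, q: float, r: float,
                                k0: float, iters: int = 20) -> Tuple[float, float]:
    """Policy iteration for dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2.

    k0 must be stabilizing (a - b*k0 < 0). Returns (p, k) with V(x) = p*x^2 and u = -k*x.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    k = k0
    for _ in range(iters):
        if b * k <= a:
            raise ValueError("current gain does not stabilize the closed loop")
        # Policy evaluation: solve the (here algebraic) linear equation 2p(a - bk) + q + r k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: minimize r u^2 + V'(x) b u, giving u = -(b p / r) x.
        k = b * p / r
    return p, k


p, k = policy_iteration_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
p_riccati = 1.0 + math.sqrt(2.0)  # positive root of 2ap - b^2 p^2 / r + q = 0
print(f"policy iteration p = {p:.10f}, Riccati p* = {p_riccati:.10f}")
```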
-
Music Genre Classification: A Comparative Analysis of CNN and XGBoost Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms
Authors:
Yigang Meng
Abstract:
In recent years, various well-designed algorithms have empowered music platforms to provide content based on one's preferences. Music genres are defined through various aspects, including acoustic features and cultural considerations. Music genre classification works well with content-based filtering, which recommends content to users based on music similarity. Given a considerable dataset, one prerequisite is automatic annotation using machine learning or deep learning methods that can effectively classify audio files. The effectiveness of such systems largely depends on feature and model selection, as different architectures and features can complement each other and yield different results. In this study, we conduct a comparative study investigating the performance of three models, namely a proposed convolutional neural network (CNN), VGG16 with fully connected (FC) layers, and an eXtreme Gradient Boosting (XGBoost) approach, on two feature types: 30-second Mel spectrograms and 3-second Mel-frequency cepstral coefficients (MFCCs). The results show that the MFCC XGBoost model outperformed the others. Furthermore, applying data segmentation in the data preprocessing phase can significantly enhance the performance of the CNNs.
Submitted 8 January, 2024;
originally announced January 2024.
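Data segmentation here means cutting each 30-second clip into shorter pieces that inherit the clip's genre label, so each track yields several training examples. A minimal sketch, assuming raw samples in a Python list and dropping any trailing partial segment (a design choice not stated in the abstract); features such as MFCCs would then be computed per segment, for example with an audio library such as librosa.

```python
from typing import List, Sequence, Tuple


def segment_clip(samples: Sequence[float], sample_rate: int, label: str,
                 segment_seconds: float = 3.0) -> List[Tuple[Sequence[float], str]]:
    """Split one labeled clip into non-overlapping fixed-length segments sharing its label."""
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    n_full = len(samples) // seg_len  # trailing samples that do not fill a segment are dropped
    return [(samples[i * seg_len:(i + 1) * seg_len], label) for i in range(n_full)]


# Toy clip: 30 s at a (hypothetical) 10 Hz sample rate plus 5 extra samples.
clip = [0.0] * 305
segments = segment_clip(clip, sample_rate=10, label="jazz")
print(len(segments), len(segments[0][0]))  # 10 30
```

One plausible reason segmentation helps the CNNs is this tenfold increase in labeled examples per track, though the abstract does not analyze the cause.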
-
Stochastic Control Barrier Functions with Bayesian Inference for Unknown Stochastic Differential Equations
Authors:
Chuanzheng Wang,
Yiming Meng,
Jun Liu,
Stephen Smith
Abstract:
Control barrier functions are widely used to synthesize safety-critical controls. However, the presence of Gaussian-type noise in dynamical systems can generate unbounded signals and potentially result in severe consequences. Although research has been conducted in the field of safety-critical control for stochastic systems, in many real-world scenarios, we do not have precise knowledge about the stochastic dynamics. In this paper, we delve into the safety-critical control for stochastic systems where both the drift and diffusion components are unknown. We employ Bayesian inference as a data-driven approach to approximate the system. To be more specific, we utilize Bayesian linear regression along with the central limit theorem to estimate the drift term, and employ Bayesian inference to approximate the diffusion term. Through simulations, we verify our findings by applying them to a nonlinear dynamical model and an adaptive cruise control model.
Submitted 19 December, 2023;
originally announced December 2023.
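Bayesian linear regression with a Gaussian prior and known Gaussian noise has a closed-form Gaussian posterior, which is what makes it convenient for estimating an unknown drift from data. A minimal single-parameter sketch, assuming the drift is modeled as theta * phi(x) for a chosen feature phi and that the targets y come from, for example, finite-difference increments of a sampled trajectory; the feature choice and the paper's central-limit-theorem step are not reproduced here.

```python
from typing import Sequence, Tuple


def blr_posterior(phis: Sequence[float], ys: Sequence[float], noise_var: float,
                  prior_mean: float = 0.0, prior_var: float = 100.0) -> Tuple[float, float]:
    """Posterior mean and variance of theta in y = theta * phi + eps, eps ~ N(0, noise_var),
    under the conjugate prior theta ~ N(prior_mean, prior_var)."""
    if len(phis) != len(ys):
        raise ValueError("phis and ys must have the same length")
    precision = 1.0 / prior_var + sum(p * p for p in phis) / noise_var
    weighted = prior_mean / prior_var + sum(p * y for p, y in zip(phis, ys)) / noise_var
    return weighted / precision, 1.0 / precision


# Synthetic data generated with theta = 2 (no noise added, for a deterministic example).
mean, var = blr_posterior([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], noise_var=0.01)
print(f"posterior mean {mean:.4f}, posterior std {var ** 0.5:.4f}")
```

The posterior variance shrinks as more observations arrive, which gives a natural measure of how much to trust the learned drift.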
-
Physics-Informed Neural Network Lyapunov Functions: PDE Characterization, Learning, and Verification
Authors:
Jun Liu,
Yiming Meng,
Maxwell Fitzsimmons,
Ruikun Zhou
Abstract:
We provide a systematic investigation of using physics-informed neural networks to compute Lyapunov functions. We encode Lyapunov conditions as a partial differential equation (PDE) and use this for training neural network Lyapunov functions. We analyze the analytical properties of the solutions to the Lyapunov and Zubov PDEs. In particular, we show that employing the Zubov equation in training neural Lyapunov functions can lead to approximate regions of attraction close to the true domain of attraction. We also examine approximation errors and the convergence of neural approximations to the unique solution of Zubov's equation. We then provide sufficient conditions for the learned neural Lyapunov functions that can be readily verified by satisfiability modulo theories (SMT) solvers, enabling formal verification of both local stability analysis and region-of-attraction estimates in the large. Through a number of nonlinear examples, ranging from low to high dimensions, we demonstrate that the proposed framework can outperform traditional sums-of-squares (SOS) Lyapunov functions obtained using semidefinite programming (SDP).
Submitted 21 December, 2023; v1 submitted 14 December, 2023;
originally announced December 2023.
-
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Authors:
Yi Meng,
Xiang Li,
Zhiyong Wu,
Tingtian Li,
Zixun Sun,
Xinyu Xiao,
Chi Sun,
Hui Zhan,
Helen Meng
Abstract:
To further improve the speaking styles of synthesized speech, current text-to-speech (TTS) synthesis systems commonly employ reference speech to stylize their outputs instead of relying on the input text alone. These reference utterances are either selected manually, which is resource-consuming, or selected by semantic features. However, semantic features contain not only style-related information but also style-irrelevant information. Information in the text that is irrelevant to speaking style can interfere with reference audio selection and result in improper speaking styles. To improve reference selection, we propose a Contrastive Acoustic-Linguistic Module (CALM) to extract a Style-related Text Feature (STF) from the text. CALM optimizes the correlation between the speaking style embedding and the extracted STF with contrastive learning. A certain number of the most appropriate reference utterances for the input text are then selected by retrieving those with the highest STF similarities. Their style embeddings are combined in a weighted sum according to their STF similarities and used to stylize the synthesized speech of TTS. Experimental results demonstrate the effectiveness of our proposed approach, which outperforms a baseline with semantic-feature-based reference selection in both objective and subjective evaluations of the speaking styles of the synthesized speech.
Submitted 30 August, 2023;
originally announced August 2023.
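The retrieval-and-fusion step can be sketched as: score each candidate reference by the cosine similarity between its STF and the input text's STF, keep the top k, and combine their style embeddings with similarity-derived weights. The abstract does not specify the weighting, so this sketch assumes softmax weights; all vectors are toy placeholders rather than learned CALM features, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def select_and_fuse(query_stf: Sequence[float], ref_stfs: Sequence[Sequence[float]],
                    ref_styles: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[float]]:
    """Pick the k references whose STF is most similar to the query and fuse their style embeddings."""
    sims = [cosine(query_stf, s) for s in ref_stfs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Softmax over the selected similarities (assumed weighting; CALM's exact scheme is not given).
    exps = [math.exp(sims[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(ref_styles[0])
    fused = [sum(w * ref_styles[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return top, fused


query = [1.0, 0.0]
refs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
styles = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
top, fused = select_and_fuse(query, refs, styles, k=2)
print(top)  # [0, 2]
print(fused)
```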
-
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
Authors:
**gbei Li,
Sipan Li,
** Chen,
Luwen Zhang,
Yi Meng,
Zhiyong Wu,
Helen Meng,
Qiao Tian,
Yu** Wang,
Yuxuan Wang
Abstract:
Automatic dubbing, which generates a corresponding version of the input speech in another language, could be widely used in many real-world scenarios such as video and game localization. In addition to synthesizing the translated scripts, automatic dubbing needs to further transfer the speaking style of the original language to the dubbed speech to give audiences the impression that the characters are speaking in their native tongue. However, state-of-the-art automatic dubbing systems only model the transfer of duration and speaking rate, neglecting other aspects of speaking style such as emotion, intonation, and emphasis, which are also crucial for fully portraying the characters and for speech understanding. In this paper, we propose a joint multi-scale cross-lingual speaking style transfer framework to simultaneously model bidirectional speaking style transfer between languages at both global (i.e., utterance-level) and local (i.e., word-level) scales. The global and local speaking styles in each language are extracted and used to predict the global and local speaking styles in the other language, with an encoder-decoder framework for each direction and a bidirectional attention mechanism shared by both directions. A multi-scale speaking-style-enhanced FastSpeech 2 is then used to synthesize speech from the predicted global and local speaking styles for each language. Experimental results demonstrate the effectiveness of our proposed framework, which outperforms a baseline with only duration transfer in both objective and subjective evaluations.
Submitted 9 May, 2023;
originally announced May 2023.
-
Automatically Segment the Left Atrium and Scars from LGE-MRIs Using a Boundary-focused nnU-Net
Authors:
Yuchen Zhang,
Yanda Meng,
Yalin Zheng
Abstract:
Atrial fibrillation (AF) is the most common cardiac arrhythmia. Accurate segmentation of the left atrium (LA) and LA scars can provide valuable information to predict treatment outcomes in AF. In this paper, we propose to automatically segment the LA cavity and quantify LA scars with late gadolinium enhancement magnetic resonance images (LGE-MRIs). We adopted nnU-Net as the baseline model and exploited the importance of LA boundary characteristics by using the TopK loss as the loss function. Specifically, a focus on LA boundary pixels is achieved during training, which provides a more accurate boundary prediction. In addition, a distance map transformation of the predicted LA boundary is used as an additional input for LA scar prediction, which provides a marginal constraint on scar locations. We further designed a novel uncertainty-aware module (UAM) to produce better results for predictions with high uncertainty. Experiments on the LAScarQS 2022 dataset demonstrated our model's superior performance on LA cavity and LA scar segmentation. Specifically, we achieved Dice coefficients of 88.98% and 64.08% for LA cavity and scar segmentation, respectively. We will make our implementation code publicly available at https://github.com/level6626/Boundary-focused-nnU-Net.
Submitted 27 April, 2023;
originally announced April 2023.
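The Dice coefficient reported above is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A TopK loss averages the per-pixel cross-entropy over only the k% hardest pixels, which for a cavity such as the LA tend to lie near its boundary. A minimal sketch on flattened binary masks; the 10% default and the toy values are illustrative assumptions, not the authors' settings.

```python
import math
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks (1 = foreground)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total  # two empty masks count as a perfect match


def topk_bce_loss(probs: Sequence[float], target: Sequence[int], k_percent: float = 10.0,
                  eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over the k_percent of pixels with the largest loss."""
    losses = []
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    n_keep = max(1, int(len(losses) * k_percent / 100.0))
    hardest = sorted(losses, reverse=True)[:n_keep]
    return sum(hardest) / n_keep


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(topk_bce_loss([0.9, 0.6, 0.2, 0.1], [1, 0, 1, 0], k_percent=50.0))
```

Easy, confidently classified pixels contribute nothing to the TopK loss, so the gradient concentrates on the misclassified ones.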
-
Physical Knowledge Enhanced Deep Neural Network for Sea Surface Temperature Prediction
Authors:
Yuxin Meng,
Feng Gao,
Eric Rigall,
Ran Dong,
Junyu Dong,
Qian Du
Abstract:
Traditionally, numerical models have been deployed in oceanography studies to simulate ocean dynamics by representing physical equations. However, many factors pertaining to ocean dynamics seem to be ill-defined. We argue that transferring physical knowledge from observed data could further improve the accuracy of numerical models when predicting Sea Surface Temperature (SST). Recent advances in Earth observation technologies have yielded a monumental growth in data. Consequently, it is imperative to explore ways to improve and supplement numerical models using the ever-increasing amounts of historical observational data. To this end, we introduce a method for SST prediction that transfers physical knowledge from historical observations to numerical models. Specifically, we use a combination of an encoder and a generative adversarial network (GAN) to capture physical knowledge from the observed data. The numerical model data is then fed into the pre-trained model to generate physics-enhanced data, which can then be used for SST prediction. Experimental results demonstrate that the proposed method considerably enhances SST prediction performance when compared to several state-of-the-art baselines.
Submitted 18 April, 2023;
originally announced April 2023.
-
Towards Learning and Verifying Maximal Neural Lyapunov Functions
Authors:
Jun Liu,
Yiming Meng,
Maxwell Fitzsimmons,
Ruikun Zhou
Abstract:
The search for Lyapunov functions is a crucial task in the analysis of nonlinear systems. In this paper, we present a physics-informed neural network (PINN) approach to learning a Lyapunov function that is nearly maximal for a given stable set. A Lyapunov function is considered nearly maximal if its sub-level sets can be made arbitrarily close to the boundary of the domain of attraction. We use Zubov's equation to train a maximal Lyapunov function defined on the domain of attraction. Additionally, we propose conditions that can be readily verified by satisfiability modulo theories (SMT) solvers for both local and global stability. We provide theoretical guarantees on the existence of maximal Lyapunov functions and demonstrate the effectiveness of our computational approach through numerical examples.
Submitted 14 April, 2023;
originally announced April 2023.
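Near-maximality can be illustrated with a toy example in which Zubov's equation has a closed-form solution. For the hypothetical system dx/dt = -x + x^3 (domain of attraction (-1, 1)) and the form ∇V·f = -h(x)(1 - V) with h(x) = x^2, the solution is V(x) = 1 - sqrt(1 - x^2). Its sublevel set {V < c} is the interval |x| < sqrt(1 - (1 - c)^2), which fills out the whole domain of attraction as c approaches 1. The sketch below computes these radii; it is an analytic illustration, not the paper's PINN training or SMT verification.

```python
import math


def V(x: float) -> float:
    """Closed-form Zubov solution for dx/dt = -x + x**3 with h(x) = x**2 (toy example)."""
    return 1.0 - math.sqrt(1.0 - x * x)


def sublevel_radius(c: float) -> float:
    """Radius r such that {x : V(x) < c} = (-r, r), for a level 0 < c < 1."""
    if not 0.0 < c < 1.0:
        raise ValueError("level c must lie in (0, 1)")
    return math.sqrt(1.0 - (1.0 - c) ** 2)


for c in (0.5, 0.9, 0.99, 0.999):
    r = sublevel_radius(c)
    print(f"c = {c}: sublevel set is |x| < {r:.6f} (domain of attraction is |x| < 1)")
```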
-
Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
Authors:
Hongrun Zhang,
Liam Burrows,
Yanda Meng,
Declan Sculthorpe,
Abhik Mukherjee,
Sarah E Coupland,
Ke Chen,
Yalin Zheng
Abstract:
Image segmentation is a fundamental task in the field of imaging and vision. Supervised deep learning for segmentation has achieved unparalleled success when sufficient training data with annotated labels are available. However, annotations are known to be expensive to obtain, especially for histopathology images, where the target regions usually have high morphological variation and irregular shapes. Thus, weakly supervised learning with sparse point annotations is a promising way to reduce the annotation workload. In this work, we propose a contrast-based variational model to generate segmentation results, which serve as reliable complementary supervision to train a deep segmentation model for histopathology images. The proposed method considers the common characteristics of target regions in histopathology images and can be trained in an end-to-end manner. It can generate more regionally consistent segmentations with smoother boundaries, and is more robust to unlabeled 'novel' regions. Experiments on two different histology datasets demonstrate its effectiveness and efficiency in comparison to previous models.
Submitted 7 April, 2023;
originally announced April 2023.
-
Hybrid Systems Neural Control with Region-of-Attraction Planner
Authors:
Yue Meng,
Chuchu Fan
Abstract:
Hybrid systems are prevalent in robotics. However, ensuring the stability of hybrid systems is challenging due to sophisticated continuous and discrete dynamics. A system with all its modes stable can still be unstable. Hence, special treatment is required at mode switches to stabilize the system. In this work, we propose a hierarchical, neural network (NN)-based method to control general hybrid systems. For each system mode, we first learn an NN Lyapunov function and an NN controller to ensure that states within the region of attraction (RoA) can be stabilized. Then an RoA NN estimator is learned across different modes. Upon mode switching, we propose a differentiable planner to ensure that the states after switching land in the next mode's RoA, hence stabilizing the hybrid system. We provide novel theoretical stability guarantees and conduct experiments in car tracking control, pogobot navigation, and bipedal walker locomotion. Our method requires only 0.25X the training time needed by other learning-based methods. With low running time (10-50X faster than model predictive control (MPC)), our controller achieves a higher stability/success rate than other baselines such as MPC, reinforcement learning (RL), common Lyapunov methods (CLF), linear quadratic regulator (LQR), quadratic programming (QP), and Hamilton-Jacobi-Bellman-based methods (HJB). The project page is at https://mit-realm.github.io/hybrid-clf.
Submitted 18 March, 2023;
originally announced March 2023.
-
Robustly Complete Finite-State Abstractions for Control Synthesis of Stochastic Systems
Authors:
Yiming Meng,
Jun Liu
Abstract:
The essential step of abstraction-based control synthesis for nonlinear systems to satisfy a given specification is to obtain a finite-state abstraction of the original system. The complexity of the abstraction is usually the dominant factor determining the efficiency of the algorithm. For the control synthesis of discrete-time nonlinear stochastic systems modelled by nonlinear stochastic difference equations, recent literature has demonstrated the soundness of abstractions in preserving robust probabilistic satisfaction of ω-regular linear-time properties. However, unnecessary transitions exist within the abstractions, which are difficult to quantify, and the completeness of abstraction-based control synthesis in the stochastic setting remains an open theoretical question. In this paper, we address this fundamental question from the topological viewpoint of the metrizable space of probability measures, and propose constructive finite-state abstractions for control synthesis of probabilistic linear temporal specifications. Such abstractions are both sound and approximately complete. That is, given a concrete discrete-time stochastic system and an arbitrarily small L1-perturbation of this system, there exists a family of finite-state controlled Markov chains that both abstracts the concrete system and is abstracted by the slightly perturbed system. In other words, given an arbitrarily small prescribed precision, an abstraction always exists to decide whether a control strategy exists for the concrete system to satisfy the probabilistic specification.
Submitted 7 March, 2023;
originally announced March 2023.
-
Multi-organ segmentation: a progressive exploration of learning paradigms under scarce annotation
Authors:
Shiman Li,
Haoran Wang,
Yucong Meng,
Chenxi Zhang,
Zhijian Song
Abstract:
Precise delineation of multiple organs or abnormal regions in the human body from medical images plays an essential role in computer-aided diagnosis, surgical simulation, image-guided interventions, and especially radiotherapy treatment planning. It is therefore of great significance to explore automatic segmentation approaches, among which deep learning-based approaches have evolved rapidly and made remarkable progress in multi-organ segmentation. However, obtaining an appropriately sized, finely annotated dataset of multiple organs is extremely difficult and expensive. Such scarce annotation limits the development of high-performance multi-organ segmentation models but has promoted many annotation-efficient learning paradigms. Among these, transfer learning leveraging external datasets, semi-supervised learning using unannotated datasets, and partially supervised learning integrating partially labeled datasets have become the dominant ways to overcome this dilemma in multi-organ segmentation. We first review the traditional fully supervised method, then present a comprehensive and systematic elaboration of the three above-mentioned learning paradigms in the context of multi-organ segmentation from both technical and methodological perspectives, and finally summarize their challenges and future trends.
Submitted 8 February, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Once-for-All Sequence Compression for Self-Supervised Speech Models
Authors:
Hsuan-Jui Chen,
Yen Meng,
Hung-yi Lee
Abstract:
The sequence length along the time axis is often the dominant factor in the computational cost of speech processing. Several methods have been proposed to reduce the sequence length and thereby lower the computational cost of self-supervised speech models. However, different downstream tasks have different tolerances to sequence compression, so a model that produces a fixed compression rate may not fit all tasks. In this work, we introduce a once-for-all (OFA) sequence compression framework for self-supervised speech models that supports a continuous range of operating compression rates. The framework is evaluated on various tasks, showing marginal degradation compared with the fixed-compression-rate variants, with a smooth performance-efficiency trade-off. We further explore adaptive compression rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.
Submitted 9 May, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
On Compressing Sequences for Self-Supervised Speech Models
Authors:
Yen Meng,
Hsuan-Jui Chen,
Jiatong Shi,
Shinji Watanabe,
Paola Garcia,
Hung-yi Lee,
Hao Tang
Abstract:
Compressing self-supervised models has become increasingly necessary as self-supervised models grow larger. While previous approaches have primarily focused on compressing the model size, shortening sequences is also effective in reducing the computational cost. In this work, we study fixed-length and variable-length subsampling along the time axis in self-supervised learning. We explore how sensitive individual downstream tasks are to input frame rates. Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates but also brings significant speed-up in inference. Variable-length subsampling performs particularly well under low frame rates. In addition, if we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
Submitted 25 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
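A minimal sketch of the two subsampling schemes described above, assuming NumPy, (T, D) frame features, and a 50 Hz input frame rate (20 ms frames, as produced by wav2vec 2.0/HuBERT-style encoders). Fixed-length subsampling is shown as average pooling over non-overlapping windows, and variable-length subsampling as averaging within given segment boundaries (e.g., phonetic boundaries); the function names and the pooling choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def fixed_subsample(frames: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a (T, D) frame sequence by an integer factor along time.
    Trailing frames that do not fill a complete window are dropped."""
    t = (frames.shape[0] // factor) * factor
    return frames[:t].reshape(-1, factor, frames.shape[1]).mean(axis=1)


def variable_subsample(frames: np.ndarray, boundaries: list[int]) -> np.ndarray:
    """Average frames within each segment [b_i, b_{i+1}); `boundaries` are
    frame indices (e.g. hypothetical phone boundaries) starting at 0."""
    edges = list(boundaries) + [frames.shape[0]]
    return np.stack([frames[s:e].mean(axis=0) for s, e in zip(edges[:-1], edges[1:])])


input_rate_hz = 50                                    # assumed encoder frame rate (20 ms frames)
x = np.random.default_rng(0).normal(size=(100, 8))   # 2 s of 8-dim features
y = fixed_subsample(x, factor=5)                      # 50 Hz / 5 = 10 Hz
print(y.shape, input_rate_hz / 5)                     # (20, 8) 10.0
```

Average pooling is only one possible downsampling operator; strided convolutions or frame dropping would serve the same illustrative purpose.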
-
Density Planner: Minimizing Collision Risk in Motion Planning with Dynamic Obstacles using Density-based Reachability
Authors:
Laura Lützow,
Yue Meng,
Andres Chavez Armijos,
Chuchu Fan
Abstract:
Uncertainty is prevalent in robotics. Due to measurement noise and complex dynamics, we cannot estimate the exact system and environment state. Since conservative motion planners are not guaranteed to find a safe control strategy in a crowded, uncertain environment, we propose a density-based method. Our approach uses a neural network and the Liouville equation to learn the density evolution for a system with an uncertain initial state. We can plan feasible and probably safe trajectories by applying a gradient-based optimization procedure to minimize the collision risk. We conduct motion planning experiments in simulated environments and in environments generated from real-world data, and our method outperforms baseline methods such as model predictive control and nonlinear programming. While our method requires offline planning, its online run time is 100 times shorter than that of model predictive control.
Submitted 27 February, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
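The density reasoning above rests on the Liouville equation: along a trajectory of ẋ = f(x), the log-density obeys d/dt log ρ = -∇·f. The sketch below integrates this relation for a hypothetical linear system with known dynamics instead of a learned network, so it illustrates the underlying equation, not the paper's neural estimator. For ẋ = Ax the divergence is the constant tr(A), which gives a closed-form check.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])    # hypothetical damped oscillator, tr(A) = -0.5


def f(x):
    return A @ x


def div_f(x, eps=1e-6):
    """Divergence of f by central finite differences: sum_i d f_i / d x_i."""
    total = 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        total += (f(x + e)[i] - f(x - e)[i]) / (2 * eps)
    return total


def propagate(x0, log_rho0, t_end, dt=1e-3):
    """Forward-Euler integration of the state and its log-density along one trajectory."""
    x, log_rho = np.asarray(x0, dtype=float), float(log_rho0)
    for _ in range(int(round(t_end / dt))):
        log_rho -= div_f(x) * dt            # Liouville along characteristics: d(log rho)/dt = -div f
        x = x + f(x) * dt
    return x, log_rho


x0 = np.array([1.0, 0.0])
log_rho0 = -np.log(2 * np.pi) - 0.5 * x0 @ x0     # standard 2-D Gaussian initial density
xT, log_rhoT = propagate(x0, log_rho0, t_end=2.0)
print(log_rhoT - log_rho0)                         # ≈ 1.0, i.e. -tr(A) * t = 0.5 * 2
```

Because the density is carried along each sampled trajectory, no grid over the state space is needed; this is what makes density propagation attractive for planning under uncertainty.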
-
Case Studies for Computing Density of Reachable States for Safe Autonomous Motion Planning
Authors:
Yue Meng,
Zeng Qiu,
Md Tawhid Bin Waez,
Chuchu Fan
Abstract:
Density of the reachable states can help understand the risk of safety-critical systems, especially in situations where worst-case reachability is too conservative. Recent work provides a data-driven approach to compute the density distribution of autonomous systems' forward reachable states online. In this paper, we study the use of such an approach in combination with model predictive control for verifiable safe path planning under uncertainties. We first use the learned density distribution to compute the risk of collision online. If this risk exceeds the acceptable threshold, our method plans a new path around the previous trajectory, with the risk of collision below the threshold. Our method is well suited to systems with uncertainties and complicated dynamics, as our data-driven approach does not need an analytical form of the systems' dynamics and can estimate the forward state density for an arbitrary initial distribution of uncertainties. We design two challenging scenarios (autonomous driving and hovercraft control) for safe motion planning in environments with obstacles under system uncertainties. We first show that our density estimation approach can reach an accuracy similar to that of the Monte-Carlo-based method while using only 1% as many training samples. By leveraging the estimated risk, our algorithm achieves the highest success rate in goal reaching when enforcing a safety rate above 0.99.
Submitted 16 September, 2022;
originally announced September 2022.
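Once a density over future states is available, the collision risk described above is the probability mass inside the obstacle. A minimal sketch, assuming a 2-D position state, a circular obstacle, grid quadrature, and an isotropic Gaussian as a stand-in for the learned density; all function names are hypothetical. A safety rate above 0.99 corresponds to a risk threshold of 0.01.

```python
import numpy as np


def gaussian_density(xy, mean, sigma):
    """Stand-in for a learned density estimate: isotropic 2-D Gaussian (an assumption)."""
    d2 = np.sum((xy - mean) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)


def collision_risk(density_fn, center, radius, lo=-5.0, hi=5.0, h=0.01):
    """Approximate P(position inside a circular obstacle) as the sum of density * cell area."""
    axis = np.arange(lo, hi + h / 2, h)
    gx, gy = np.meshgrid(axis, axis, indexing="ij")
    xy = np.stack([gx, gy], axis=-1)
    inside = np.sum((xy - np.asarray(center)) ** 2, axis=-1) < radius**2
    return float(np.sum(density_fn(xy)[inside]) * h * h)


threshold = 0.01    # a required safety rate of 0.99
risk = collision_risk(lambda xy: gaussian_density(xy, np.zeros(2), 1.0),
                      center=(0.0, 0.0), radius=0.5)
print(risk, risk > threshold)   # analytic value 1 - exp(-0.125) ≈ 0.1175, so replanning is triggered
```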
-
Accelerating Monte-Carlo Tree Search on CPU-FPGA Heterogeneous Platform
Authors:
Yuan Meng,
Rajgopal Kannan,
Viktor Prasanna
Abstract:
Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. The in-tree operations become a critical performance bottleneck in realizing parallel MCTS on CPUs. In this work, we develop a scalable CPU-FPGA system for Tree-Parallel MCTS. We propose a novel decomposition and mapping of the MCTS data structure and computation onto the CPU and FPGA to reduce communication and coordination. High scalability of our system is achieved by encapsulating in-tree operations in an SRAM-based FPGA accelerator. To lower the high data access latency and inter-worker synchronization overheads, we develop several hardware optimizations. We show that by using our accelerator, we obtain up to 35× speedup for in-tree operations and 3× higher overall system throughput. Our CPU-FPGA system also achieves better scalability with respect to the number of parallel workers than state-of-the-art parallel MCTS implementations on CPU.
Submitted 23 August, 2022;
originally announced August 2022.
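For context on the in-tree operations that the paper offloads to the FPGA: in tree-parallel MCTS these are typically UCT-based selection and backpropagation, with a virtual loss that discourages concurrent workers from descending the same path. The sketch below is a generic single-player software version under those assumptions (exploration constant c = 1.4, virtual loss of 1); it does not model the paper's CPU-FPGA decomposition.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0
    virtual_loss: int = 0
    children: list = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    n = child.visits + child.virtual_loss      # virtual losses count as zero-reward visits
    if n == 0:
        return math.inf                        # try unvisited children first
    mean = child.value_sum / n
    return mean + c * math.sqrt(math.log(max(parent_visits, 1)) / n)


def select_path(root: Node, vl: int = 1) -> list:
    """In-tree selection: descend by UCT, adding virtual loss to each chosen child."""
    path, node = [root], root
    while node.children:
        parent_n = node.visits + node.virtual_loss
        node = max(node.children, key=lambda ch: uct_score(ch, parent_n))
        node.virtual_loss += vl
        path.append(node)
    return path


def backup(path: list, reward: float, vl: int = 1) -> None:
    """Backpropagation: record the playout result and undo the virtual loss."""
    for depth, node in enumerate(path):
        node.visits += 1
        node.value_sum += reward
        if depth > 0:                          # the root never received virtual loss
            node.virtual_loss -= vl
```

In a multi-worker setting, `select_path` and `backup` are the shared-tree accesses that need synchronization; for two-player games the reward would additionally alternate sign by depth.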
-
Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification
Authors:
Long Chen,
Yixiong Meng,
Venkatesh Ravichandran,
Andreas Stolcke
Abstract:
Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances. Conventional speaker recognition systems generalize from a large random sample of speakers, causing the recognition to underperform for households drawn from specific cohorts or otherwise exhibiting high confusability. In this work, we propose a graph-based semi-supervised learning approach to improve household-level SID accuracy and robustness with locally adapted graph normalization and multi-signal fusion with multi-view graphs. Unlike other work on household SID, fairness, and signal fusion, this work focuses on speaker label inference (scoring) and provides a simple solution to realize household-specific adaptation and multi-signal fusion without tuning the embeddings or training a fusion network. Experiments on the VoxCeleb dataset demonstrate that our approach consistently improves the performance across households with different customer cohorts and degrees of confusability.
Submitted 8 July, 2022;
originally announced July 2022.
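As a generic illustration of graph-based semi-supervised scoring, the sketch below applies the closed-form label-spreading rule of Zhou et al. (2004) to a cosine-similarity graph, treating enrollment utterances as labeled nodes and test utterances as unlabeled ones. The affinity choice and α = 0.9 are assumptions; the paper's locally adapted graph normalization and multi-view fusion are not reproduced here.

```python
import numpy as np


def label_spreading(emb: np.ndarray, labels: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """emb: (N, D) utterance embeddings; labels: (N,) speaker ids, -1 for unlabeled.
    Returns a predicted speaker id for every node."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w = np.clip(x @ x.T, 0.0, None)            # non-negative cosine affinities
    np.fill_diagonal(w, 0.0)                    # no self-loops
    d = w.sum(axis=1) + 1e-12
    s = w / np.sqrt(np.outer(d, d))             # S = D^-1/2 W D^-1/2
    y = np.zeros((len(labels), labels.max() + 1))
    known = labels >= 0
    y[np.flatnonzero(known), labels[known]] = 1.0              # one-hot seeds for enrolled speakers
    f = np.linalg.solve(np.eye(len(labels)) - alpha * s, y)    # F = (I - alpha S)^-1 Y
    return f.argmax(axis=1)


# Two hypothetical household speakers, one enrollment utterance each (ids 0 and 1).
emb = np.array([[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [0.9, 0.1, 0.1],
                [0.1, 1.0, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.1]])
labels = np.array([0, -1, -1, 1, -1, -1])
print(label_spreading(emb, labels))   # [0 0 0 1 1 1]
```

Scoring through the graph lets unlabeled utterances from the same household reinforce each other, which is the intuition behind scoring-level (rather than embedding-level) adaptation.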
-
Automatic extraction of coronary arteries using deep learning in invasive coronary angiograms
Authors:
Yinghui Meng,
Zhenglong Du,
Chen Zhao,
Minghao Dong,
Drew Pienta,
Zhihui Xu,
Weihua Zhou
Abstract:
Accurate extraction of coronary arteries from invasive coronary angiography (ICA) is important in clinical decision-making for the diagnosis and risk stratification of coronary artery disease (CAD). In this study, we develop a method using deep learning to automatically extract the coronary artery lumen. Methods. A deep learning model, U-Net 3+, which incorporates full-scale skip connections and deep supervision, was proposed for automatic extraction of coronary arteries from ICAs. Transfer learning and a hybrid loss function were employed in this novel coronary artery extraction framework. Results. A data set containing 616 ICAs obtained from 210 patients was used. In the technical evaluation, U-Net 3+ achieved a Dice score of 0.8942 and a sensitivity of 0.8735, higher than U-Net++ (Dice score 0.8814, sensitivity 0.8331) and U-Net (Dice score 0.8799, sensitivity 0.8305). Conclusion. Our study demonstrates that U-Net 3+ is superior to other segmentation frameworks for the automatic extraction of coronary arteries from ICAs. This result suggests great promise for clinical use.
Submitted 24 June, 2022;
originally announced June 2022.
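For reference, the two metrics reported above can be computed from binary masks as Dice = 2|P ∩ T| / (|P| + |T|) and sensitivity = TP / (TP + FN). This is a minimal NumPy sketch assuming a non-empty ground-truth mask; it does not reproduce the paper's evaluation protocol (for example, how scores are aggregated across images).

```python
import numpy as np


def dice_and_sensitivity(pred, truth):
    """Binary-mask Dice score and sensitivity (recall); assumes `truth` is non-empty."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()          # true-positive pixels
    dice = 2.0 * tp / (pred.sum() + truth.sum())
    sensitivity = tp / truth.sum()                  # TP / (TP + FN)
    return float(dice), float(sensitivity)


truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 0:3] = True            # 6 vessel pixels
pred = truth.copy()
pred[2, 2] = False                # one missed pixel (false negative)
pred[0, 0:2] = True               # two false-positive pixels
print(dice_and_sensitivity(pred, truth))   # (10/13, 5/6) ≈ (0.769, 0.833)
```

Note that sensitivity ignores false positives while Dice penalizes them, which is why a model can rank differently on the two metrics.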
-
Data-Driven Learning of Safety-Critical Control with Stochastic Control Barrier Functions
Authors:
Chuanzheng Wang,
Yiming Meng,
Stephen L. Smith,
Jun Liu
Abstract:
Control barrier functions are widely used to synthesize safety-critical controls. The presence of Gaussian-type noise may lead to unsafe actions and result in severe consequences. Although safety-critical control of stochastic systems has been widely studied, in many real-world applications the stochastic component of the dynamics is unknown. In this paper, we study safety-critical control of stochastic systems with an unknown diffusion part and propose a data-driven method to handle these scenarios. More specifically, we propose a data-driven stochastic control barrier function (DDSCBF) framework and use supervised learning to learn the unknown stochastic dynamics via the DDSCBF scheme. Under some reasonable assumptions, we use the universal approximation theorem to guarantee that the DDSCBF scheme can approximate the Itô derivative of the stochastic control barrier function (SCBF) under partially unknown dynamics. We also show that the DDSCBF scheme achieves the same safety guarantee as SCBF in previous work without requiring knowledge of the stochastic dynamics. We use two nonlinear stochastic systems to validate our theory in simulations.
Submitted 22 May, 2022;
originally announced May 2022.
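For readers unfamiliar with the Itô derivative mentioned above, its standard form for a control-affine stochastic system is given below. The symbols f, g, σ, and B are our notation and may differ from the paper's; the second-order trace term is the part that depends on the diffusion σ, which is the component assumed unknown in the setting described.

```latex
\begin{aligned}
dx_t &= \big(f(x_t) + g(x_t)\,u_t\big)\,dt + \sigma(x_t)\,dW_t,\\
\mathcal{L}B(x) &= \frac{\partial B}{\partial x}(x)\,\big(f(x) + g(x)\,u\big)
  + \frac{1}{2}\operatorname{tr}\!\Big(\sigma(x)^{\top}\,\frac{\partial^{2} B}{\partial x^{2}}(x)\,\sigma(x)\Big).
\end{aligned}
```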
-
Multi-core fiber enabled fading noise suppression in φ-OFDR based quantitative distributed vibration sensing
Authors:
Yuxiang Feng,
Weilin Xie,
Yinxia Meng,
Jiang Yang,
Qiang Yang,
Yan Ren,
Tianwai Bo,
Zhongwei Tan,
Wei Wei,
Yi Dong
Abstract:
Coherent fading has been regarded as a critical issue in phase-sensitive optical frequency domain reflectometry (φ-OFDR) based distributed fiber-optic sensing. Here, we report on an approach for fading noise suppression in φ-OFDR with multi-core fiber. By exploiting the independent randomness of the refractive index distribution in each of the cores, the drastic phase fluctuations due to the fading phenomenon can be effectively alleviated by applying weighted vectorial averaging to the Rayleigh backscattering traces from the cores, which have distinct fading distributions. Given the consistent linear response of each core to the external excitation of interest, a demonstration of the proposed φ-OFDR with a commercial seven-core fiber has achieved highly sensitive quantitative distributed vibration sensing with about 2.2 nm length precision and 2 cm sensing resolution along the 500 m fiber, corresponding to a range resolution factor of about 4 × 10^-5 (2 cm / 500 m). Featuring long distance, high sensitivity, high resolution, and fading robustness, this approach shows promising potential in various sensing techniques for a wide range of practical scenarios.
Submitted 3 May, 2022;
originally announced May 2022.
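A minimal sketch of vector (phasor) averaging across cores, assuming complex Rayleigh traces of shape (cores, positions) measured at a reference time and a later time. Multiplying each core's current trace by the conjugate of its reference cancels the random static phase and weights each core by its amplitude product, so a faded point contributes little to the vector sum. This amplitude weighting is a common choice and an assumption here; the paper's exact weights may differ.

```python
import numpy as np


def combined_phase_change(ref, cur, weights=None):
    """ref, cur: complex traces of shape (cores, positions) at two times.
    Returns the phase change per position from the vector sum over cores."""
    z = cur * np.conj(ref)                     # cancels each core's random static phase
    if weights is not None:
        z = z * np.asarray(weights)[:, None]   # optional extra per-core weights (assumption)
    return np.angle(z.sum(axis=0))


rng = np.random.default_rng(1)
cores, positions = 7, 4                        # e.g. a seven-core fiber, 4 sample points
dphi = np.array([0.1, 0.3, -0.2, 0.5])         # simulated vibration-induced phase change (rad)
amp = rng.uniform(0.5, 1.5, size=(cores, positions))
static = rng.uniform(-np.pi, np.pi, size=(cores, positions))
ref = amp * np.exp(1j * static)
cur = ref * np.exp(1j * dphi)
ref[0, 2], cur[0, 2] = 1e-3, 1e-3 * np.exp(2.5j)   # a faded point on core 0 with a corrupted phase
print(np.angle(cur[0, 2] * np.conj(ref[0, 2])))    # core 0 alone: ≈ 2.5 rad, far from -0.2
print(combined_phase_change(ref, cur))             # close to dphi at every position
```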
-
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism
Authors:
**gbei Li,
Yi Meng,
Zhiyong Wu,
Helen Meng,
Qiao Tian,
Yu** Wang,
Yuxuan Wang
Abstract:
Although deep learning and end-to-end models have been widely used and have shown superiority in automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, state-of-the-art forced alignment (FA) models are still based on hidden Markov models (HMMs). HMMs have a limited view of contextual information and are developed with long pipelines, leading to error accumulation and unsatisfactory performance. Inspired by the capability of attention mechanisms to capture long-term contextual information and learn alignments in ASR and TTS, we propose a neural network based end-to-end forced aligner called NeuFA, in which a novel bidirectional attention mechanism plays an essential role. NeuFA integrates the alignment learning of both ASR and TTS tasks in a unified framework by learning bidirectional alignment information from a shared attention matrix in the proposed bidirectional attention mechanism. Alignments are extracted from the learnt attention weights and optimized by the ASR, TTS and FA tasks in a multi-task learning manner. Experimental results demonstrate the effectiveness of our proposed model: compared with the state-of-the-art HMM-based model, the mean absolute error on the test set drops from 25.8 ms to 23.7 ms at the word level and from 17.0 ms to 15.7 ms at the phoneme level.
Submitted 31 March, 2022;
originally announced March 2022.
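To make the evaluation above concrete, the sketch below turns an attention matrix into token start times with a simple argmax rule and scores them by mean absolute error in milliseconds. The argmax extraction and the 10 ms frame shift are assumptions for illustration; NeuFA's own boundary extraction may differ.

```python
import numpy as np


def starts_from_attention(attn, frame_shift_ms=10.0):
    """attn: (n_tokens, n_frames). Assign each frame to its highest-weight token and
    return each token's first frame as a start time in ms. Assumes every token wins
    at least one frame."""
    owner = attn.argmax(axis=0)
    first = [np.flatnonzero(owner == k)[0] for k in range(attn.shape[0])]
    return np.array(first) * frame_shift_ms


def mae_ms(pred_ms, ref_ms):
    return float(np.mean(np.abs(np.asarray(pred_ms) - np.asarray(ref_ms))))


owner = np.array([0, 0, 0, 1, 1, 2, 2, 2])     # toy monotonic alignment over 8 frames
attn = np.full((3, 8), 0.1)
attn[owner, np.arange(8)] = 0.8
pred = starts_from_attention(attn)             # starts at 0, 30, 50 ms
print(pred, mae_ms(pred, [0.0, 40.0, 45.0]))   # errors 0, 10, 5 ms give an MAE of 5.0
```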
-
Building Synthetic Speaker Profiles in Text-to-Speech Systems
Authors:
Jie Pu,
Yixiong Meng,
Oguz Elibol
Abstract:
The diversity of speaker profiles in multi-speaker TTS systems is a crucial aspect of their performance, as it measures how many different speaker profiles a TTS system can synthesize. However, this important aspect is often overlooked when building multi-speaker TTS systems, and there is no established framework to evaluate this diversity. The reason is that most multi-speaker TTS systems are limited to generating speech with the same speaker profiles as their training data. They often use discrete speaker embedding vectors that have a one-to-one correspondence with individual speakers. This correspondence limits TTS systems and hinders their capability of generating unseen speaker profiles that did not appear during training. In this paper, we aim to build multi-speaker TTS systems that have a greater variety of speaker profiles and can generate new synthetic speaker profiles that differ from the training data. To this end, we propose to use generative models with a triplet loss and a specific shuffle mechanism. In our experiments, the effectiveness and advantages of the proposed method are demonstrated in terms of both the distinctiveness and intelligibility of synthesized speech signals.
Submitted 7 February, 2022;
originally announced February 2022.
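The triplet loss mentioned above pulls an anchor embedding toward a positive example and pushes it away from a negative one by at least a margin. Below is a standard NumPy version using squared Euclidean distances and a margin of 0.2 (both assumptions); how the paper forms triplets, and its shuffle mechanism, are not described in the abstract and are not shown.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean over the batch of max(0, ||a - p||^2 - ||a - n||^2 + margin); inputs are (B, D)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # anchor-positive squared distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # anchor-negative squared distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


a = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[0.1, 0.0], [1.0, 0.0]])
n = np.array([[1.0, 0.0], [0.5, 0.0]])
print(triplet_loss(a, p, n))   # first triplet satisfied (0), second violated (0.95): mean ≈ 0.475
```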
-
Decentralized Coordinated State Estimation in Integrated Transmission and Distribution Systems
Authors:
Ying Zhang,
Yanbo Chen,
Jianhui Wang,
Yue Meng,
Tianqiao Zhao
Abstract:
Current transmission and distribution system states are mostly unobservable to each other, and state estimation is separately conducted in the two systems owing to the differences in network structures and analytical models. The large-scale integration of transmission and active distribution systems calls for an effective solution to global state estimation. Unlike existing independent state estimation methods on both levels of these systems, we propose a decentralized coordinated transmission and distribution system state estimation (C-TDSE) method. This method enables accurate monitoring of the integrated systems with a global reference in a decentralized manner and reconciles the mismatches of voltages and powers on the boundaries of the systems. The comparative analysis on the integrated transmission and distribution systems points to improved estimation results relative to the independent state estimation methods.
Submitted 8 November, 2021;
originally announced November 2021.
-
Physics-Guided Generative Adversarial Networks for Sea Subsurface Temperature Prediction
Authors:
Yuxin Meng,
Eric Rigall,
Xueen Chen,
Feng Gao,
Junyu Dong,
Sheng Chen
Abstract:
Sea subsurface temperature, an essential component of aquatic wildlife, underwater dynamics and heat transfer with the sea surface, is affected by global warming in climate change. Existing research is commonly based on either physics-based numerical models or data-driven models. Physical modeling and machine learning are traditionally considered two unrelated fields for the sea subsurface temperature prediction task, with very different scientific paradigms (physics-driven and data-driven). However, we believe the two approaches are complementary. Physical modeling methods offer the potential for extrapolation beyond observational conditions, while data-driven methods are flexible in adapting to data and are capable of detecting unexpected patterns. Combining both approaches is very attractive and offers potential performance improvements. In this paper, we propose a novel framework based on a generative adversarial network (GAN) combined with a numerical model to predict sea subsurface temperature. First, a GAN-based model is used to learn the simplified physics between the surface temperature and the target subsurface temperature in the numerical model. Then, observation data are used to calibrate the GAN-based model parameters to obtain better predictions. We evaluate the proposed framework by predicting daily sea subsurface temperature in the South China Sea. Extensive experiments demonstrate the effectiveness of the proposed framework compared with existing state-of-the-art methods.
Submitted 4 November, 2021;
originally announced November 2021.
-
Don't speak too fast: The impact of data bias on self-supervised speech models
Authors:
Yen Meng,
Yi-Hui Chou,
Andy T. Liu,
Hung-yi Lee
Abstract:
Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR. However, how pre-training data affects S3Ms' downstream behavior remains unexplored. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these pre-trained S3Ms on selected downstream tasks in the SUPERB benchmark. Our experiments show that S3Ms are tolerant of gender bias. Moreover, we find that the content of speech has little impact on the performance of S3Ms across downstream tasks, but S3Ms do show a preference for a slower speech rate.
Submitted 26 April, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Automatic Identification of the End-Diastolic and End-Systolic Cardiac Frames from Invasive Coronary Angiography Videos
Authors:
Yinghui Meng,
Minghao Dong,
Xumin Dai,
Haipeng Tang,
Chen Zhao,
**gfeng Jiang,
Shun Xu,
Ying Zhou,
Fubao Zhu,
Zhihui Xu,
Weihua Zhou
Abstract:
Automatic identification of the image frames at end-diastole (ED) and end-systole (ES) during the review of invasive coronary angiograms (ICA) is important to assess blood flow during a cardiac cycle, reconstruct the 3D arterial anatomy from bi-planar views, and generate a complementary fusion map with myocardial images. The current identification method primarily relies on visual interpretation, making it not only time-consuming but also less reproducible. In this paper, we propose a new method to automatically identify the angiographic image frames associated with the ED and ES cardiac phases by using the trajectories of key vessel points (i.e., landmarks). More specifically, a detection algorithm is first used to detect the key points of coronary arteries, and then an optical flow method is employed to track the trajectories of the selected key points. The ED and ES frames are identified based on all these trajectories. Our method was tested with 62 ICA videos from two separate medical centers (22 and 9 patients at sites 1 and 2, respectively). Compared with the consensus interpretations of two expert readers, the proposed algorithm achieved excellent agreement: the agreement rates within a one-frame range were 92.99% and 92.73% for the automatic identification of the ED and ES image frames, respectively. In conclusion, the proposed automated method shows great potential as an integral part of automated ICA image analysis.
Submitted 6 October, 2021;
originally announced October 2021.
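The agreement rates above count a detection as correct when it falls within one frame of the reference. A minimal sketch of that metric, assuming per-case frame indices for the algorithm and for the expert consensus; the example numbers are made up.

```python
import numpy as np


def agreement_within(pred_frames, ref_frames, tolerance=1):
    """Fraction of cases whose predicted frame index is within `tolerance` frames
    of the reference (e.g. expert-consensus) index."""
    diff = np.abs(np.asarray(pred_frames) - np.asarray(ref_frames))
    return float(np.mean(diff <= tolerance))


# Made-up ED frame indices for four cardiac cycles.
print(agreement_within([10, 22, 35, 47], [10, 21, 33, 47]))   # 0.75
```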
-
Improvement of image classification by multiple optical scattering
Authors:
Xinyu Gao,
Yi Li,
Yanqing Qiu,
Bangning Mao,
Miaogen Chen,
Yanlong Meng,
Chunliu Zhao,
Juan Kang,
Yong Guo,
Changyu Shen
Abstract:
Multiple optical scattering occurs when light propagates in a non-uniform medium. During multiple scattering, images are distorted and the spatial information they carry becomes scrambled. However, the image information is not lost but is preserved in the form of speckle patterns (SPs). In this study, we built an optical random scattering system based on an LCD and an RGB laser source. We found that image classification can be improved with the help of random scattering, which acts as a feedforward neural network that extracts features from the image. Combined with ridge classification on a computer, we achieved classification accuracy above 94% for a variety of data sets covering medical, agricultural, environmental protection and other fields. In addition, the proposed optical scattering system has the advantages of high speed, low power consumption, and miniaturization, making it suitable for deployment in edge computing applications.
Submitted 12 July, 2021;
originally announced July 2021.
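The setup above can be emulated numerically as random features followed by a ridge classifier: a fixed random complex matrix stands in for the scattering medium, and the camera records speckle intensities |Wx|². Everything here (Gaussian transmission matrix, synthetic 16-pixel binary "images", λ = 1, 256 speckle pixels) is an illustrative assumption, not the authors' optical system or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_speckle, n_cls = 16, 256, 3


def scatter_features(x, W):
    """Emulated scattering: fixed random complex transmission matrix, then |.|^2 intensity detection."""
    return np.abs(x @ W.T) ** 2


def with_bias(F):
    return np.hstack([F, np.ones((F.shape[0], 1))])


def ridge_fit(F, Y, lam=1.0):
    """Closed-form ridge regression onto one-hot targets: (F^T F + lam I)^-1 F^T Y."""
    F1 = with_bias(F)
    return np.linalg.solve(F1.T @ F1 + lam * np.eye(F1.shape[1]), F1.T @ Y)


def ridge_predict(F, B):
    return (with_bias(F) @ B).argmax(axis=1)


# Synthetic "images": noisy copies of three random binary 16-pixel patterns (illustrative only).
protos = (rng.random((n_cls, n_pix)) > 0.5).astype(float)


def make(n):
    y = rng.integers(0, n_cls, n)
    return protos[y] + 0.1 * rng.normal(size=(n, n_pix)), y


W = (rng.normal(size=(n_speckle, n_pix)) + 1j * rng.normal(size=(n_speckle, n_pix))) / np.sqrt(2 * n_pix)
Xtr, ytr = make(300)
Xte, yte = make(150)
B = ridge_fit(scatter_features(Xtr, W), np.eye(n_cls)[ytr])
acc = float(np.mean(ridge_predict(scatter_features(Xte, W), B) == yte))
print(f"test accuracy: {acc:.2f}")
```

Because intensity detection makes the features quadratic in the input, a linear classifier on speckle intensities acts as a quadratic classifier on the original pixels.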
-
Deep-learning-based Hyperspectral imaging through a RGB camera
Authors:
Xinyu Gao,
Tianlang Wang,
**g Yang,
**chao Tao,
Yanqing Qiu,
Yanlong Meng,
Banging Mao,
Pengwei Zhou,
Yi Li
Abstract:
A hyperspectral image (HSI) contains both spatial and spectral information and has been widely used in food safety, remote sensing, and medical detection. However, acquiring hyperspectral images is usually costly because of the complicated apparatus needed to acquire the optical spectrum. Recently, it has been reported that an HSI can be reconstructed from a single RGB image using convolutional neural network (CNN) algorithms. Compared with traditional hyperspectral cameras, the CNN-based method is simple, portable and low cost. In this study, we focused on the influence of the RGB camera spectral sensitivity (CSS) on the HSI. A xenon lamp combined with a monochromator was used as the standard light source to calibrate the CSS. The experimental results show that the CSS plays a significant role in the reconstruction accuracy of an HSI. In addition, we proposed a new HSI reconstruction network in which the dimensional structure of the original hyperspectral datacube was modified by a 3D matrix transpose to improve the reconstruction accuracy.
Submitted 12 July, 2021;
originally announced July 2021.
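Why the camera spectral sensitivity matters can be seen from the usual forward model linking an HSI to an RGB image: each RGB channel is a CSS-weighted sum over wavelengths, so a network trained on RGB images from one CSS would see different inputs from a camera with another CSS. The sketch below assumes a cube stored as (height, width, wavelengths) and a CSS table sampled at the same wavelengths; the transpose helper shows one hypothetical reordering of the datacube axes and is not the network modification described in the paper.

```python
import numpy as np

def hsi_to_rgb(cube, css):
    """Project a hyperspectral cube (H, W, L) to an RGB image (H, W, 3).

    css has shape (L, 3): the camera's R, G, B sensitivities sampled at the
    cube's L wavelengths. Each output value is a discretized integral of
    spectrum times sensitivity over wavelength.
    """
    return np.einsum("hwl,lc->hwc", cube, css)

def to_spectral_first(cube):
    """Hypothetical 3D transpose: reorder (H, W, L) axes to (L, H, W)."""
    return np.transpose(cube, (2, 0, 1))
```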
-
Conformal Three-Dimensional Interphase of Li Metal Anode Revealed by Low Dose Cryo-Electron Microscopy
Authors:
Bing Han,
Xiangyan Li,
Shuang Bai,
Yucheng Zou,
Bingyu Lu,
Minghao Zhang,
Xiaomin Ma,
Zhi Chang,
Ying Shirley Meng,
Meng Gu
Abstract:
Using cryogenic transmission electron microscopy, we revealed three-dimensional (3D) structural details of electrochemically plated lithium (Li) flakes and their solid electrolyte interphase (SEI), including the composite SEI skin-layer and SEI fossil pieces buried inside the Li matrix. As the SEI skin-layer is largely composed of nanocrystalline LiF and Li2O in an amorphous polymeric matrix, when complete Li stripping occurs, the compromised SEI three-dimensional framework buckles, forming nanoscale bends and wrinkles. We showed that the flexibility and resilience of the SEI skin-layer play a vital role in preserving an intact SEI 3D framework after Li stripping. The intact SEI network enables the nucleation and growth of newly plated Li inside the previously formed SEI network in subsequent cycles, preventing the formation of a large amount of additional SEI between newly plated Li metal and the electrolyte. In addition, cycling cells under accurately controlled uniaxial pressure can further enhance the repeated utilization of the SEI framework and improve the coulombic efficiency (CE) by up to 97%, demonstrating an effective strategy for reducing the formation of additional SEI and inactive dead Li. The identification of such a flexible and porous 3D SEI framework clarifies the working mechanism of the SEI in lithium metal anodes for batteries. The insights provided in this work will inspire researchers to design more functional artificial 3D SEI on other metal anodes to enable rechargeable metal batteries with long cycle life.
Submitted 10 June, 2021;
originally announced June 2021.
-
SynthASR: Unlocking Synthetic Data for Speech Recognition
Authors:
Amin Fazel,
Wei Yang,
Yulan Liu,
Roberto Barra-Chicote,
Yixiong Meng,
Roland Maas,
Jasha Droppo
Abstract:
End-to-end (E2E) automatic speech recognition (ASR) models have recently demonstrated superior performance over traditional hybrid ASR models. Training an E2E ASR model requires a large amount of data, which is not only expensive but may also increase dependency on production data. At the same time, synthetic speech generated by state-of-the-art text-to-speech (TTS) engines has advanced to near-human naturalness. In this work, we propose to utilize synthetic speech for ASR training (SynthASR) in applications where data is sparse or hard to obtain for ASR model training. In addition, we apply continual learning with a novel multi-stage training strategy to address catastrophic forgetting, achieved by a mix of weighted multi-style training, data augmentation, encoder freezing, and parameter regularization. In our experiments on in-house datasets for a new application of recognizing medication names, training ASR RNN-T models with synthetic audio via the proposed multi-stage training improved the recognition performance on the new application by more than 65% relative, without degradation on existing general applications. Our observations show that SynthASR holds great promise for training state-of-the-art large-scale E2E ASR models for new applications while reducing the costs of, and dependency on, production data.
Submitted 14 June, 2021;
originally announced June 2021.
-
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Authors:
Jingbei Li,
Yi Meng,
Chenyi Li,
Zhiyong Wu,
Helen Meng,
Chao Weng,
Dan Su
Abstract:
Compared with traditional text-to-speech (TTS) systems, conversational TTS systems are required to synthesize speech with a proper speaking style conforming to the conversational context. However, state-of-the-art context modeling methods in conversational TTS only model the textual information in context with a recurrent neural network (RNN). Such methods have limited ability to model the inter-speaker influence in conversations, and they also neglect the speaking styles and the intra-speaker inertia within each speaker. Inspired by DialogueGCN and its superiority over RNN-based approaches in modeling such conversational influences, we propose a graph-based multi-modal context modeling method and adopt it in conversational TTS to enhance the speaking styles of synthesized speech. Both the textual and speaking style information in the context are extracted and processed by DialogueGCN to model the inter- and intra-speaker influence in conversations. The outputs of DialogueGCN are then summarized by an attention mechanism and converted into the enhanced speaking style for the current utterance. An English conversation corpus is collected and annotated for our research and released to the public. Experimental results on this corpus demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art context modeling method in conversational TTS in both MOS and ABX preference rate.
Submitted 31 March, 2022; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Safety-Critical Control of Stochastic Systems using Stochastic Control Barrier Functions
Authors:
Chuanzheng Wang,
Yiming Meng,
Stephen L. Smith,
Jun Liu
Abstract:
Control barrier functions have been widely used for synthesizing safety-critical controls, often via solving quadratic programs. However, the presence of Gaussian-type noise may lead to unsafe actions and result in severe consequences. In this paper, we study systems modeled by stochastic differential equations (SDEs) driven by Brownian motions. We propose a notion of stochastic control barrier functions (SCBFs) and show that SCBFs can significantly reduce the control effort, especially in the presence of noise, compared to stochastic reciprocal control barrier functions (SRCBFs), and offer a less conservative estimation of safety probability, compared to stochastic zeroing control barrier functions (SZCBFs). Based on this less conservative probabilistic estimation for the proposed notion of SCBFs, we further extend the results to handle high relative degree safety constraints using high-order SCBFs. We demonstrate that the proposed SCBFs achieve good trade-offs between performance and control effort, both through theoretical analysis and numerical simulations.
Submitted 6 April, 2021;
originally announced April 2021.
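For readers unfamiliar with barrier-function controllers, the following sketch shows the kind of quadratic program involved, for a system dx = (f(x) + g(x)u) dt + σ(x) dW with safe set h(x) ≥ 0. It enforces a generic generator-based condition (drift term plus the Itô second-order correction, required to be at least −αh(x)); this condition and all function names are assumptions for illustration, not necessarily the exact SCBF condition proposed in the paper. With a single affine constraint the QP reduces to a projection, so no solver is needed.

```python
import numpy as np

def scbf_qp(u_ref, x, f, g, sigma, h, grad_h, hess_h, alpha=1.0):
    """Minimally modify u_ref so that the (assumed) stochastic barrier condition

        grad_h(x) . (f(x) + g(x) u) + 0.5 * tr(sigma(x)^T hess_h(x) sigma(x))
            >= -alpha * h(x)

    holds. This is one affine constraint a . u >= b, so the QP
    min ||u - u_ref||^2 is solved by projecting u_ref onto the half-space.
    """
    dh = grad_h(x)
    s = sigma(x)
    a = dh @ g(x)                                # how u enters the constraint
    ito = 0.5 * np.trace(s.T @ hess_h(x) @ s)    # Ito second-order correction
    b = -alpha * h(x) - dh @ f(x) - ito
    slack = a @ u_ref - b
    if slack >= 0:
        return u_ref                             # nominal input already satisfies it
    if np.allclose(a, 0):
        return u_ref                             # u cannot affect the constraint here
    return u_ref + (-slack / (a @ a)) * a        # lands exactly on a . u = b
```

For a 1D example with dx = u dt + 0.1 dW, h(x) = 1 − x² and x = 0.9, a reference input of 1.0 (pushing toward the boundary) is reduced to 0.1, while an input of −1.0 (moving back toward the center) passes through unchanged.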
-
Learning Control Barrier Functions with High Relative Degree for Safety-Critical Control
Authors:
Chuanzheng Wang,
Yinan Li,
Yiming Meng,
Stephen L. Smith,
Jun Liu
Abstract:
Control barrier functions have shown great success in addressing control problems with safety guarantees. These methods usually find the next safe control input by solving an online quadratic programming problem. However, model uncertainty is a major challenge in synthesizing controllers, as it may lead to the generation of unsafe control actions, resulting in severe consequences. In this paper, we develop a learning framework to deal with system uncertainty. Our method mainly focuses on learning the dynamics of the control barrier function, especially when it has a high relative degree with respect to the system. We show that for each order, the time derivative of the control barrier function can be separated into the time derivative of the nominal control barrier function and a remainder. This implies that we can use a neural network to learn the remainder and thereby approximate the dynamics of the real control barrier function. We show by simulation that our method can generate safe trajectories under parametric uncertainty using a differential drive robot model.
Submitted 21 November, 2020;
originally announced November 2020.
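The split of the barrier derivative into a nominal part plus a learned remainder can be illustrated for the first-order case. The sketch below assumes a hypothetical 1D system with h(x) = 1 − x², a nominal model ẋ = u, and an unmodeled drift; it fits the remainder by regularized least squares on polynomial features, a simple stand-in for the neural network used in the paper, and it does not cover the higher-order derivatives the paper treats.

```python
import numpy as np

def features(x):
    """Hypothetical polynomial features [1, x, x^2] standing in for a network."""
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

def fit_remainder(x, hdot_measured, hdot_nominal, reg=1e-6):
    """Regularized least-squares fit of remainder = measured hdot - nominal hdot."""
    phi = features(x)
    r = hdot_measured - hdot_nominal
    return np.linalg.solve(phi.T @ phi + reg * np.eye(phi.shape[1]), phi.T @ r)

def corrected_hdot(x, u, w):
    """Nominal derivative of h = 1 - x^2 under xdot = u, plus learned remainder."""
    return -2.0 * x * u + features(x) @ w
```

If the true dynamics were ẋ = u + 0.5x, the remainder would be −x², and the fit should recover weights close to [0, 0, −1].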
-
Smooth Converse Lyapunov-Barrier Theorems for Asymptotic Stability with Safety Constraints and Reach-Avoid-Stay Specifications
Authors:
Yiming Meng,
Yinan Li,
Maxwell Fitzsimmons,
Jun Liu
Abstract:
Stability and safety are two important aspects of safety-critical control of dynamical systems. It is a well-established fact in control theory that stability properties can be characterized by Lyapunov functions. Reachability properties can also be naturally captured by Lyapunov functions for finite-time stability. Motivated by safety-critical control applications, such as in autonomous systems and robotics, there has been a recent surge of interest in characterizing safety properties using barrier functions. Lyapunov and barrier function conditions, however, are sometimes viewed as competing objectives. In this paper, we provide a unified theoretical treatment of Lyapunov and barrier functions in terms of converse theorems for stability properties with safety guarantees and reach-avoid-stay type specifications. We show that if a system (modeled as a dynamical system with measurable perturbations) possesses a stability with safety property, then there exists a smooth Lyapunov function to certify such a property. This Lyapunov function is shown to be defined on the entire set of initial conditions from which solutions satisfy this property. A similar but slightly weaker statement is made for reach-avoid-stay specifications. We show by a simple example that the latter statement cannot be strengthened without additional assumptions. We further extend the results to systems with control inputs and prove the existence of converse Lyapunov-barrier functions for reach-and-avoid specifications. While the converse Lyapunov-barrier theorems are not constructive, as with classical converse Lyapunov theorems, we believe that the unified necessary and sufficient conditions with a single Lyapunov-barrier function are of theoretical interest and can hopefully shed some light on computational approaches.
Submitted 30 December, 2021; v1 submitted 9 September, 2020;
originally announced September 2020.
-
A Lifting Wing Fixed on Multirotor UAVs for Long Flight Ranges
Authors:
Kun Xiao,
Yao Meng,
Xunhua Dai,
Haotian Zhang,
Quan Quan
Abstract:
This paper presents a lifting-wing multirotor UAV that allows long-range flight. The UAV features a lifting wing at a special mounting angle that works together with the rotors to supply lift when it flies forward, reducing energy consumption and improving flight range compared to traditional multirotor UAVs. Its dynamic model is built according to classical multirotor theory and fixed-wing theory, as the aerodynamics of its multiple propellers and that of its lifting wing are almost decoupled. Its design takes into consideration aerodynamics, airframe configuration and the mounting angle. The performance of the UAV is verified by experiments, which show that the lifting wing saves 50.14% of the power when the UAV flies at cruise speed (15 m/s).
Submitted 29 June, 2020; v1 submitted 28 June, 2020;
originally announced June 2020.
-
Time Series Classification for Locating Forced Oscillation Sources
Authors:
Yao Meng,
Zhe Yu,
Ning Lu,
Di Shi
Abstract:
Forced oscillations are caused by sustained cyclic disturbances. This paper presents a machine learning (ML) based time-series classification method that uses synchrophasor measurements to locate the sources of forced oscillations for fast disturbance removal. Sequential feature selection is used to identify the most informative measurements of each power plant so that multivariate time series (MTS) can be constructed. By training the Mahalanobis matrix, we measure and compare the distance between the MTSs. Templates representing each class are constructed to reduce the size of the training datasets and improve the online matching efficiency. The dynamic time warping (DTW) algorithm is used to align out-of-sync MTSs to account for oscillation detection errors. The algorithm is validated on two test systems: the IEEE 39-bus system and the WECC 179-bus system. When a forced oscillation occurs, MTSs are constructed from designated PMU measurements. Then, the MTSs are classified by the trained classifiers, whose class membership corresponds to the location of each oscillation source. Simulation results show that the proposed method can be used online to identify forced oscillation sources with high accuracy. The robustness of the proposed algorithm in the presence of oscillation detection errors is also quantified.
Submitted 15 April, 2020;
originally announced April 2020.
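The core matching step, comparing multivariate time series with a Mahalanobis local cost under dynamic time warping and assigning the label of the nearest class template, can be sketched as follows. Here the matrix M is passed in rather than trained, each class is represented by a single template series, and the plant labels are hypothetical; the paper's sequential feature selection and metric learning are not shown.

```python
import numpy as np

def mahalanobis_dtw(a, b, m):
    """DTW distance between multivariate series a (n, d) and b (k, d).

    The local cost between two frames is sqrt((x - y)^T M (x - y)) for a
    positive semi-definite matrix M supplied by the caller.
    """
    n, k = len(a), len(b)
    acc = np.full((n + 1, k + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            diff = a[i - 1] - b[j - 1]
            cost = np.sqrt(max(diff @ m @ diff, 0.0))  # clamp tiny negative round-off
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, k]

def classify(series, templates, m):
    """Return the label whose template is nearest under Mahalanobis-DTW."""
    return min(templates, key=lambda label: mahalanobis_dtw(series, templates[label], m))
```

Because DTW can stretch the time axis, a delayed copy of a template still matches that template closely, which is how warping absorbs oscillation detection errors.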
-
FeederGAN: Synthetic Feeder Generation via Deep Graph Adversarial Nets
Authors:
Ming Liang,
Yao Meng,
Jiyu Wang,
David Lubkeman,
Ning Lu
Abstract:
This paper presents a novel, automated, generative adversarial network (GAN) based synthetic feeder generation mechanism, abbreviated as FeederGAN. FeederGAN digests real feeder models represented by directed graphs via a deep learning framework powered by GAN and graph convolutional networks (GCN). Information about a distribution feeder circuit is extracted from its model input files so that the device connectivity is mapped onto the adjacency matrix and the device characteristics, such as circuit types (i.e., 3-phase, 2-phase, and 1-phase) and component attributes (e.g., length and current ratings), are mapped onto the attribute matrix. Then, the Wasserstein distance is used to optimize the GAN, and a GCN is used to discriminate the generated graphs from the actual ones. A greedy method based on graph theory is developed to reconstruct the feeder using the generated adjacency and attribute matrices. Our results show that the GAN-generated feeders resemble the actual feeder in both topology and attributes, as verified by visual inspection and by empirical statistics obtained from actual distribution feeders.
Submitted 16 September, 2020; v1 submitted 3 April, 2020;
originally announced April 2020.
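The preprocessing step that maps device connectivity onto an adjacency matrix and device characteristics onto an attribute matrix might look like the sketch below. The input format (a list of device dictionaries with hypothetical keys "name", "phase", "length" and "current_rating", plus upstream/downstream name pairs) and the column layout of the attribute matrix are assumptions for illustration; real feeder model files would first need a parser.

```python
import numpy as np

PHASES = ("3-phase", "2-phase", "1-phase")

def feeder_to_matrices(devices, connections):
    """Build (adjacency, attributes) for a directed feeder graph.

    devices: list of dicts with hypothetical keys "name", "phase",
             "length" and "current_rating".
    connections: list of (upstream_name, downstream_name) pairs.
    Attribute columns: one-hot circuit type, then length, then current rating.
    """
    index = {d["name"]: i for i, d in enumerate(devices)}
    n = len(devices)
    adjacency = np.zeros((n, n), dtype=np.int8)
    for src, dst in connections:
        adjacency[index[src], index[dst]] = 1          # directed edge upstream -> downstream
    attributes = np.zeros((n, len(PHASES) + 2))
    for i, d in enumerate(devices):
        attributes[i, PHASES.index(d["phase"])] = 1.0  # one-hot circuit type
        attributes[i, len(PHASES)] = d["length"]
        attributes[i, len(PHASES) + 1] = d["current_rating"]
    return adjacency, attributes
```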
-
MPC-Based Precision Cooling Strategy (PCS) for Efficient Thermal Management of Automotive Air Conditioning System
Authors:
Hao Wang,
Yan Meng,
Quansheng Zhang,
Mohammad Reza Amini,
Ilya V. Kolmanovsky,
Jing Sun,
Mark Jennings
Abstract:
In this paper, we propose an MPC-based precision cooling strategy (PCS) for energy-efficient thermal management of automotive air conditioning (A/C) systems. The proposed PCS is able to precisely track a time-varying cooling power trajectory, which is assumed to match the passenger comfort requirements. In addition, by leveraging emerging connected and automated vehicle (CAV) technology, vehicle speed preview can be incorporated into our A/C thermal management strategy for further energy efficiency improvement. The proposed A/C thermal management strategy is developed and evaluated using a physics-based A/C system model (ACSim) from Ford Motor Company for vehicles with electrified powertrains. In a comparison with the Ford benchmark case over the SC03 cycle, for tracking the same cooling power trajectory, the proposed PCS provides 4.9% energy savings at the cost of a slight increase in cabin temperature (less than 1 °C). It is also demonstrated that by coordinating with future vehicle speed and shifting the A/C power load, the A/C energy consumption can be further reduced.
Submitted 10 June, 2019;
originally announced June 2019.
-
Verifying nonlinear analog and mixed-signal circuits with inputs
Authors:
Chuchu Fan,
Yu Meng,
Jürgen Maier,
Ezio Bartocci,
Sayan Mitra,
Ulrich Schmid
Abstract:
We present a new technique for verifying nonlinear and hybrid models with inputs. We observe that once an input signal is fixed, the sensitivity analysis of the model can be computed much more precisely. Based on this result, we propose a new simulation-driven verification algorithm and apply it to a suite of nonlinear and hybrid models of CMOS digital circuits under different input signals. The models are low-dimensional but with highly nonlinear ODEs, with nearly hundreds of logarithmic and exponential terms. Some of our experiments analyze the metastability of bistable circuits with very sensitive ODEs and rigorously establish the connection between metastability recovery time and sensitivity.
Submitted 8 March, 2018;
originally announced March 2018.
-
Forced Oscillation Source Location via Multivariate Time Series Classification
Authors:
Yao Meng,
Zhe Yu,
Di Shi,
Desong Bian,
Zhiwei Wang
Abstract:
Precisely locating low-frequency oscillation sources is a prerequisite for suppressing sustained oscillation, which is an essential guarantee for the secure and stable operation of power grids. Using synchrophasor measurements, a machine learning method is proposed to locate the source of forced oscillation in power systems. The rotor angle and active power of each power plant are utilized to construct multivariate time series (MTS). Applying a Mahalanobis distance metric and dynamic time warping, the distance between MTS with different phases or lengths can be appropriately measured. The obtained distance metric, representing characteristics during the transient phase of forced oscillation under different disturbance sources, is used for offline classifier training and online matching to locate the disturbance source. Simulation results using the four-machine two-area system and the IEEE 39-bus system indicate that the proposed location method can identify the power system forced oscillation source online with high accuracy.
Submitted 8 November, 2017;
originally announced November 2017.
-
Coordination Over Multi-Agent Networks With Unmeasurable States and Finite-Level Quantization
Authors:
Yang Meng,
Tao Li,
Ji-Feng Zhang
Abstract:
In this note, the coordination of linear discrete-time multi-agent systems over digital networks is investigated with unmeasurable states in agents' dynamics. The quantized-observer based communication protocols and Certainty Equivalence principle based control protocols are proposed to characterize the inter-agent communication and the cooperative control in an integrative framework. By investigating the structural and asymptotic properties of the equations of stabilization and estimation errors nonlinearly coupled by the finite-level quantization scheme, some necessary conditions and sufficient conditions are given for the existence of such communication and control protocols to ensure the inter-agent state observation and cooperative stabilization. It is shown that these conditions come down to the simultaneous stabilizability and the detectability of the dynamics of agents and the structure of the communication network.
Submitted 29 April, 2016; v1 submitted 13 May, 2015;
originally announced May 2015.
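Finite-level quantization means each agent can transmit only one of finitely many symbols per step. A symmetric uniform quantizer with saturation, applied to the signal divided by a positive scaling factor, is a common choice in this line of work; the sketch below assumes that form (the function name and parameters are hypothetical) and does not reproduce the paper's quantized observer or its scaling-update rules.

```python
import math

def finite_level_quantize(y, levels, scale):
    """Symmetric uniform quantizer with 2 * levels + 1 output symbols.

    Returns an integer in {-levels, ..., levels}; a receiver would
    reconstruct the value as scale * symbol. Inputs beyond the range
    saturate at +/- levels.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    v = y / scale
    q = math.floor(abs(v) + 0.5)                   # round half away from zero
    return int(math.copysign(min(q, levels), v))   # saturate, then restore sign
```

Shrinking the scale over time lets a fixed number of levels track a state that is being driven toward zero, which is why such schemes pair the quantizer with a scaling sequence.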