-
SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals
Authors:
Rahul Thapa,
Bryan He,
Magnus Ruud Kjaer,
Hyatt Moore,
Gauri Ganjoo,
Emmanuel Mignot,
James Zou
Abstract:
Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.
Submitted 27 May, 2024;
originally announced May 2024.
-
A Two-sided Model for EV Market Dynamics and Policy Implications
Authors:
Haoxuan Ma,
Brian Yueshuai He,
Tomas Kaljevic,
Jiaqi Ma
Abstract:
The diffusion of Electric Vehicles (EVs) plays a pivotal role in mitigating greenhouse gas emissions, particularly in the U.S., where ambitious zero-emission and carbon neutrality objectives have been set. In pursuit of these goals, many states have implemented a range of incentive policies aimed at stimulating EV adoption and charging infrastructure development, especially public EV charging stations (EVCS). This study examines the indirect network effect observed between EV adoption and EVCS deployment within urban landscapes. We developed a two-sided log-log regression model with historical data on EV purchases and EVCS development to quantify this effect. To test the robustness, we then conducted a case study of the EV market in Los Angeles (LA) County, which suggests that a 1% increase in EVCS correlates with a 0.35% increase in EV sales. Additionally, we forecasted the future EV market dynamics in LA County, revealing a notable disparity between current policies and the targeted 80% EV market share for private cars by 2045. To bridge this gap, we proposed a combined policy recommendation that enhances EV incentives by 60% and EVCS rebates by 66%, facilitating the achievement of future EV market objectives.
Submitted 27 May, 2024;
originally announced May 2024.
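The elasticity reading in the EV market abstract above (a 1% increase in EVCS correlating with a 0.35% increase in EV sales) is how a slope in a log-log regression is interpreted. The following is a minimal sketch of that interpretation, not the authors' two-sided model: it fits only one side (sales against stations) by ordinary least squares, and the data, the constant 50, the noise level, and the function name `fit_loglog_elasticity` are all assumptions made for illustration.

```python
import math
import random

def fit_loglog_elasticity(x, y):
    """Ordinary least squares of log(y) on log(x); returns (intercept, slope)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic data only (NOT the paper's data): sales = 50 * stations^0.35 * noise.
rng = random.Random(0)
stations = [rng.uniform(100, 5000) for _ in range(500)]
sales = [50.0 * s ** 0.35 * math.exp(rng.gauss(0, 0.05)) for s in stations]

intercept, elasticity = fit_loglog_elasticity(stations, sales)

# Under a log-log model, a 1% increase in stations scales sales by 1.01**elasticity.
pct_change_in_sales = (1.01 ** elasticity - 1.0) * 100
print(f"elasticity ~ {elasticity:.3f}, 1% more stations -> {pct_change_in_sales:.3f}% more sales")
```

Because both variables are logged, the fitted slope is read directly as a percentage-to-percentage effect, which is why the abstract can state the result without reference to units.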
-
Learning the irreversible progression trajectory of Alzheimer's disease
Authors:
Yipei Wang,
Bing He,
Shannon Risacher,
Andrew Saykin,
Jingwen Yan,
Xiaoqian Wang
Abstract:
Alzheimer's disease (AD) is a progressive and irreversible brain disorder that unfolds over the course of 30 years. Therefore, it is critical to capture the disease progression at an early stage so that intervention can be applied before the onset of symptoms. Machine learning (ML) models have been shown to be effective in predicting the onset of AD. Yet for subjects with follow-up visits, existing techniques for AD classification only aim for accurate group assignment, and the monotonically increasing risk across follow-up visits is usually ignored. The resulting fluctuating risk scores across visits violate the irreversibility of AD, hampering the trustworthiness of models and providing little value for understanding disease progression. To address this issue, we propose a novel regularization approach to predict AD longitudinally. Our technique aims to maintain the expected monotonicity of increasing disease risk during progression while preserving expressiveness. Specifically, we introduce a monotonicity constraint that encourages the model to predict disease risk in a consistent and ordered manner across follow-up visits. We evaluate our method using the longitudinal structural MRI and amyloid-PET imaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our model outperforms existing techniques in capturing the progressiveness of disease risk, and at the same time preserves prediction accuracy.
Submitted 9 March, 2024;
originally announced March 2024.
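One common way to encourage the non-decreasing risk across follow-up visits described in the Alzheimer's abstract above is a hinge penalty on every drop between consecutive visits, added to the usual prediction loss. The sketch below illustrates that general idea only; the abstract does not give the form of the authors' regularizer, and the names `monotonicity_penalty`, `regularized_loss`, and `weight` are hypothetical.

```python
def monotonicity_penalty(risk_scores):
    """Sum of hinge penalties for each decrease in risk between consecutive visits.

    Returns 0.0 when the sequence is non-decreasing.
    """
    return sum(max(0.0, prev - curr)
               for prev, curr in zip(risk_scores, risk_scores[1:]))

def regularized_loss(prediction_loss, risk_scores, weight=1.0):
    """Prediction loss plus a weighted monotonicity penalty (illustrative form)."""
    return prediction_loss + weight * monotonicity_penalty(risk_scores)

# Hypothetical per-visit risk scores for two subjects.
ordered_visits = [0.10, 0.30, 0.60]           # never decreases: no penalty
fluctuating_visits = [0.20, 0.50, 0.40, 0.70]  # drops by 0.10 once

print(monotonicity_penalty(ordered_visits))
print(monotonicity_penalty(fluctuating_visits))
```

The penalty leaves any non-decreasing trajectory untouched, so it constrains the ordering of the scores without forcing a particular shape on them.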
-
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Authors:
Tianrui Lou,
Xiaojun Jia,
Jindong Gu,
Li Liu,
Siyuan Liang,
Bangyan He,
Xiaochun Cao
Abstract:
Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. Another promising strategy, shape-based attack, can effectively eliminate outliers, but existing methods often suffer significant reductions in imperceptibility due to irrational deformations. We find that concealing deformation perturbations in areas insensitive to human eyes can achieve a better trade-off between imperceptibility and adversarial strength, specifically in parts of the object surface that are complex and exhibit drastic curvature changes. Therefore, we propose a novel shape-based adversarial attack method, HiT-ADV, which initially conducts a two-stage search for attack regions based on saliency and imperceptibility scores, and then adds deformation perturbations in each attack region using Gaussian kernel functions. Additionally, HiT-ADV is extendable to physical attacks. We propose that by employing benign resampling and benign rigid transformations, we can further enhance physical adversarial strength with little sacrifice to imperceptibility. Extensive experiments have validated the superiority of our method in terms of adversarial and imperceptible properties in both digital and physical spaces. Our code is available at: https://github.com/TRLou/HiT-ADV.
Submitted 8 March, 2024;
originally announced March 2024.
-
Safety Control of Uncertain MIMO Systems Using Dynamic Output Feedback Barrier Pairs
Authors:
Binghan He,
Takashi Tanaka
Abstract:
Safety control of dynamical systems using barrier functions relies on knowing the full state information. This paper introduces a novel approach for safety control in uncertain MIMO systems with partial state information. The proposed method combines the synthesis of a vector norm barrier function and a dynamic output feedback safety controller to ensure robust safety enforcement. The safety controller guarantees the invariance of the barrier function under uncertain dynamics and disturbances. To address the challenges associated with safety verification using partial state information, a barrier function estimator is developed. This estimator employs an identifier-based state estimator to obtain a state estimate that is affine in the uncertain model parameters of the system. By incorporating a priori knowledge of the limits of the uncertain model parameters and disturbances, the state estimate provides a robust upper bound for the barrier function. Comparative analysis with existing control barrier function based methods shows the advantage of the proposed approach in enforcing safety constraints under tight input constraints and the utilization of estimated state information.
Submitted 15 March, 2024; v1 submitted 1 August, 2023;
originally announced August 2023.
-
Communication and Control in Collaborative UAVs: Recent Advances and Future Trends
Authors:
Shumaila Javaid,
Nasir Saeed,
Zakria Qadir,
Hamza Fahim,
Bin He,
Houbing Song,
Muhammad Bilal
Abstract:
The recent progress in unmanned aerial vehicle (UAV) technology has significantly advanced UAV-based applications for military, civil, and commercial domains. Nevertheless, the challenges of establishing high-speed communication links and flexible control strategies, and of developing efficient collaborative decision-making algorithms for a swarm of UAVs, limit their autonomy, robustness, and reliability. Thus, a growing focus has been witnessed on collaborative communication to allow a swarm of UAVs to coordinate and communicate autonomously for the cooperative completion of tasks in a short time with improved efficiency and reliability. This work presents a comprehensive review of collaborative communication in a multi-UAV system. We thoroughly discuss the characteristics of intelligent UAVs and their communication and control requirements for autonomous collaboration and coordination. Moreover, we review various UAV collaboration tasks, summarize the applications of UAV swarm networks for dense urban environments, and present use case scenarios to highlight the current developments of UAV-based applications in various domains. Finally, we identify several exciting future research directions that need attention for advancing research in collaborative UAVs.
Submitted 23 February, 2023;
originally announced February 2023.
-
ColoristaNet for Photorealistic Video Style Transfer
Authors:
Xiaowen Qiu,
Ruize Xu,
Boan He,
Yingtao Zhang,
Wenqiang Zhang,
Weifeng Ge
Abstract:
Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while keeping photorealism. In this paper, we argue that it is the summary-statistics matching scheme in existing algorithms that leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework, which contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization to decompose feature transformation into style whitening and restylization. It works quite well in ColoristaNet and can transfer image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet can achieve better stylization effects when compared with state-of-the-art algorithms.
Submitted 21 December, 2022; v1 submitted 18 December, 2022;
originally announced December 2022.
-
Hybrid stability augmentation control of multi-rotor UAV in confined space based on adaptive backstepping control
Authors:
QuanXi Zhan,
JunRui Zhang,
ChenYang Sun,
RunJie Shen,
Bin He
Abstract:
This paper applies the UAV to the inspection of water diversion pipelines in hydropower stations. The diversion pipeline is an enclosed space, so the airflow disturbance caused by the rotation of the UAV blades and the strong air convection from the chimney effect have a great impact on the flight control of the UAV. Although the traditional linear PID flight control algorithm has been widely used and can meet the requirements of general flight tasks, it cannot guarantee the stability of the system over a wide range. The inspection of a diversion pipeline in an enclosed space requires high system stability and robustness of the UAV controller. In this paper, a hybrid stabilised adaptive backstepping control method is proposed. Firstly, a multi-rotor UAV model is analysed and transformed into a strict feedback form with external disturbances; then adaptive techniques are used to estimate the airflow disturbances caused by the blades, and the attitude and position tracking controllers are designed by combining backstepping control and PID control respectively; finally, the asymptotic stability of the system is ensured by constructing a Lyapunov function. The experimental data show that the flight controller designed in this paper has good robustness and tracking performance, and can better resist airflow disturbances in confined space.
Submitted 15 December, 2022;
originally announced December 2022.
-
Barrier Pairs for Safety Control of Uncertain Output Feedback Systems
Authors:
Binghan He,
Takashi Tanaka
Abstract:
The barrier function method for safety control typically assumes the availability of full state information. Unfortunately, in many scenarios involving uncertain dynamical systems, full state information is often unavailable. In this paper, we aim to solve the safety control problem for an uncertain single-input single-output system with partial state information. First, we develop a synthesis method that simultaneously creates a barrier function and a dynamic output feedback safety controller. This safety controller guarantees that the unit sub-level set of the barrier function is an invariant set under the uncertain dynamics and disturbances of the system. Then, we build an identifier-based estimator that provides a state estimate affine to the uncertain model parameters of the system. To detect the potential risks of the system, a fault detector uses the state estimate to find an upper bound for the barrier function. The fault detector triggers the safety controller when the system's original action leads to a potential safety issue and resumes the original action when the potential safety issue is resolved by the safety controller.
Submitted 1 August, 2023; v1 submitted 10 September, 2022;
originally announced September 2022.
-
Rethinking: Deep-learning-based Demodulation and Decoding
Authors:
Boxiang He,
Zitao Wu,
Fanggang Wang
Abstract:
In this paper, we focus on the demodulation/decoding of the complex modulations/codes that approach the Shannon capacity. Theoretically, the maximum likelihood (ML) algorithm can achieve the optimal error performance, but it has $\mathcal{O}(2^k)$ demodulation/decoding complexity, with $k$ denoting the number of information bits. Recent progress in deep learning provides a new direction to tackle the demodulation and the decoding. The purpose of this paper is to analyze the feasibility of the neural network to demodulate/decode the complex modulations/codes close to the Shannon capacity and characterize the error performance and the complexity of the neural network. Regarding the neural network demodulator, we use the golden angle modulation (GAM), a promising modulation format that can offer Shannon-capacity-approaching performance, to evaluate the demodulator. It is observed that the neural network demodulator achieves performance close to the ML-based method while having a lower complexity order in the low-order GAM. Regarding the neural network decoder, we use the Gaussian codebook, which achieves the Shannon capacity, to evaluate the decoder. We also observe that the neural network decoder achieves error performance close to the ML decoder with a much lower complexity order in the small Gaussian codebook. Limited by the current training resources, we cannot evaluate the performance of the high-order modulation and the long codeword. But, based on the results of the low-order GAM and the small Gaussian codebook, we boldly give our conjecture: the neural network demodulator/decoder is a strong candidate approach for demodulating/decoding the complex modulations/codes close to the Shannon capacity, owing to its near-ML error performance and lower complexity.
Submitted 13 June, 2022;
originally announced June 2022.
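The $\mathcal{O}(2^k)$ cost of ML decoding mentioned in the abstract above comes from comparing the received vector against every one of the $2^k$ codewords. The following minimal sketch makes that exhaustive search concrete for a small random Gaussian codebook. It assumes equiprobable codewords and additive white Gaussian noise, for which the ML decision is the nearest codeword in Euclidean distance; the values of `k`, `n`, the noise level, and all function names are chosen here for illustration and are not taken from the paper.

```python
import random

def make_gaussian_codebook(k, n, rng):
    """Draw 2**k codewords of length n with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2 ** k)]

def ml_decode(received, codebook):
    """Exhaustive search: index of the codeword closest to `received`.

    For equiprobable codewords in additive white Gaussian noise, the
    maximum-likelihood decision is the minimum Euclidean distance codeword.
    The loop visits all 2**k codewords, hence the exponential cost in k.
    """
    def sq_dist(codeword):
        return sum((r - c) ** 2 for r, c in zip(received, codeword))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

rng = random.Random(0)
k, n = 4, 8                      # illustrative sizes: 16 codewords of length 8
codebook = make_gaussian_codebook(k, n, rng)

sent_index = 5
noisy = [c + rng.gauss(0.0, 0.01) for c in codebook[sent_index]]
decoded_index = ml_decode(noisy, codebook)
print(len(codebook), decoded_index)
```

Each additional information bit doubles the number of distance computations, which is the scaling that motivates looking for lower-complexity decoders.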
-
Electrocardiographic Deep Learning for Predicting Post-Procedural Mortality
Authors:
David Ouyang,
John Theurer,
Nathan R. Stein,
J. Weston Hughes,
Pierre Elias,
Bryan He,
Neal Yuan,
Grant Duffy,
Roopinder K. Sandhu,
Joseph Ebinger,
Patrick Botting,
Melvin Jujjavarapu,
Brian Claggett,
James E. Tooley,
Tim Poterucha,
Jonathan H. Chen,
Michael Nurok,
Marco Perez,
Adler Perotte,
James Y. Zou,
Nancy R. Cook,
Sumeet S. Chugh,
Susan Cheng,
Christine M. Albert
Abstract:
Background. Pre-operative risk assessments used in clinical practice are limited in their ability to identify risk for post-operative mortality. We hypothesize that electrocardiograms contain hidden risk markers that can help prognosticate post-operative mortality. Methods. In a derivation cohort of 45,969 pre-operative patients (age 59 ± 19 years, 55% women), a deep learning algorithm was developed to leverage waveform signals from pre-operative ECGs to discriminate post-operative mortality. Model performance was assessed in a holdout internal test dataset and in two external hospital cohorts and compared with the Revised Cardiac Risk Index (RCRI) score. Results. In the derivation cohort, there were 1,452 deaths. The algorithm discriminates mortality with an AUC of 0.83 (95% CI 0.79-0.87), surpassing the discrimination of the RCRI score with an AUC of 0.67 (CI 0.61-0.72) in the held-out test cohort. Patients determined to be high risk by the deep learning model's risk prediction had an unadjusted odds ratio (OR) of 8.83 (5.57-13.20) for post-operative mortality as compared to an unadjusted OR of 2.08 (CI 0.77-3.50) for post-operative mortality for RCRI greater than 2. The deep learning algorithm performed similarly for patients undergoing cardiac surgery with an AUC of 0.85 (CI 0.77-0.92), non-cardiac surgery with an AUC of 0.83 (0.79-0.88), and catheterization or endoscopy suite procedures with an AUC of 0.76 (0.72-0.81). The algorithm similarly discriminated risk for mortality in two separate external validation cohorts from independent healthcare systems with AUCs of 0.79 (0.75-0.83) and 0.75 (0.74-0.76) respectively. Conclusion. The findings demonstrate how a novel deep learning algorithm, applied to pre-operative ECGs, can improve discrimination of post-operative mortality.
Submitted 30 April, 2022;
originally announced May 2022.
-
UncertaINR: Uncertainty Quantification of End-to-End Implicit Neural Representations for Computed Tomography
Authors:
Francisca Vasconcelos,
Bobby He,
Nalini Singh,
Yee Whye Teh
Abstract:
Implicit neural representations (INRs) have achieved impressive results for scene reconstruction and computer graphics, where their performance has primarily been assessed on reconstruction accuracy. As INRs make their way into other domains, where model predictions inform high-stakes decision-making, uncertainty quantification of INR inference is becoming critical. To that end, we study a Bayesian reformulation of INRs, UncertaINR, in the context of computed tomography, and evaluate several Bayesian deep learning implementations in terms of accuracy and calibration. We find that they achieve well-calibrated uncertainty, while retaining accuracy competitive with other classical, INR-based, and CNN-based reconstruction techniques. Contrary to common intuition in the Bayesian deep learning literature, we find that INRs obtain the best calibration with computationally efficient Monte Carlo dropout, outperforming Hamiltonian Monte Carlo and deep ensembles. Moreover, in contrast to the best-performing prior approaches, UncertaINR does not require a large training dataset, but only a handful of validation images.
Submitted 2 May, 2023; v1 submitted 22 February, 2022;
originally announced February 2022.
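Monte Carlo dropout, which the UncertaINR abstract above reports as giving the best calibration, estimates uncertainty by keeping dropout active at inference time and summarizing several stochastic forward passes. The sketch below shows only that general mechanism on a toy one-hidden-layer network with fixed, made-up weights; it is not an implicit neural representation, not a CT reconstruction, and not the authors' implementation, and every name and number in it is an assumption.

```python
import math
import random

def mc_dropout_predict(x, w1, w2, p_drop, n_samples, rng):
    """Mean and standard deviation of a toy network's output under dropout.

    The network is y = sum_j w2[j] * dropout(relu(w1[j] * x)). Dropout stays
    active for every pass (inverted dropout: kept units are rescaled).
    """
    preds = []
    for _ in range(n_samples):
        hidden = [max(0.0, w * x) for w in w1]
        hidden = [0.0 if rng.random() < p_drop else h / (1.0 - p_drop)
                  for h in hidden]
        preds.append(sum(w * h for w, h in zip(w2, hidden)))
    mean = sum(preds) / n_samples
    variance = sum((p - mean) ** 2 for p in preds) / n_samples
    return mean, math.sqrt(variance)

# Hypothetical fixed weights; a real model would learn these.
w1 = [0.5, -1.0, 2.0, 1.5]
w2 = [1.0, 0.3, -0.7, 0.2]

rng = random.Random(0)
mean_mc, std_mc = mc_dropout_predict(1.0, w1, w2, p_drop=0.5, n_samples=200, rng=rng)
mean_det, std_det = mc_dropout_predict(1.0, w1, w2, p_drop=0.0, n_samples=200, rng=rng)
print(mean_mc, std_mc, std_det)
```

The spread across passes serves as the uncertainty estimate; with dropout disabled every pass is identical and the spread collapses to zero.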
-
NeRV: Neural Representations for Videos
Authors:
Hao Chen,
Bo He,
Hanyu Wang,
Yixuan Ren,
Ser-Nam Lim,
Abhinav Shrivastava
Abstract:
We propose a novel neural representation for videos (NeRV) which encodes videos in neural networks. Unlike conventional representations that treat videos as frame sequences, we represent videos as neural networks taking frame index as input. Given a frame index, NeRV outputs the corresponding RGB image. Video encoding in NeRV is simply fitting a neural network to video frames, and the decoding process is a simple feedforward operation. As an image-wise implicit representation, NeRV outputs the whole image and shows great efficiency compared to pixel-wise implicit representation, improving the encoding speed by 25x to 70x and the decoding speed by 38x to 132x, while achieving better video quality. With such a representation, we can treat videos as neural networks, simplifying several video-related tasks. For example, conventional video compression methods are restricted by a long and complex pipeline, specifically designed for the task. In contrast, with NeRV, we can use any neural network compression method as a proxy for video compression, and achieve comparable performance to traditional frame-based video compression approaches (H.264, HEVC, etc.). Besides compression, we demonstrate the generalization of NeRV for video denoising. The source code and pre-trained model can be found at https://github.com/haochen-rye/NeRV.git.
Submitted 26 October, 2021;
originally announced October 2021.
-
The Performance Evaluation of Attention-Based Neural ASR under Mixed Speech Input
Authors:
Bradley He,
Martin Radfar
Abstract:
In order to evaluate the performance of the attention-based neural ASR under noisy conditions, the current trend is to present hours of various noisy speech data to the model and measure the overall word/phoneme error rate (W/PER). In general, it is unclear how these models perform when exposed to a cocktail party setup in which two or more speakers are active. In this paper, we present mixtures of speech signals to a popular attention-based neural ASR, known as Listen, Attend, and Spell (LAS), at different target-to-interference ratios (TIR) and measure the phoneme error rate. In particular, we investigate in detail which phoneme is predicted when two phonemes are mixed; in this fashion we build a model in which the most probable predictions for a phoneme are given. We found a 65% relative increase in PER when LAS was presented with mixed speech signals at TIR = 0 dB, and the performance approaches the unmixed scenario at TIR = 30 dB. Our results show that the model, when presented with mixed phoneme signals, tends to predict those that have higher accuracies during evaluation of original phoneme signals.
Submitted 2 August, 2021;
originally announced August 2021.
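Mixing two signals at a chosen target-to-interference ratio, as in the ASR abstract above, amounts to rescaling the interfering signal before adding it to the target. The sketch below assumes TIR is defined as the ratio of mean target power to mean interferer power in decibels; the abstract does not spell out the definition, and the random stand-in signals and function names here are hypothetical.

```python
import math
import random

def mean_power(signal):
    """Average squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)

def mix_at_tir(target, interferer, tir_db):
    """Scale `interferer` so that 10*log10(P_target / P_interferer) == tir_db,
    then add it to `target`. Returns (mixture, scaled_interferer)."""
    gain = math.sqrt(mean_power(target)
                     / (mean_power(interferer) * 10 ** (tir_db / 10)))
    scaled = [gain * v for v in interferer]
    mixture = [t + s for t, s in zip(target, scaled)]
    return mixture, scaled

def relative_increase(baseline, value):
    """Relative change, e.g. of an error rate, with respect to a baseline."""
    return (value - baseline) / baseline

# Random noise standing in for two speech signals (illustration only).
rng = random.Random(0)
target = [rng.gauss(0, 1) for _ in range(1000)]
interferer = [rng.gauss(0, 3) for _ in range(1000)]

mixture, scaled = mix_at_tir(target, interferer, tir_db=0.0)
measured_tir_db = 10 * math.log10(mean_power(target) / mean_power(scaled))
print(f"measured TIR: {measured_tir_db:.6f} dB")
```

At 0 dB the two signals carry equal power, while at 30 dB the interferer has one thousandth of the target's power, which is consistent with performance approaching the unmixed case.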
-
High-Throughput Precision Phenotyping of Left Ventricular Hypertrophy with Cardiovascular Deep Learning
Authors:
Grant Duffy,
Paul P Cheng,
Neal Yuan,
Bryan He,
Alan C. Kwan,
Matthew J. Shun-Shin,
Kevin M. Alexander,
Joseph Ebinger,
Matthew P. Lungren,
Florian Rader,
David H. Liang,
Ingela Schnittger,
Euan A. Ashley,
James Y. Zou,
Jignesh Patel,
Ronald Witteles,
Susan Cheng,
David Ouyang
Abstract:
Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular diseases including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating etiologies of LVH. To overcome this challenge, we present EchoNet-LVH - a deep learning workflow that automatically quantifies ventricular hypertrophy with precision equal to human experts and predicts etiology of LVH. Trained on 28,201 echocardiogram videos, our model accurately measures intraventricular wall thickness (mean absolute error [MAE] 1.4mm, 95% CI 1.2-1.5mm), left ventricular diameter (MAE 2.4mm, 95% CI 2.2-2.6mm), and posterior wall thickness (MAE 1.2mm, 95% CI 1.1-1.3mm) and classifies cardiac amyloidosis (area under the curve [AUC] 0.83) and hypertrophic cardiomyopathy (AUC 0.98) from other etiologies of LVH. In external datasets from independent domestic and international healthcare systems, EchoNet-LVH accurately quantified ventricular parameters (R² of 0.96 and 0.90 respectively) and detected cardiac amyloidosis (AUC 0.79) and hypertrophic cardiomyopathy (AUC 0.89) on the domestic external validation site. Leveraging measurements across multiple heart beats, our model can more accurately identify subtle changes in LV geometry and its causal etiologies. Compared to human experts, EchoNet-LVH is fully automated, allowing for reproducible, precise measurements, and lays the foundation for precision diagnosis of cardiac hypertrophy. As a resource to promote further innovation, we also make publicly available a large dataset of 23,212 annotated echocardiogram videos.
Submitted 23 June, 2021;
originally announced June 2021.
-
The Medical Segmentation Decathlon
Authors:
Michela Antonelli,
Annika Reinke,
Spyridon Bakas,
Keyvan Farahani,
Annette Kopp-Schneider,
Bennett A. Landman,
Geert Litjens,
Bjoern Menze,
Olaf Ronneberger,
Ronald M. Summers,
Bram van Ginneken,
Michel Bilello,
Patrick Bilic,
Patrick F. Christ,
Richard K. G. Do,
Marc J. Gollub,
Stephan H. Heckers,
Henkjan Huisman,
William R. Jarnagin,
Maureen K. McHugo,
Sandy Napel,
Jennifer S. Goli Pernicka,
Kawal Rhode,
Catalina Tobon-Gomez,
Eugene Vorontsov
, et al. (34 additional authors not shown)
Abstract:
International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical pro…
▽ More
International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate the hypothesis, we organized the Medical Segmentation Decathlon (MSD) - a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities. The underlying data set was designed to explore the axis of difficulties typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data and small objects. The MSD challenge confirmed that algorithms with a consistent good performance on a set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued generalizing well to a wide range of other clinical problems, further confirming our hypothesis. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to non AI experts.
Submitted 10 June, 2021;
originally announced June 2021.
-
TSTNN: Two-stage Transformer based Neural Network for Speech Enhancement in the Time Domain
Authors:
Kai Wang,
Bengbeng He,
Wei-** Zhu
Abstract:
In this paper, we propose a transformer-based architecture, called the two-stage transformer neural network (TSTNN), for end-to-end speech denoising in the time domain. The proposed model is composed of an encoder, a two-stage transformer module (TSTM), a masking module and a decoder. The encoder maps the input noisy speech into a feature representation. The TSTM uses four stacked two-stage transformer blocks to efficiently extract local and global information from the encoder output stage by stage. The masking module creates a mask that is multiplied with the encoder output. Finally, the decoder uses the masked encoder features to reconstruct the enhanced speech. Experimental results on the benchmark dataset show that the TSTNN outperforms most state-of-the-art models in the time or frequency domain while having significantly lower model complexity.
Submitted 17 March, 2021;
originally announced March 2021.
-
Coarse-to-fine Airway Segmentation Using Multi information Fusion Network and CNN-based Region Growing
Authors:
**quan Guo,
Rongda Fu,
Lin Pan,
Shaohua Zheng,
Liqin Huang,
Bin Zheng,
Bingwei He
Abstract:
Automatic airway segmentation from chest computed tomography (CT) scans plays an important role in pulmonary disease diagnosis and computer-assisted therapy. However, low contrast at peripheral branches and complex tree-like structures remain the two main challenges for airway segmentation. Recent research has shown that deep learning methods perform well in segmentation tasks. Motivated by these works, a coarse-to-fine segmentation framework is proposed to obtain a complete airway tree. Our framework segments the overall airway and small branches via the multi-information fusion convolutional neural network (Mif-CNN) and the CNN-based region growing, respectively. In Mif-CNN, atrous spatial pyramid pooling (ASPP) is integrated into a u-shaped network, where it can expand the receptive field and capture multi-scale information. Meanwhile, boundary and location information are incorporated into semantic information. This information is fused to help Mif-CNN utilize additional context knowledge and useful features. To improve the segmentation result, the CNN-based region growing method is designed to focus on obtaining small branches. A voxel classification network (VCN), which can entirely capture the rich information around each voxel, is applied to classify the voxels into airway and non-airway. In addition, a shape reconstruction method is used to refine the airway tree.
Submitted 25 February, 2021;
originally announced February 2021.
-
TransMask: A Compact and Fast Speech Separation Model Based on Transformer
Authors:
Zining Zhang,
Bingsheng He,
Zhenjie Zhang
Abstract:
Speech separation is an important problem in speech processing, which aims to separate and generate clean speech from mixed audio containing speech from different speakers. Empowered by deep learning technologies in the sequence-to-sequence domain, recent neural speech separation models are now capable of generating highly clean speech audio. To make these models more practical by reducing the model size and inference time while maintaining high separation quality, we propose a new transformer-based speech separation approach, called TransMask. By fully unleashing the power of self-attention on long-term dependency exception, we demonstrate that TransMask is more than 60% smaller and its inference is more than 2 times faster than state-of-the-art solutions. TransMask fully utilizes parallelism during inference, and achieves nearly linear inference time within reasonable input audio lengths. It also outperforms existing solutions on output speech audio quality, achieving an SDR above 16 on the Librimix benchmark.
Submitted 19 February, 2021;
originally announced February 2021.
-
GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus
Authors:
Zining Zhang,
Bingsheng He,
Zhenjie Zhang
Abstract:
Non-parallel many-to-many voice conversion has recently been attracting huge research efforts in the speech processing community. A voice conversion system transforms an utterance of a source speaker into another utterance of a target speaker by keeping the content of the original utterance and replacing its vocal features with those of the target speaker. Existing solutions, e.g., StarGAN-VC2, present promising results only when a speech corpus of the engaged speakers is available during model training. AUTOVC is able to perform voice conversion on unseen speakers, but it needs an external pretrained speaker verification model. In this paper, we present our new GAN-based zero-shot voice conversion solution, called GAZEV, which aims to support unseen speakers on both source and target utterances. Our key technical contribution is the adoption of a speaker embedding loss on top of the GAN framework, as well as an adaptive instance normalization strategy, in order to address the limitations of speaker identity transfer in existing solutions. Our empirical evaluations demonstrate significant performance improvement on output speech quality and comparable speaker similarity to AUTOVC.
Submitted 24 October, 2020;
originally announced October 2020.
-
X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network
Authors:
Zining Zhang,
Bingsheng He,
Zhenjie Zhang
Abstract:
Extracting the speech of a target speaker from mixed audio, based on a reference speech from the target speaker, is a challenging yet powerful technology in speech processing. Recent studies of speaker-independent speech separation, such as TasNet, have shown promising results by applying deep neural networks over the time-domain waveform. Such a separation network does not directly generate reliable and accurate output when target speakers are specified, because of the necessary prior on the number of speakers and the lack of robustness when dealing with audio with absent speakers. In this paper, we break these limitations by introducing a new speaker-aware speech masking method, called X-TaSNet. Our proposal adopts new strategies, including a distortion-based loss and a corresponding alternating training scheme, to better address the robustness issue. X-TaSNet significantly enhances the extracted speech quality, doubling the SDRi and SI-SNRi of the output speech audio over the state-of-the-art voice filtering approach. X-TaSNet also improves the reliability of the results by improving the accuracy of speaker identity in the output audio to 95.4%, such that it returns silent audio in most cases when the target speaker is absent. These results demonstrate that X-TaSNet moves one solid step towards more practical applications of speaker extraction technology.
Submitted 23 October, 2020;
originally announced October 2020.
-
A Complex Stiffness Human Impedance Model with Customizable Exoskeleton Control
Authors:
Binghan He,
Huang Huang,
Gray C. Thomas,
Luis Sentis
Abstract:
The natural impedance, or dynamic relationship between force and motion, of a human operator can determine the stability of exoskeletons that use interaction-torque feedback to amplify human strength. While human impedance is typically modelled as a linear system, our experiments on a single-joint exoskeleton testbed involving 10 human subjects show evidence of nonlinear behavior: a low-frequency asymptotic phase for the dynamic stiffness of the human that is different than the expected zero, and an unexpectedly consistent damping ratio as the stiffness and inertia vary. To explain these observations, this paper considers a new frequency-domain model of the human joint dynamics featuring complex value stiffness comprising a real stiffness term and a hysteretic damping term. Using a statistical F-test we show that the hysteretic damping term is not only significant but is even more significant than the linear damping term. Further analysis reveals a linear trend linking hysteretic damping and the real part of the stiffness, which allows us to simplify the complex stiffness model down to a 1-parameter system. Then, we introduce and demonstrate a customizable fractional-order controller that exploits this hysteretic damping behavior to improve strength amplification bandwidth while maintaining stability, and explore a tuning approach which ensures that this stability property is robust to muscle co-contraction for each individual.
Submitted 25 September, 2020;
originally announced September 2020.
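The low-frequency phase observation in the abstract above can be illustrated with a short worked equation. This is only a sketch under assumed notation: the symbols M_h (inertia), B_h (linear damping), K_h (real stiffness) and C_h (hysteretic damping) are hypothetical names chosen here, not taken from the listing, and the form below is one common way to write a complex-stiffness model rather than the paper's exact formulation.

```latex
% Assumed notation (hypothetical symbols); valid for K_h > 0.
% Dynamic stiffness evaluated on the imaginary axis s = j\omega:
S_h(j\omega) = K_h - M_h\,\omega^{2} + j\,\bigl(B_h\,\omega + C_h\bigr)

% Low-frequency asymptotic phase:
\lim_{\omega \to 0} \angle S_h(j\omega) = \arctan\!\left(\frac{C_h}{K_h}\right)
```

With C_h = 0 the limit is zero, which is the value a purely linear model would predict; a positive C_h yields a nonzero low-frequency phase, consistent with the behavior the abstract reports.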
-
Adaptive Compliance Shaping with Human Impedance Estimation
Authors:
Huang Huang,
Henry F. Cappel,
Gray C. Thomas,
Binghan He,
Luis Sentis
Abstract:
Human impedance parameters play an integral role in the dynamics of strength amplification exoskeletons. Many methods are used to estimate the stiffness of human muscles, but few are used to improve the performance of strength amplification controllers for these devices. We propose a compliance shaping amplification controller incorporating an accurate online human stiffness estimation from surface electromyography (sEMG) sensors and stretch sensors connected to the forearm and upper arm of the human. These sensor values along with exoskeleton position and velocity are used to train a random forest regression model that accurately predicts a person's stiffness despite varying movement, relaxation, and muscle co-contraction. Our model's accuracy is verified using experimental test data and the model is implemented into the compliance shaping controller. Ultimately, we show that the online estimation of stiffness can improve the bandwidth and amplification of the controller while remaining robustly stable.
Submitted 2 August, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
-
Robust Estimator-Based Safety Verification: A Vector Norm Approach
Authors:
Binghan He,
Gray C. Thomas,
Luis Sentis
Abstract:
In this paper, we consider the problem of verifying safety constraint satisfaction for single-input single-output systems with uncertain transfer function coefficients. We propose a new type of barrier function based on a vector norm. This type of barrier function has a measurable upper bound without full state availability. An identifier-based estimator allows an exact bound for the uncertainty-based component of the barrier function estimate. Assuming that the system is safe initially allows an exponentially decreasing bound on the error due to the estimator transient. Barrier function and estimator synthesis is proposed as two convex sub-problems, exploiting linear matrix inequalities. The barrier function controller combination is then used to construct a safety backup controller. We demonstrate the system in a simulation of a one-degree-of-freedom human-exoskeleton interaction.
Submitted 1 August, 2020; v1 submitted 5 October, 2019;
originally announced October 2019.
-
Collaborative Computation Offloading in Wireless Powered Mobile-Edge Computing Systems
Authors:
Binqi He,
Suzhi Bi,
Hong Xing,
Xiaohui Lin
Abstract:
This paper studies a novel user cooperation model in a wireless powered mobile edge computing system where two wireless users harvest wireless power transferred by one energy node and can offload part of their computation tasks to an edge server (ES) for remote execution. In particular, we consider that the direct communication link between one user and the ES is blocked, such that the other user acts as a relay to forward its offloading data to the server. Meanwhile, instead of forwarding all the received task data, we also allow the helping user to compute part of the received task locally to reduce the potentially high energy and time cost of task offloading to the ES. Our aim is to maximize the amount of data of the two users that can be processed within a given time frame by jointly optimizing the amount of task data computed at each device (users and ES), the system time allocation, and the transmit power and CPU frequency of the users. We propose an efficient method to find the optimal solution and show that the proposed user cooperation can effectively enhance the computation performance of the system compared to other representative benchmark methods under different scenarios.
Submitted 4 September, 2019; v1 submitted 25 August, 2019;
originally announced August 2019.
-
Modeling and Loop Shaping of Single-Joint Amplification Exoskeleton with Contact Sensing and Series Elastic Actuation
Authors:
Binghan He,
Gray C. Thomas,
Nicholas Paine,
Luis Sentis
Abstract:
In this paper we consider a class of exoskeletons designed to amplify the strength of humans through feedback of sensed human-robot interactions and actuator forces. We define an amplification error signal based on a reference amplification rate, and design a linear feedback compensator to attenuate this error. Since the human operator is an integral part of the system, we design the compensator to be robust to both a realistic variation in human impedance and a large variation in load impedance. We demonstrate our strategy on a one-degree-of-freedom amplification exoskeleton connected to a human arm, following a three-dimensional matrix of experimentation: slow or fast human motion; light or extreme exoskeleton load; and soft or clenched human arm impedances. We demonstrate that a slightly aggressive controller results in a borderline stable system, but only for soft human musculoskeletal behavior and a heavy load. This class of exoskeleton systems is interesting because it can both amplify a human's interaction forces, so long as the human contacts the environment through the exoskeleton, and attenuate the operator's perception of the exoskeleton's reflected dynamics at frequencies within the bandwidth of the control.
Submitted 19 November, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Safety Control Synthesis with Input Limits: a Hybrid Approach
Authors:
Gray C. Thomas,
Binghan He,
Luis Sentis
Abstract:
We introduce a hybrid (discrete-continuous) safety controller which enforces strict state and input constraints on a system but acts only when necessary, preserving transparent operation of the original system within some safe region of the state space. We define this space using a Min-Quadratic Barrier function, which we construct along the equilibrium manifold using the Lyapunov functions that result from linear matrix inequality controller synthesis for locally valid uncertain linearizations. We also introduce the concept of a barrier pair, which makes it easy to extend the approach to include trajectory-based augmentations to the safe region, in the style of LQR-Trees. We demonstrate our controller and barrier pair synthesis method in simulation-based examples.
Submitted 20 November, 2019; v1 submitted 27 February, 2018;
originally announced February 2018.