SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

Authors: Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, James Zou

Abstract: Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17702 [pdf]

A Two-sided Model for EV Market Dynamics and Policy Implications

Authors: Haoxuan Ma, Brian Yueshuai He, Tomas Kaljevic, Jiaqi Ma

Abstract: The diffusion of Electric Vehicles (EVs) plays a pivotal role in mitigating greenhouse gas emissions, particularly in the U.S., where ambitious zero-emission and carbon neutrality objectives have been set. In pursuit of these goals, many states have implemented a range of incentive policies aimed at stimulating EV adoption and charging infrastructure development, especially public EV charging stations (EVCS). This study examines the indirect network effect observed between EV adoption and EVCS deployment within urban landscapes. We developed a two-sided log-log regression model with historical data on EV purchases and EVCS development to quantify this effect. To test the robustness, we then conducted a case study of the EV market in Los Angeles (LA) County, which suggests that a 1% increase in EVCS correlates with a 0.35% increase in EV sales. Additionally, we forecasted the future EV market dynamics in LA County, revealing a notable disparity between current policies and the targeted 80% EV market share for private cars by 2045. To bridge this gap, we proposed a combined policy recommendation that enhances EV incentives by 60% and EVCS rebates by 66%, facilitating the achievement of future EV market objectives.
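
The elasticity reading in the abstract above (a 1% change in EVCS associated with a 0.35% change in EV sales) is how a log-log slope is normally interpreted. The sketch below fits an ordinary least-squares line to log-transformed synthetic data to show this. The data, the constant 120.0, and the helper name `ols_slope` are assumptions made for illustration; this is not the authors' two-sided model or their data.

```python
import math

def ols_slope(xs, ys):
    """Return the ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic, noise-free data (an assumption): sales = 120 * stations^0.35
stations = [50, 100, 200, 400, 800, 1600]
sales = [120.0 * s ** 0.35 for s in stations]

# In a log-log model the fitted slope is the elasticity.
elasticity = ols_slope([math.log(s) for s in stations],
                       [math.log(v) for v in sales])

# Percentage change in sales implied by a 1% increase in stations.
pct_change = (1.01 ** elasticity - 1.0) * 100.0
```

With real data the slope would carry sampling error, and a correlation of this kind does not by itself establish the direction of the effect.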

Submitted 27 May, 2024; originally announced May 2024.

Comments: Conference preprint, 8 pages, 3 figures

arXiv:2403.06087 [pdf, other]

Learning the irreversible progression trajectory of Alzheimer's disease

Authors: Yipei Wang, Bing He, Shannon Risacher, Andrew Saykin, Jingwen Yan, Xiaoqian Wang

Abstract: Alzheimer's disease (AD) is a progressive and irreversible brain disorder that unfolds over the course of 30 years. Therefore, it is critical to capture the disease progression at an early stage so that intervention can be applied before the onset of symptoms. Machine learning (ML) models have been shown to be effective in predicting the onset of AD. Yet for subjects with follow-up visits, existing techniques for AD classification only aim for accurate group assignment, and the monotonically increasing risk across follow-up visits is usually ignored. The resulting fluctuating risk scores across visits violate the irreversibility of AD, hampering the trustworthiness of models and providing little value for understanding the disease progression. To address this issue, we propose a novel regularization approach to predict AD longitudinally. Our technique aims to maintain the expected monotonicity of increasing disease risk during progression while preserving expressiveness. Specifically, we introduce a monotonicity constraint that encourages the model to predict disease risk in a consistent and ordered manner across follow-up visits. We evaluate our method using the longitudinal structural MRI and amyloid-PET imaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our model outperforms existing techniques in capturing the progressiveness of disease risk, and at the same time preserves prediction accuracy.
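
One common way to encourage non-decreasing predictions across ordered visits is a hinge-style penalty on every drop between consecutive risk scores. The sketch below shows that generic idea only. The abstract above does not specify the form of the authors' regularizer, so the function `monotonicity_penalty` and the example scores are hypothetical.

```python
def monotonicity_penalty(risks):
    """Sum of hinge terms max(0, r_t - r_{t+1}) over consecutive visits.

    The penalty is zero when the predicted risk never decreases.
    """
    return sum(max(0.0, earlier - later)
               for earlier, later in zip(risks, risks[1:]))

# Hypothetical predicted risk scores for one subject over four visits.
monotone = [0.10, 0.25, 0.25, 0.60]
fluctuating = [0.10, 0.40, 0.30, 0.60]

penalty_monotone = monotonicity_penalty(monotone)        # no decreases
penalty_fluctuating = monotonicity_penalty(fluctuating)  # one drop, 0.40 to 0.30
```

In training, a term like this would typically be added to the classification loss with a weight that trades monotonicity against accuracy; that weighting is also an assumption here.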

Submitted 9 March, 2024; originally announced March 2024.

Comments: accepted by ISBI 2024

arXiv:2403.05247 [pdf, other]

Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

Authors: Tianrui Lou, Xiaojun Jia, Jindong Gu, Li Liu, Siyuan Liang, Bangyan He, Xiaochun Cao

Abstract: Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. Another promising strategy, shape-based attack, can effectively eliminate outliers, but existing methods often suffer significant reductions in imperceptibility due to irrational deformations. We find that concealing deformation perturbations in areas insensitive to human eyes can achieve a better trade-off between imperceptibility and adversarial strength, specifically in parts of the object surface that are complex and exhibit drastic curvature changes. Therefore, we propose a novel shape-based adversarial attack method, HiT-ADV, which initially conducts a two-stage search for attack regions based on saliency and imperceptibility scores, and then adds deformation perturbations in each attack region using Gaussian kernel functions. Additionally, HiT-ADV is extendable to physical attack. We propose that by employing benign resampling and benign rigid transformations, we can further enhance physical adversarial strength with little sacrifice to imperceptibility. Extensive experiments have validated the superiority of our method in terms of adversarial and imperceptible properties in both digital and physical spaces. Our code is available at: https://github.com/TRLou/HiT-ADV.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2308.00326 [pdf]

Safety Control of Uncertain MIMO Systems Using Dynamic Output Feedback Barrier Pairs

Authors: Binghan He, Takashi Tanaka

Abstract: Safety control of dynamical systems using barrier functions relies on knowing the full state information. This paper introduces a novel approach for safety control in uncertain MIMO systems with partial state information. The proposed method combines the synthesis of a vector norm barrier function and a dynamic output feedback safety controller to ensure robust safety enforcement. The safety controller guarantees the invariance of the barrier function under uncertain dynamics and disturbances. To address the challenges associated with safety verification using partial state information, a barrier function estimator is developed. This estimator employs an identifier-based state estimator to obtain a state estimate that is affine in the uncertain model parameters of the system. By incorporating a priori knowledge of the limits of the uncertain model parameters and disturbances, the state estimate provides a robust upper bound for the barrier function. Comparative analysis with existing control barrier function based methods shows the advantage of the proposed approach in enforcing safety constraints under tight input constraints and the utilization of estimated state information.

Submitted 15 March, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

arXiv:2302.12175 [pdf, other]

doi 10.1109/TITS.2023.3248841

Communication and Control in Collaborative UAVs: Recent Advances and Future Trends

Authors: Shumaila Javaid, Nasir Saeed, Zakria Qadir, Hamza Fahim, Bin He, Houbing Song, Muhammad Bilal

Abstract: The recent progress in unmanned aerial vehicle (UAV) technology has significantly advanced UAV-based applications for military, civil, and commercial domains. Nevertheless, the challenges of establishing high-speed communication links, flexible control strategies, and developing efficient collaborative decision-making algorithms for a swarm of UAVs limit their autonomy, robustness, and reliability. Thus, a growing focus has been witnessed on collaborative communication to allow a swarm of UAVs to coordinate and communicate autonomously for the cooperative completion of tasks in a short time with improved efficiency and reliability. This work presents a comprehensive review of collaborative communication in a multi-UAV system. We thoroughly discuss the characteristics of intelligent UAVs and their communication and control requirements for autonomous collaboration and coordination. Moreover, we review various UAV collaboration tasks, summarize the applications of UAV swarm networks for dense urban environments and present the use case scenarios to highlight the current developments of UAV-based applications in various domains. Finally, we identify several exciting future research directions that need attention for advancing the research in collaborative UAVs.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2212.09247 [pdf, other]

ColoristaNet for Photorealistic Video Style Transfer

Authors: Xiaowen Qiu, Ruize Xu, Boan He, Yingtao Zhang, Wenqiang Zhang, Weifeng Ge

Abstract: Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while keeping photorealism. In this paper, we argue that the summary statistics matching scheme in existing algorithms is what leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework, which contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization to decompose feature transformation into style whitening and restylization. It works quite well in ColoristaNet and can transfer image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet can achieve better stylization effects when compared with state-of-the-art algorithms.

Submitted 21 December, 2022; v1 submitted 18 December, 2022; originally announced December 2022.

Comments: 30 pages, 29 figures

arXiv:2212.07656 [pdf, other]

Hybrid stability augmentation control of multi-rotor UAV in confined space based on adaptive backstepping control

Authors: QuanXi Zhan, JunRui Zhang, ChenYang Sun, RunJie Shen, Bin He

Abstract: This paper applies the UAV to the inspection of water diversion pipelines in hydropower stations. The diversion pipeline is an enclosed space, so the airflow disturbance caused by the rotation of the UAV blades and the strong air convection from the chimney effect have a great impact on the flight control of the UAV. Although the traditional linear PID flight control algorithm has been widely used and can meet the requirements of general flight tasks, it cannot guarantee the stability of the system over a wide range. The inspection of a diversion line in an enclosed space requires high system stability and robustness of the UAV controller. In this paper, a hybrid stabilised adaptive backstepping control method is proposed. Firstly, a multi-rotor UAV model is analysed and transformed into a strict feedback form with external disturbances; then adaptive techniques are used to estimate the airflow disturbances caused by the blades, and the attitude and position tracking controllers are designed by combining backstepping control and PID control respectively; finally, the asymptotic stability of the system is ensured by constructing a Lyapunov function. The experimental data show that the flight controller designed in this paper has good robustness and tracking performance, and can better resist airflow disturbances in confined space.

Submitted 15 December, 2022; originally announced December 2022.

Comments: 7 pages

arXiv:2209.04606 [pdf, other]

Barrier Pairs for Safety Control of Uncertain Output Feedback Systems

Authors: Binghan He, Takashi Tanaka

Abstract: The barrier function method for safety control typically assumes the availability of full state information. Unfortunately, in many scenarios involving uncertain dynamical systems, full state information is often unavailable. In this paper, we aim to solve the safety control problem for an uncertain single-input single-output system with partial state information. First, we develop a synthesis method that simultaneously creates a barrier function and a dynamic output feedback safety controller. This safety controller guarantees that the unit sub-level set of the barrier function is an invariant set under the uncertain dynamics and disturbances of the system. Then, we build an identifier-based estimator that provides a state estimate affine to the uncertain model parameters of the system. To detect the potential risks of the system, a fault detector uses the state estimate to find an upper bound for the barrier function. The fault detector triggers the safety controller when the system's original action leads to a potential safety issue and resumes the original action when the potential safety issue is resolved by the safety controller.

Submitted 1 August, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

arXiv:2206.06025 [pdf, other]

Rethinking: Deep-learning-based Demodulation and Decoding

Authors: Boxiang He, Zitao Wu, Fanggang Wang

Abstract: In this paper, we focus on the demodulation/decoding of the complex modulations/codes that approach the Shannon capacity. Theoretically, the maximum likelihood (ML) algorithm can achieve the optimal error performance, but it has $\mathcal{O}(2^k)$ demodulation/decoding complexity with $k$ denoting the number of information bits. Recent progress in deep learning provides a new direction to tackle the demodulation and the decoding. The purpose of this paper is to analyze the feasibility of the neural network to demodulate/decode the complex modulations/codes close to the Shannon capacity and characterize the error performance and the complexity of the neural network. Regarding the neural network demodulator, we use the golden angle modulation (GAM), a promising modulation format that can offer the Shannon capacity approaching performance, to evaluate the demodulator. It is observed that the neural network demodulator achieves performance close to the ML-based method with a lower complexity order in the low-order GAM. Regarding the neural network decoder, we use the Gaussian codebook, achieving the Shannon capacity, to evaluate the decoder. We also observe that the neural network decoder achieves the error performance close to the ML decoder with a much lower complexity order in the small Gaussian codebook. Limited by the current training resources, we cannot evaluate the performance of the high-order modulation and the long codeword. However, based on the results of the low-order GAM and the small Gaussian codebook, we boldly give our conjecture: the neural network demodulator/decoder is a strong candidate approach for demodulating/decoding the complex modulations/codes close to the Shannon capacity owing to the error performance of the near-ML algorithm and the lower complexity.
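
The $\mathcal{O}(2^k)$ cost mentioned in the abstract above comes from the ML decoder comparing the received vector against every one of the $2^k$ codewords. The sketch below makes that exhaustive search explicit for a small random Gaussian codebook. The codebook size, block length, seed, and function names are assumptions chosen for illustration, and this is not the paper's experimental setup.

```python
import random

def make_codebook(k, n, rng):
    """Draw 2**k random Gaussian codewords of length n."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2 ** k)]

def ml_decode(received, codebook):
    """Return the index of the codeword closest to `received`.

    Under additive white Gaussian noise, minimum Euclidean distance
    is the maximum likelihood decision. The search visits every
    codeword, so its cost grows as 2**k.
    """
    def sq_dist(codeword):
        return sum((r - c) ** 2 for r, c in zip(received, codeword))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

rng = random.Random(0)
k, n = 4, 8                      # 4 information bits, 8 channel uses
codebook = make_codebook(k, n, rng)

sent = 11
noiseless = list(codebook[sent])  # channel without noise, for a sanity check
decoded = ml_decode(noiseless, codebook)
```

Each additional information bit doubles the number of distance computations, which is the scaling that motivates looking for cheaper learned decoders.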

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2205.03242 [pdf]

Electrocardiographic Deep Learning for Predicting Post-Procedural Mortality

Authors: David Ouyang, John Theurer, Nathan R. Stein, J. Weston Hughes, Pierre Elias, Bryan He, Neal Yuan, Grant Duffy, Roopinder K. Sandhu, Joseph Ebinger, Patrick Botting, Melvin Jujjavarapu, Brian Claggett, James E. Tooley, Tim Poterucha, Jonathan H. Chen, Michael Nurok, Marco Perez, Adler Perotte, James Y. Zou, Nancy R. Cook, Sumeet S. Chugh, Susan Cheng, Christine M. Albert

Abstract: Background. Pre-operative risk assessments used in clinical practice are limited in their ability to identify risk for post-operative mortality. We hypothesize that electrocardiograms contain hidden risk markers that can help prognosticate post-operative mortality. Methods. In a derivation cohort of 45,969 pre-operative patients (age 59 ± 19 years, 55 percent women), a deep learning algorithm was developed to leverage waveform signals from pre-operative ECGs to discriminate post-operative mortality. Model performance was assessed in a holdout internal test dataset and in two external hospital cohorts and compared with the Revised Cardiac Risk Index (RCRI) score. Results. In the derivation cohort, there were 1,452 deaths. The algorithm discriminates mortality with an AUC of 0.83 (95% CI 0.79-0.87) surpassing the discrimination of the RCRI score with an AUC of 0.67 (CI 0.61-0.72) in the held-out test cohort. Patients determined to be high risk by the deep learning model's risk prediction had an unadjusted odds ratio (OR) of 8.83 (5.57-13.20) for post-operative mortality as compared to an unadjusted OR of 2.08 (CI 0.77-3.50) for post-operative mortality for RCRI greater than 2. The deep learning algorithm performed similarly for patients undergoing cardiac surgery with an AUC of 0.85 (CI 0.77-0.92), non-cardiac surgery with an AUC of 0.83 (0.79-0.88), and catheterization or endoscopy suite procedures with an AUC of 0.76 (0.72-0.81). The algorithm similarly discriminated risk for mortality in two separate external validation cohorts from independent healthcare systems with AUCs of 0.79 (0.75-0.83) and 0.75 (0.74-0.76) respectively. Conclusion. The findings demonstrate how a novel deep learning algorithm, applied to pre-operative ECGs, can improve discrimination of post-operative mortality.

Submitted 30 April, 2022; originally announced May 2022.

arXiv:2202.10847 [pdf, other]

UncertaINR: Uncertainty Quantification of End-to-End Implicit Neural Representations for Computed Tomography

Authors: Francisca Vasconcelos, Bobby He, Nalini Singh, Yee Whye Teh

Abstract: Implicit neural representations (INRs) have achieved impressive results for scene reconstruction and computer graphics, where their performance has primarily been assessed on reconstruction accuracy. As INRs make their way into other domains, where model predictions inform high-stakes decision-making, uncertainty quantification of INR inference is becoming critical. To that end, we study a Bayesian reformulation of INRs, UncertaINR, in the context of computed tomography, and evaluate several Bayesian deep learning implementations in terms of accuracy and calibration. We find that they achieve well-calibrated uncertainty, while retaining accuracy competitive with other classical, INR-based, and CNN-based reconstruction techniques. Contrary to common intuition in the Bayesian deep learning literature, we find that INRs obtain the best calibration with computationally efficient Monte Carlo dropout, outperforming Hamiltonian Monte Carlo and deep ensembles. Moreover, in contrast to the best-performing prior approaches, UncertaINR does not require a large training dataset, but only a handful of validation images.

Submitted 2 May, 2023; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: Published in the Transactions on Machine Learning Research (TMLR) April 2023 [https://openreview.net/forum?id=jdGMBgYvfX]

arXiv:2110.13903 [pdf, other]

NeRV: Neural Representations for Videos

Authors: Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava

Abstract: We propose a novel neural representation for videos (NeRV) which encodes videos in neural networks. Unlike conventional representations that treat videos as frame sequences, we represent videos as neural networks taking frame index as input. Given a frame index, NeRV outputs the corresponding RGB image. Video encoding in NeRV is simply fitting a neural network to video frames, and the decoding process is a simple feedforward operation. As an image-wise implicit representation, NeRV outputs the whole image and shows great efficiency compared to pixel-wise implicit representation, improving the encoding speed by 25x to 70x, the decoding speed by 38x to 132x, while achieving better video quality. With such a representation, we can treat videos as neural networks, simplifying several video-related tasks. For example, conventional video compression methods are restricted by a long and complex pipeline, specifically designed for the task. In contrast, with NeRV, we can use any neural network compression method as a proxy for video compression, and achieve comparable performance to traditional frame-based video compression approaches (H.264, HEVC, etc.). Besides compression, we demonstrate the generalization of NeRV for video denoising. The source code and pre-trained model can be found at https://github.com/haochen-rye/NeRV.git.

Submitted 26 October, 2021; originally announced October 2021.

Comments: To appear at NeurIPS 2021

arXiv:2108.01245 [pdf, other]

The Performance Evaluation of Attention-Based Neural ASR under Mixed Speech Input

Authors: Bradley He, Martin Radfar

Abstract: In order to evaluate the performance of the attention-based neural ASR under noisy conditions, the current trend is to present hours of various noisy speech data to the model and measure the overall word/phoneme error rate (W/PER). In general, it is unclear how these models perform when exposed to a cocktail party setup in which two or more speakers are active. In this paper, we present mixtures of speech signals to a popular attention-based neural ASR, known as Listen, Attend, and Spell (LAS), at different target-to-interference ratios (TIR) and measure the phoneme error rate. In particular, we investigate in detail which phoneme is predicted when two phonemes are mixed; in this fashion we build a model in which the most probable predictions for a phoneme are given. We found a 65% relative increase in PER when LAS was presented with mixed speech signals at TIR = 0 dB, and the performance approaches the unmixed scenario at TIR = 30 dB. Our results show that the model, when presented with mixed phoneme signals, tends to predict those that have higher accuracies during evaluation of original phoneme signals.
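
Mixing two signals at a prescribed target-to-interference ratio amounts to rescaling the interferer before adding it to the target. The sketch below assumes TIR is defined as the ratio of mean signal powers in decibels, which is the usual convention but is not spelled out in the abstract above. The synthetic tones and the helper names `power` and `mix_at_tir` are illustrative assumptions, not the authors' pipeline.

```python
import math

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_tir(target, interferer, tir_db):
    """Scale `interferer` so the target-to-interference ratio is tir_db, then add."""
    gain = math.sqrt(power(target) / (power(interferer) * 10 ** (tir_db / 10)))
    scaled = [gain * x for x in interferer]
    mixture = [t + s for t, s in zip(target, scaled)]
    return mixture, scaled

# Two synthetic tones stand in for the two speakers (an assumption).
target = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]
interferer = [0.3 * math.sin(2 * math.pi * 13 * i / 200) for i in range(200)]

mixture, scaled = mix_at_tir(target, interferer, tir_db=0.0)
measured_tir_db = 10 * math.log10(power(target) / power(scaled))
```

At 0 dB the two signals carry equal power, while at 30 dB the interferer carries one thousandth of the target's power, which is consistent with performance approaching the unmixed case.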

Submitted 2 August, 2021; originally announced August 2021.

Comments: 5 pages, 3 figures

arXiv:2106.12511 [pdf]

doi 10.1001/jamacardio.2021.6059

High-Throughput Precision Phenotyping of Left Ventricular Hypertrophy with Cardiovascular Deep Learning

Authors: Grant Duffy, Paul P Cheng, Neal Yuan, Bryan He, Alan C. Kwan, Matthew J. Shun-Shin, Kevin M. Alexander, Joseph Ebinger, Matthew P. Lungren, Florian Rader, David H. Liang, Ingela Schnittger, Euan A. Ashley, James Y. Zou, Jignesh Patel, Ronald Witteles, Susan Cheng, David Ouyang

Abstract: Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating etiologies of LVH. To overcome this challenge, we present EchoNet-LVH - a deep learning workflow that automatically quantifies ventricular hypertrophy with precision equal to human experts and predicts etiology of LVH. Trained on 28,201 echocardiogram videos, our model accurately measures intraventricular wall thickness (mean absolute error [MAE] 1.4mm, 95% CI 1.2-1.5mm), left ventricular diameter (MAE 2.4mm, 95% CI 2.2-2.6mm), and posterior wall thickness (MAE 1.2mm, 95% CI 1.1-1.3mm) and classifies cardiac amyloidosis (area under the curve of 0.83) and hypertrophic cardiomyopathy (AUC 0.98) from other etiologies of LVH. In external datasets from independent domestic and international healthcare systems, EchoNet-LVH accurately quantified ventricular parameters (R2 of 0.96 and 0.90 respectively) and detected cardiac amyloidosis (AUC 0.79) and hypertrophic cardiomyopathy (AUC 0.89) on the domestic external validation site. Leveraging measurements across multiple heart beats, our model can more accurately identify subtle changes in LV geometry and its causal etiologies. Compared to human experts, EchoNet-LVH is fully automated, allowing for reproducible, precise measurements, and lays the foundation for precision diagnosis of cardiac hypertrophy. As a resource to promote further innovation, we also make publicly available a large dataset of 23,212 annotated echocardiogram videos.

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2106.05735 [pdf, other]

doi 10.1038/s41467-022-30695-9

The Medical Segmentation Decathlon

Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov, et al. (34 additional authors not shown)

Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate the hypothesis, we organized the Medical Segmentation Decathlon (MSD) - a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities. The underlying data set was designed to explore the axis of difficulties typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data and small objects. The MSD challenge confirmed that algorithms with a consistent good performance on a set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued generalizing well to a wide range of other clinical problems, further confirming our hypothesis. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to non-AI experts.

Submitted 10 June, 2021; originally announced June 2021.

MSC Class: 68T07

arXiv:2103.09963 [pdf]

TSTNN: Two-stage Transformer based Neural Network for Speech Enhancement in the Time Domain

Authors: Kai Wang, Bengbeng He, Wei-Ping Zhu

Abstract: In this paper, we propose a transformer-based architecture, called two-stage transformer neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed model is composed of an encoder, a two-stage transformer module (TSTM), a masking module and a decoder. The encoder maps input noisy speech into feature representation. The TSTM exploits four stacked two-stage transformer blocks to efficiently extract local and global information from the encoder output stage by stage. The masking module creates a mask which will be multiplied with the encoder output. Finally, the decoder uses the masked encoder feature to reconstruct the enhanced speech. Experimental results on the benchmark dataset show that the TSTNN outperforms most state-of-the-art models in time or frequency domain while having significantly lower model complexity.

Submitted 17 March, 2021; originally announced March 2021.

Comments: 5 pages, 4 figures, accepted by IEEE ICASSP 2021

arXiv:2102.12755 [pdf, other]

Coarse-to-fine Airway Segmentation Using Multi information Fusion Network and CNN-based Region Growing

Authors: Jinquan Guo, Rongda Fu, Lin Pan, Shaohua Zheng, Liqin Huang, Bin Zheng, Bingwei He

Abstract: Automatic airway segmentation from chest computed tomography (CT) scans plays an important role in pulmonary disease diagnosis and computer-assisted therapy. However, low contrast at peripheral branches and complex tree-like structures remain the two main challenges for airway segmentation. Recent research has illustrated that deep learning methods perform well in segmentation tasks. Motivated by these works, a coarse-to-fine segmentation framework is proposed to obtain a complete airway tree. Our framework segments the overall airway and small branches via the multi-information fusion convolution neural network (Mif-CNN) and the CNN-based region growing, respectively. In Mif-CNN, atrous spatial pyramid pooling (ASPP) is integrated into a u-shaped network, where it can expand the receptive field and capture multi-scale information. Meanwhile, boundary and location information are incorporated into semantic information. This information is fused to help Mif-CNN utilize additional context knowledge and useful features. To improve the performance of the segmentation result, the CNN-based region growing method is designed to focus on obtaining small branches. A voxel classification network (VCN), which can entirely capture the rich information around each voxel, is applied to classify the voxels into airway and non-airway. In addition, a shape reconstruction method is used to refine the airway tree.

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2102.09978 [pdf, other]

TransMask: A Compact and Fast Speech Separation Model Based on Transformer

Authors: Zining Zhang, Bingsheng He, Zhenjie Zhang

Abstract: Speech separation is an important problem in speech processing, which aims to separate and generate clean speech from mixed audio containing speech from different speakers. Empowered by deep learning technologies for the sequence-to-sequence domain, recent neural speech separation models are now capable of generating highly clean speech audio. To make these models more practical by reducing the model size and inference time while maintaining high separation quality, we propose a new transformer-based speech separation approach, called TransMask. By fully unleashing the power of self-attention on long-term dependency exception, we demonstrate that TransMask is more than 60% smaller and its inference is more than 2 times faster than state-of-the-art solutions. TransMask fully utilizes parallelism during inference, and achieves nearly linear inference time within reasonable input audio lengths. It also outperforms existing solutions on output speech audio quality, achieving SDR above 16 on the Librimix benchmark.

Submitted 19 February, 2021; originally announced February 2021.

Comments: Accepted in ICASSP2021

arXiv:2010.12788 [pdf, other]

GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus

Authors: Zining Zhang, Bingsheng He, Zhenjie Zhang

Abstract: Non-parallel many-to-many voice conversion has recently been attracting huge research efforts in the speech processing community. A voice conversion system transforms an utterance of a source speaker to another utterance of a target speaker by keeping the content of the original utterance and replacing its vocal features with those of the target speaker. Existing solutions, e.g., StarGAN-VC2, present promising results only when a speech corpus of the engaged speakers is available during model training. AUTOVC is able to perform voice conversion on unseen speakers, but it needs an external pretrained speaker verification model. In this paper, we present our new GAN-based zero-shot voice conversion solution, called GAZEV, which aims to support unseen speakers on both source and target utterances. Our key technical contribution is the adoption of a speaker embedding loss on top of the GAN framework, as well as an adaptive instance normalization strategy, in order to address the limitations of speaker identity transfer in existing solutions. Our empirical evaluations demonstrate significant performance improvement on output speech quality and comparable speaker similarity to AUTOVC.

Submitted 24 October, 2020; originally announced October 2020.

arXiv:2010.12766 [pdf, other]

X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network

Authors: Zining Zhang, Bingsheng He, Zhenjie Zhang

Abstract: Extracting the speech of a target speaker from mixed audio, based on reference speech from the target speaker, is a challenging yet powerful technology in speech processing. Recent studies of speaker-independent speech separation, such as TasNet, have shown promising results by applying deep neural networks over the time-domain waveform. Such separation networks do not directly generate reliable and accurate output when target speakers are specified, because of the necessary prior on the number of speakers and the lack of robustness when dealing with audio with absent speakers. In this paper, we break these limitations by introducing a new speaker-aware speech masking method, called X-TaSNet. Our proposal adopts new strategies, including a distortion-based loss and corresponding alternating training scheme, to better address the robustness issue. X-TaSNet significantly enhances the extracted speech quality, doubling SDRi and SI-SNRi of the output speech audio over the state-of-the-art voice filtering approach. X-TaSNet also improves the reliability of the results by improving the accuracy of speaker identity in the output audio to 95.4%, such that it returns silent audio in most cases when the target speaker is absent. These results demonstrate that X-TaSNet moves one solid step towards more practical applications of speaker extraction technology.

Submitted 23 October, 2020; originally announced October 2020.
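
The X-TaSNet abstract above reports gains in SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the extracted speech over the unprocessed mixture. The following is a minimal sketch of the commonly used SI-SNR definition, not the authors' evaluation code; the function names, the `eps` guard, and the toy sine-wave signals are assumptions made for illustration.

```python
import math

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything not explained by the reference counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)

def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)

# Toy signals (hypothetical): a target tone and an equally loud interfering tone.
N = 100
reference = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
interference = [math.sin(2 * math.pi * 13 * t / N) for t in range(N)]
mixture = [r + i for r, i in zip(reference, interference)]
# A hypothetical extractor that suppresses the interferer to 10% amplitude.
estimate = [r + 0.1 * i for r, i in zip(reference, interference)]

improvement_db = si_snr_improvement(estimate, mixture, reference)
print(f"SI-SNRi: {improvement_db:.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric suits separation models whose output gain is arbitrary.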

arXiv:2009.12446 [pdf, other]

doi 10.1109/TNSRE.2020.3027501

A Complex Stiffness Human Impedance Model with Customizable Exoskeleton Control

Authors: Binghan He, Huang Huang, Gray C. Thomas, Luis Sentis

Abstract: The natural impedance, or dynamic relationship between force and motion, of a human operator can determine the stability of exoskeletons that use interaction-torque feedback to amplify human strength. While human impedance is typically modelled as a linear system, our experiments on a single-joint exoskeleton testbed involving 10 human subjects show evidence of nonlinear behavior: a low-frequency asymptotic phase for the dynamic stiffness of the human that is different than the expected zero, and an unexpectedly consistent damping ratio as the stiffness and inertia vary. To explain these observations, this paper considers a new frequency-domain model of the human joint dynamics featuring complex value stiffness comprising a real stiffness term and a hysteretic damping term. Using a statistical F-test we show that the hysteretic damping term is not only significant but is even more significant than the linear damping term. Further analysis reveals a linear trend linking hysteretic damping and the real part of the stiffness, which allows us to simplify the complex stiffness model down to a 1-parameter system. Then, we introduce and demonstrate a customizable fractional-order controller that exploits this hysteretic damping behavior to improve strength amplification bandwidth while maintaining stability, and explore a tuning approach which ensures that this stability property is robust to muscle co-contraction for each individual.

Submitted 25 September, 2020; originally announced September 2020.

Comments: 10 pages, 7 figures, 4 tables. arXiv admin note: text overlap with arXiv:1903.00704
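
The complex stiffness entry above notes that the measured dynamic stiffness has a nonzero low-frequency phase, which a purely linear mass-spring-damper model cannot produce. The sketch below illustrates that point under an assumed parameterization, S(jω) = -Mω² + jBω + (K + jC), where M, B, K, and C (inertia, linear damping, real stiffness, hysteretic damping) are notation chosen here for illustration and the numeric values are made up; the paper's own formulation may differ.

```python
import cmath
import math

def dynamic_stiffness(omega, inertia, damping, stiffness, hysteretic):
    """Assumed model: S(jw) = -M*w^2 + j*B*w + (K + j*C)."""
    real_part = stiffness - inertia * omega ** 2
    imag_part = damping * omega + hysteretic
    return complex(real_part, imag_part)

def low_frequency_phase_deg(inertia, damping, stiffness, hysteretic, omega=1e-6):
    """Phase of the dynamic stiffness (degrees) at a very low frequency."""
    s = dynamic_stiffness(omega, inertia, damping, stiffness, hysteretic)
    return math.degrees(cmath.phase(s))

# Hypothetical parameter values, chosen only to show the qualitative effect.
M, B, K, C = 0.1, 0.5, 10.0, 2.0

linear_phase = low_frequency_phase_deg(M, B, K, hysteretic=0.0)
complex_phase = low_frequency_phase_deg(M, B, K, hysteretic=C)

print(f"linear model, low-frequency phase:  {linear_phase:.4f} deg")
print(f"complex stiffness, low-freq phase:  {complex_phase:.4f} deg")
```

In this parameterization the linear damping term vanishes as ω approaches zero, so the linear model's phase tends to zero, while the frequency-independent hysteretic term leaves a residual phase of atan(C/K).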

arXiv:1910.12902 [pdf, other]

doi 10.23919/ACC45564.2020.9147875

Adaptive Compliance Shaping with Human Impedance Estimation

Authors: Huang Huang, Henry F. Cappel, Gray C. Thomas, Binghan He, Luis Sentis

Abstract: Human impedance parameters play an integral role in the dynamics of strength amplification exoskeletons. Many methods are used to estimate the stiffness of human muscles, but few are used to improve the performance of strength amplification controllers for these devices. We propose a compliance shaping amplification controller incorporating an accurate online human stiffness estimation from surface electromyography (sEMG) sensors and stretch sensors connected to the forearm and upper arm of the human. These sensor values along with exoskeleton position and velocity are used to train a random forest regression model that accurately predicts a person's stiffness despite varying movement, relaxation, and muscle co-contraction. Our model's accuracy is verified using experimental test data and the model is implemented into the compliance shaping controller. Ultimately we show that the online estimation of stiffness can improve the bandwidth and amplification of the controller while remaining robustly stable.

Submitted 2 August, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: 8 pages, 9 figures, Accepted for publication at the 2020 American Control Conference. Copyright IEEE 2020

arXiv:1910.02317 [pdf, other]

doi 10.23919/ACC45564.2020.9147276

Robust Estimator-Based Safety Verification: A Vector Norm Approach

Authors: Binghan He, Gray C. Thomas, Luis Sentis

Abstract: In this paper, we consider the problem of verifying safety constraint satisfaction for single-input single-output systems with uncertain transfer function coefficients. We propose a new type of barrier function based on a vector norm. This type of barrier function has a measurable upper bound without full state availability. An identifier-based estimator allows an exact bound for the uncertainty-based component of the barrier function estimate. Assuming that the system is safe initially allows an exponentially decreasing bound on the error due to the estimator transient. Barrier function and estimator synthesis is proposed as two convex sub-problems, exploiting linear matrix inequalities. The barrier function controller combination is then used to construct a safety backup controller. We demonstrate the system in a simulation of a 1 degree-of-freedom human-exoskeleton interaction.

Submitted 1 August, 2020; v1 submitted 5 October, 2019; originally announced October 2019.

Comments: 6 pages, 5 figures. Accepted for publication at the 2020 American Control Conference. Copyright IEEE 2020

arXiv:1908.09334 [pdf, other]

Collaborative Computation Offloading in Wireless Powered Mobile-Edge Computing Systems

Authors: Binqi He, Suzhi Bi, Hong Xing, Xiaohui Lin

Abstract: This paper studies a novel user cooperation model in a wireless powered mobile edge computing system where two wireless users harvest wireless power transferred by one energy node and can offload part of their computation tasks to an edge server (ES) for remote execution. In particular, we consider that the direct communication link between one user and the ES is blocked, such that the other user acts as a relay to forward its offloading data to the server. Meanwhile, instead of forwarding all the received task data, we also allow the helping user to compute part of the received task locally to reduce the potentially high energy and time cost of task offloading to the ES. Our aim is to maximize the amount of data that can be processed within a given time frame of the two users by jointly optimizing the amount of task data computed at each device (users and ES), the system time allocation, the transmit power and CPU frequency of the users. We propose an efficient method to find the optimal solution and show that the proposed user cooperation can effectively enhance the computation performance of the system compared to other representative benchmark methods under different scenarios.

Submitted 4 September, 2019; v1 submitted 25 August, 2019; originally announced August 2019.

Comments: The paper is accepted for publication by IEEE GLOBECOM 2019, at Waikoloa, HI, USA, in Dec. 2019

arXiv:1809.10560 [pdf, other]

doi 10.23919/ACC.2019.8814421

Modeling and Loop Shaping of Single-Joint Amplification Exoskeleton with Contact Sensing and Series Elastic Actuation

Authors: Binghan He, Gray C. Thomas, Nicholas Paine, Luis Sentis

Abstract: In this paper we consider a class of exoskeletons designed to amplify the strength of humans through feedback of sensed human-robot interactions and actuator forces. We define an amplification error signal based on a reference amplification rate, and design a linear feedback compensator to attenuate this error. Since the human operator is an integral part of the system, we design the compensator to be robust to both a realistic variation in human impedance and a large variation in load impedance. We demonstrate our strategy on a one-degree of freedom amplification exoskeleton connected to a human arm, following a three dimensional matrix of experimentation: slow or fast human motion; light or extreme exoskeleton load; and soft or clenched human arm impedances. We demonstrate that a slightly aggressive controller results in a borderline stable system, but only for soft human musculoskeletal behavior and a heavy load. This class of exoskeleton systems is interesting because it can both amplify a human's interaction forces (so long as the human contacts the environment through the exoskeleton) and attenuate the operator's perception of the exoskeleton's reflected dynamics at frequencies within the bandwidth of the control.

Submitted 19 November, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

Comments: 8 pages, 12 figures, 4 tables. Accepted for publication at the 2019 American Control Conference. Copyright IEEE 2019

arXiv:1802.10188 [pdf, other]

doi 10.23919/ACC.2018.8431457

Safety Control Synthesis with Input Limits: a Hybrid Approach

Authors: Gray C. Thomas, Binghan He, Luis Sentis

Abstract: We introduce a hybrid (discrete-continuous) safety controller which enforces strict state and input constraints on a system, but only acts when necessary, preserving transparent operation of the original system within some safe region of the state space. We define this space using a Min-Quadratic Barrier function, which we construct along the equilibrium manifold using the Lyapunov functions which result from linear matrix inequality controller synthesis for locally valid uncertain linearizations. We also introduce the concept of a barrier pair, which makes it easy to extend the approach to include trajectory-based augmentations to the safe region, in the style of LQR-Trees. We demonstrate our controller and barrier pair synthesis method in simulation-based examples.

Submitted 20 November, 2019; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: 6 pages, 7 figures. Accepted for publication at the 2018 American Controls Conference. Copyright IEEE 2018

Showing 1–27 of 27 results for author: He, B