-
xLSTM-UNet can be an Effective 2D \& 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart
Authors:
Tianrun Chen,
Chaotao Ding,
Lanyun Zhu,
Tao Xu,
Deyi Ji,
Ying Zang,
Zejian Li
Abstract:
Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet structured deep learning neural network that leverages Vision-LSTM (…
▽ More
Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet structured deep learning neural network that leverages Vision-LSTM (xLSTM) as its backbone for medical image segmentation. xLSTM is a recently proposed as the successor of Long Short-Term Memory (LSTM) networks and have demonstrated superior performance compared to Transformers and State Space Models (SSMs) like Mamba in Neural Language Processing (NLP) and image classification (as demonstrated in Vision-LSTM, or ViL implementation). Here, xLSTM-UNet we designed extend the success in biomedical image segmentation domain. By integrating the local feature extraction strengths of convolutional layers with the long-range dependency capturing abilities of xLSTM, xLSTM-UNet offers a robust solution for comprehensive image analysis. We validate the efficacy of xLSTM-UNet through experiments. Our findings demonstrate that xLSTM-UNet consistently surpasses the performance of leading CNN-based, Transformer-based, and Mamba-based segmentation networks in multiple datasets in biomedical segmentation including organs in abdomen MRI, instruments in endoscopic images, and cells in microscopic images. With comprehensive experiments performed, this technical report highlights the potential of xLSTM-based architectures in advancing biomedical image analysis in both 2D and 3D. The code, models, and datasets are publicly available at \href{http://tianrun-chen.github.io/xLSTM-UNet/}{http://tianrun-chen.github.io/xLSTM-Unet/}
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Authors:
Philip Anastassiou,
Jiawei Chen,
Jitong Chen,
Yuanzhe Chen,
Zhuo Chen,
Ziyi Chen,
Jian Cong,
Lelai Deng,
Chuang Ding,
Lu Gao,
Mingqing Gong,
Peisong Huang,
Qingqing Huang,
Zhiying Huang,
Yuanyuan Huo,
Dongya Jia,
Chumin Li,
Feiya Li,
Hui Li,
Jiaxin Li,
Xiaoyang Li,
Xingxing Li,
Lin Liu,
Shouda Liu,
Sichao Liu
, et al. (21 additional authors not shown)
Abstract:
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…
▽ More
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at \url{https://bytedancespeech.github.io/seedtts_tech_report}.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
A 49.8mm2 Fully Integrated, 1.5m Transmission-Range, High-Data-Rate IR-UWB Transmitter for Brain Implants
Authors:
Cong Ding,
Mingxiang Gao,
Anja K. Skrivervik,
Mahsa Shoaran
Abstract:
To address the challenge of extending the transmission range of implantable TXs while also minimizing their size and power consumption, this paper introduces a transcutaneous, high data-rate, fully integrated IR-UWB transmitter that employs a novel co-designed power amplifier (PA) and antenna interface for enhanced performance. With the co-designed interface, we achieved the smallest footprint of…
▽ More
To address the challenge of extending the transmission range of implantable TXs while also minimizing their size and power consumption, this paper introduces a transcutaneous, high data-rate, fully integrated IR-UWB transmitter that employs a novel co-designed power amplifier (PA) and antenna interface for enhanced performance. With the co-designed interface, we achieved the smallest footprint of 49.8mm2 and the longest transmission range of 1.5m compared to the state-of-the-art IR-UWB TXs.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals
Authors:
Cheng Ding,
Zhicheng Guo,
Zhaoliang Chen,
Randall J Lee,
Cynthia Rudin,
Xiao Hu
Abstract:
Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for develo** foundation models for phys…
▽ More
Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for develo** foundation models for physiological data; such data are often noisy, incomplete, or inconsistent. The present work aims to provide a toolset for develo** foundation models on physiological data. We leverage a large dataset of photoplethysmography (PPG) signals from hospitalized intensive care patients. For this data, we propose SimQuality, a novel self-supervised learning task based on convolutional neural networks (CNNs) as the backbone to enforce representations to be similar for good and poor quality signals that are from similar physiological states. We pre-trained the SimQuality on over 36 million 30-second PPG pairs and then fine-tuned and tested on six downstream tasks using external datasets. The results demonstrate the superiority of the proposed approach on all the downstream tasks, which are extremely important for heart monitoring on wearable devices. Our method indicates that CNNs can be an effective backbone for foundation models that are robust to training data quality.
△ Less
Submitted 26 April, 2024;
originally announced April 2024.
-
SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals
Authors:
Runze Yan,
Cheng Ding,
Ran Xiao,
Aleksandr Fedorov,
Randall J Lee,
Fadi Nahab,
Xiao Hu
Abstract:
Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…
▽ More
Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings. Conventional approaches typically discard corrupted segments or attempt to reconstruct original signals, allowing for the use of standard machine learning techniques. However, this reduces dataset size and introduces biases, compromising prediction accuracy and the effectiveness of continuous monitoring. We propose a novel deep learning model, Signal Quality Weighted Fusion of Attentional Convolution and Recurrent Neural Network (SQUWA), designed to learn how to retain accurate predictions from partially corrupted PPG. Specifically, SQUWA innovatively integrates an attention mechanism that directly considers signal quality during the learning process, dynamically adjusting the weights of time series segments based on their quality. This approach enhances the influence of higher-quality segments while reducing that of lower-quality ones, effectively utilizing partially corrupted segments. This approach represents a departure from the conventional methods that exclude such segments, enabling the utilization of a broader range of data, which has great implications for less disruption when monitoring of AF risks and more accurate estimation of AF burdens. Our extensive experiments show that SQUWA outperform existing PPG-based models, achieving the highest AUCPR of 0.89 with label noise mitigation. This also exceeds the 0.86 AUCPR of models trained with using both electrocardiogram (ECG) and PPG data.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
DMC4ML: Data Movement Complexity for Machine Learning
Authors:
Chen Ding,
Christopher Kanan,
Dylan McKellips,
Toranosuke Ozawa,
Arian Shahmirza,
Wesley Smith
Abstract:
The greatest demand for today's computing is machine learning. This paper analyzes three machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is novel in three aspects. First, it measures the cost of memory access on an abstract memory hierarchy, instead of traditional time or space complexity. Second, the analysis is asymptotic and identifies the primary sources o…
▽ More
The greatest demand for today's computing is machine learning. This paper analyzes three machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is novel in three aspects. First, it measures the cost of memory access on an abstract memory hierarchy, instead of traditional time or space complexity. Second, the analysis is asymptotic and identifies the primary sources of the memory cost. Finally, the result is symbolic, which can be used to select algorithmic parameters such as the group size in grouped query attention for any dimension size and number of heads and the batch size for batched convolution for any image size and kernel size.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Reconsideration on evaluation of machine learning models in continuous monitoring using wearables
Authors:
Cheng Ding,
Zhicheng Guo,
Cynthia Rudin,
Ran Xiao,
Fadi B Nahab,
Xiao Hu
Abstract:
This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart stu…
▽ More
This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart studies, the paper offers a comprehensive guideline for robust ML model evaluation on continuous health monitoring.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
Photoplethysmography based atrial fibrillation detection: an updated review from July 2019
Authors:
Cheng Ding,
Ran Xiao,
Weijia Wang,
Elizabeth Holdsworth,
Xiao Hu
Abstract:
Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring for its cost-effectiveness and widespread integration into wearable devices. Our team previously co…
▽ More
Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring for its cost-effectiveness and widespread integration into wearable devices. Our team previously conducted an exhaustive review on PPG-based AF detection before June 2019. However, since then, more advanced technologies have emerged in this field. This paper offers a comprehensive review of the latest advancements in PPG-based AF detection, utilizing digital health and artificial intelligence (AI) solutions, within the timeframe spanning from July 2019 to December 2022. Through extensive exploration of scientific databases, we have identified 59 pertinent studies. Our comprehensive review encompasses an in-depth assessment of the statistical methodologies, traditional machine learning techniques, and deep learning approaches employed in these studies. In addition, we address the challenges encountered in the domain of PPG-based AF detection. Furthermore, we maintain a dedicated website to curate the latest research in this area, with regular updates on a regular basis.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
BGGAN: Generative AI Enables Representing Brain Structure-Function Connections for Alzheimer's Disease
Authors:
Chen Ding,
Shuqiang Wang
Abstract:
The relationship between brain structure and function is critical for revealing the pathogenesis of brain disease, including Alzheimer's disease (AD). However, it is a great challenge to map brain structure-function connections due to various reasons. In this work, a bidirectional graph generative adversarial networks (BGGAN) is proposed to represent brain structure-function connections. Specifica…
▽ More
The relationship between brain structure and function is critical for revealing the pathogenesis of brain disease, including Alzheimer's disease (AD). However, it is a great challenge to map brain structure-function connections due to various reasons. In this work, a bidirectional graph generative adversarial networks (BGGAN) is proposed to represent brain structure-function connections. Specifically, by designing a module incorporating inner graph convolution network (InnerGCN), the generators of BGGAN can employ features of direct and indirect brain regions to learn the map** function between structural domain and functional domain. Besides, a new module named Balancer is designed to counterpoise the optimization between generators and discriminators. By introducing the Balancer into BGGAN, both the structural generator and functional generator can not only alleviate the issue of mode collapse but also learn complementarity of structural and functional features. Experimental results using ADNI datasets show that the both the generated structure connections and generated function connections can improve the identification accuracy of AD. More importantly, based the proposed model, it is found that the relationship between brain structure and function is not a complete one-to-one correspondence. Brain structure is the basis of brain function. The strong structural connections are almost accompanied by strong functional connections.
△ Less
Submitted 5 October, 2023; v1 submitted 16 September, 2023;
originally announced September 2023.
-
Learned Kernels for Sparse, Interpretable, and Efficient Medical Time Series Processing
Authors:
Sully F. Chen,
Zhicheng Guo,
Cheng Ding,
Xiao Hu,
Cynthia Rudin
Abstract:
Background: Rapid, reliable, and accurate interpretation of medical signals is crucial for high-stakes clinical decision-making. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance in medical time series processing but at a cost: deep learning models are often compute-intensive and lack interpretability.
Methods: We propose Sparse Mixture of…
▽ More
Background: Rapid, reliable, and accurate interpretation of medical signals is crucial for high-stakes clinical decision-making. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance in medical time series processing but at a cost: deep learning models are often compute-intensive and lack interpretability.
Methods: We propose Sparse Mixture of Learned Kernels (SMoLK), an interpretable architecture for medical time series processing. The method learns a set of lightweight flexible kernels to construct a single-layer neural network, providing not only interpretability, but also efficiency and robustness. We introduce novel parameter reduction techniques to further reduce the size of our network. We demonstrate the power of our architecture on two important tasks: photoplethysmography (PPG) artifact detection and atrial fibrillation detection from single-lead electrocardiograms (ECGs). Our approach has performance similar to the state-of-the-art deep neural networks with several orders of magnitude fewer parameters, allowing for deep neural network level performance with extremely low-power wearable devices.
Results: Our interpretable method achieves greater than 99% of the performance of the state-of-the-art methods on the PPG artifact detection task, and even outperforms the state-of-the-art on a challenging out-of-distribution test set, while using dramatically fewer parameters (2% of the parameters of Segade, and about half of the parameters of Tiny-PPG). On single lead atrial fibrillation detection, our method matches the performance of a 1D-residual convolutional network, at less than 1% the parameter count, while exhibiting considerably better performance in the low-data regime, even when compared to a parameter-matched control deep network.
△ Less
Submitted 2 April, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables
Authors:
Pranay Jain,
Cheng Ding,
Cynthia Rudin,
Xiao Hu
Abstract:
Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that elim…
▽ More
Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices.
△ Less
Submitted 7 July, 2023;
originally announced July 2023.
-
Conceptual Study and Performance Analysis of Tandem Dual-Antenna Spaceborne SAR Interferometry
Authors:
Fengming Hu,
Feng Xu,
Xiaolan Qiu,
Chibiao Ding,
Yaqiu **
Abstract:
Multi-baseline synthetic aperture radar interferometry (MB-InSAR), capable of map** 3D surface model with high precision, is able to overcome the ill-posed problem in the single-baseline InSAR by use of the baseline diversity. Single pass MB acquisition with the advantages of high coherence and simple phase components has a more practical capability in 3D reconstruction than conventional repeat-…
▽ More
Multi-baseline synthetic aperture radar interferometry (MB-InSAR), capable of map** 3D surface model with high precision, is able to overcome the ill-posed problem in the single-baseline InSAR by use of the baseline diversity. Single pass MB acquisition with the advantages of high coherence and simple phase components has a more practical capability in 3D reconstruction than conventional repeat-pass MB acquisition. Using an asymptotic 3D phase unwrap** (PU), it is possible to get a reliable 3D reconstruction using very sparse acquisitions but the interferograms should follow the optimal baseline design. However, current spaceborne SAR system doesn't satisfy this principle, inducing more difficulties in practical application. In this article, a new concept of Tandem Dual-Antenna SAR Interferometry (TDA-InSAR) system for single-pass reliable 3D surface map** using the asymptotic 3D PU is proposed. Its optimal MB acquisition is analyzed to achieve both good relative height precision and flexible baseline design. Two indicators, i.e., expected relative height precision and successful phase unwrap** rate, are selected to optimize the system parameters and evaluate the performance of various baseline configurations. Additionally, simulation-based demonstrations are conducted to evaluate the performance in typical scenarios and investigate the impact of various error sources. The results indicate that the proposed TDA-InSAR is able to get the specified MB acquisition for the asymptotic 3D PU, which offers a feasible solution for single-pass 3D SAR imaging.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Learning From Alarms: A Robust Learning Approach for Accurate Photoplethysmography-Based Atrial Fibrillation Detection using Eight Million Samples Labeled with Imprecise Arrhythmia Alarms
Authors:
Cheng Ding,
Zhicheng Guo,
Cynthia Rudin,
Ran Xiao,
Amit Shah,
Duc H. Do,
Randall J Lee,
Gari Clifford,
Fadi B Nahab,
Xiao Hu
Abstract:
Atrial fibrillation (AF) is a common cardiac arrhythmia with serious health consequences if not detected and treated early. Detecting AF using wearable devices with photoplethysmography (PPG) sensors and deep neural networks has demonstrated some success using proprietary algorithms in commercial solutions. However, further advancement of this paradigm of continuous AF detection in ambulatory sett…
▽ More
Atrial fibrillation (AF) is a common cardiac arrhythmia with serious health consequences if not detected and treated early. Detecting AF using wearable devices with photoplethysmography (PPG) sensors and deep neural networks has demonstrated some success using proprietary algorithms in commercial solutions. However, further advancement of this paradigm of continuous AF detection in ambulatory settings, towards a population-wide screening use case, still faces several challenges, one of which is the lack of large-scale labeled training data. To address this challenge, in this study, we propose to leverage AF alarms from bedside patient monitors to label concurrent PPG signals, resulting in the largest PPG-AF dataset so far (8.5M 30-second records from 24100 patients) and demonstrating a practical approach to build large labeled PPG datasets. Furthermore, we recognize that the AF labels thus obtained contain errors because of false AF alarms generated from imperfect built-in algorithms from bedside monitors. Dealing with label noise with unknown distribution characteristics in this case requires advanced algorithms. We, therefore, introduce and open source a novel loss design, the cluster membership consistency (CMC) loss, to mitigate label errors. By comparing CMC with state-of-the-art methods selected from a noisy label competition, we demonstrate its superiority in multiple aspects including handling label noise in PPG data, resilience to poor-quality signals, and computational efficiency.
△ Less
Submitted 12 November, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
A 16-Channel Low-Power Neural Connectivity Extraction and Phase-Locked Deep Brain Stimulation SoC
Authors:
Uisub Shin,
Cong Ding,
Virginia Woods,
Alik S. Widge,
Mahsa Shoaran
Abstract:
Growing evidence suggests that phase-locked deep brain stimulation (DBS) can effectively regulate abnormal brain connectivity in neurological and psychiatric disorders. This letter therefore presents a low-power SoC with both neural connectivity extraction and phase-locked DBS capabilities. A 16-channel low-noise analog front-end (AFE) records local field potentials (LFPs) from multiple brain regi…
▽ More
Growing evidence suggests that phase-locked deep brain stimulation (DBS) can effectively regulate abnormal brain connectivity in neurological and psychiatric disorders. This letter therefore presents a low-power SoC with both neural connectivity extraction and phase-locked DBS capabilities. A 16-channel low-noise analog front-end (AFE) records local field potentials (LFPs) from multiple brain regions with precise gain matching. A novel low-complexity phase estimator and neural connectivity processor subsequently enable energy-efficient, yet accurate measurement of the instantaneous phase and cross-regional synchrony measures. Through flexible combination of neural biomarkers such as phase synchrony and spectral energy, a four-channel charge-balanced neurostimulator is triggered to treat various pathological brain conditions. Fabricated in 65nm CMOS, the SoC occupies a silicon area of 2.24mm2 and consumes 60uW, achieving over 60% power saving in neural connectivity extraction compared to the state-of-the-art. Extensive in-vivo measurements demonstrate multi-channel LFP recording, real-time extraction of phase and neural connectivity measures, and phase-locked stimulation in rats.
△ Less
Submitted 3 July, 2022;
originally announced July 2022.
-
MetaLR: Meta-tuning of Learning Rates for Transfer Learning in Medical Imaging
Authors:
Yixiong Chen,
Li Liu,
**gxian Li,
Hua Jiang,
Chris Ding,
Zongwei Zhou
Abstract:
In medical image analysis, transfer learning is a powerful method for deep neural networks (DNNs) to generalize well on limited medical data. Prior efforts have focused on develo** pre-training algorithms on domains such as lung ultrasound, chest X-ray, and liver CT to bridge domain gaps. However, we find that model fine-tuning also plays a crucial role in adapting medical knowledge to target ta…
▽ More
In medical image analysis, transfer learning is a powerful method for deep neural networks (DNNs) to generalize well on limited medical data. Prior efforts have focused on develo** pre-training algorithms on domains such as lung ultrasound, chest X-ray, and liver CT to bridge domain gaps. However, we find that model fine-tuning also plays a crucial role in adapting medical knowledge to target tasks. The common fine-tuning method is manually picking transferable layers (e.g., the last few layers) to update, which is labor-expensive. In this work, we propose a meta-learning-based LR tuner, named MetaLR, to make different layers automatically co-adapt to downstream tasks based on their transferabilities across domains. MetaLR learns appropriate LRs for different layers in an online manner, preventing highly transferable layers from forgetting their medical representation abilities and driving less transferable layers to adapt actively to new domains. Extensive experiments on various medical applications show that MetaLR outperforms previous state-of-the-art (SOTA) fine-tuning strategies. Codes are released.
△ Less
Submitted 29 May, 2023; v1 submitted 3 June, 2022;
originally announced June 2022.
-
NeuralTree: A 256-Channel 0.227-$μ$J/Class Versatile Neural Activity Classification and Closed-Loop Neuromodulation SoC
Authors:
Uisub Shin,
Cong Ding,
Bingzhao Zhu,
Yashwanth Vyza,
Alix Trouillet,
Emilie C. M. Revol,
Stéphanie P. Lacour,
Mahsa Shoaran
Abstract:
Closed-loop neural interfaces with on-chip machine learning can detect and suppress disease symptoms in neurological disorders or restore lost functions in paralyzed patients. While high-density neural recording can provide rich neural activity information for accurate disease-state detection, existing systems have low channel counts and poor scalability, which could limit their therapeutic effica…
▽ More
Closed-loop neural interfaces with on-chip machine learning can detect and suppress disease symptoms in neurological disorders or restore lost functions in paralyzed patients. While high-density neural recording can provide rich neural activity information for accurate disease-state detection, existing systems have low channel counts and poor scalability, which could limit their therapeutic efficacy. This work presents a highly scalable and versatile closed-loop neural interface SoC that can overcome these limitations. A 256-channel time-division multiplexed (TDM) front-end with a two-step fast-settling mixed-signal DC servo loop (DSL) is proposed to record high-spatial-resolution neural activity and perform channel-selective brain-state inference. A tree-structured neural network (NeuralTree) classification processor extracts a rich set of neural biomarkers in a patient- and disease-specific manner. Trained with an energy-aware learning algorithm, the NeuralTree classifier detects the symptoms of underlying disorders (e.g., epilepsy and movement disorders) at an optimal energy-accuracy tradeoff. A 16-channel high-voltage (HV) compliant neurostimulator closes the therapeutic loop by delivering charge-balanced biphasic current pulses to the brain. The proposed SoC was fabricated in 65-nm CMOS and achieved a 0.227-$μ$J/class energy efficiency in a compact area of 0.014mm$^2$/channel. The SoC was extensively verified on human electroencephalography (EEG) and intracranial EEG (iEEG) epilepsy datasets, obtaining 95.6%/94% sensitivity and 96.8%/96.9% specificity, respectively. In vivo neural recordings using soft $μ$ECoG arrays and multi-domain biomarker extraction were further performed on a rat model of epilepsy. In addition, for the first time in literature, on-chip classification of rest-state tremor in Parkinson's disease (PD) from human local field potentials (LFPs) was demonstrated.
△ Less
Submitted 8 December, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
Authors:
Qianying Liu,
Zhuo Gong,
Zhengdong Yang,
Yuhang Yang,
Sheng Li,
Chenchen Ding,
Nobuaki Minematsu,
Hao Huang,
Fei Cheng,
Chenhui Chu,
Sadao Kurohashi
Abstract:
Low-resource speech recognition has been long-suffering from insufficient training data. In this paper, we propose an approach that leverages neighboring languages to improve low-resource scenario performance, founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for performing…
▽ More
Low-resource speech recognition has been long-suffering from insufficient training data. In this paper, we propose an approach that leverages neighboring languages to improve low-resource scenario performance, founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for performing multilingual hierarchical Softmax decoding. This hierarchical structure enables cross-lingual knowledge sharing among similar tokens, thereby enhancing low-resource training outcomes. Empirical analyses demonstrate that our method is effective in improving the accuracy and efficiency of low-resource speech recognition.
△ Less
Submitted 30 April, 2023; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Identification of diffracted vortex beams at different propagation distances using deep learning
Authors:
Heng Lv,
Yan Guo,
Zi-Xiang Yang,
Chunling Ding,
Wu-Hao Cai,
Chenglong You,
Rui-Bo **
Abstract:
Orbital angular momentum of light is regarded as a valuable resource in quantum technology, especially in quantum communication and quantum sensing and ranging. However, the OAM state of light is susceptible to undesirable experimental conditions such as propagation distance and phase distortions, which hinders the potential for the realistic implementation of relevant technologies. In this articl…
▽ More
Orbital angular momentum of light is regarded as a valuable resource in quantum technology, especially in quantum communication and quantum sensing and ranging. However, the OAM state of light is susceptible to undesirable experimental conditions such as propagation distance and phase distortions, which hinders the potential for the realistic implementation of relevant technologies. In this article, we exploit an enhanced deep learning neural network to identify different OAM modes of light at multiple propagation distances with phase distortions. Specifically, our trained deep learning neural network can efficiently identify the vortex beam's topological charge and propagation distance with 97% accuracy. Our technique has important implications for OAM based communication and sensing protocols.
△ Less
Submitted 30 March, 2022;
originally announced March 2022.
-
A Novel Gradient Descent Least Squares (GDLS) Algorithm for Efficient SMV Gridless Line Spectrum Estimation with Applications in Tomographic SAR Imaging
Authors:
Ruizhe Shi,
Zhe Zhang,
Xiaolan Qiu,
Chibiao Ding
Abstract:
This paper presents a novel efficient method for gridless line spectrum estimation problem with single snapshot, namely the gradient descent least squares (GDLS) method. Conventional single snapshot (a.k.a. single measure vector or SMV) line spectrum estimation methods either rely on smoothing techniques that sacrifice the array aperture, or adopt the sparsity constraint and utilize compressed sen…
▽ More
This paper presents a novel efficient method for gridless line spectrum estimation problem with single snapshot, namely the gradient descent least squares (GDLS) method. Conventional single snapshot (a.k.a. single measure vector or SMV) line spectrum estimation methods either rely on smoothing techniques that sacrifice the array aperture, or adopt the sparsity constraint and utilize compressed sensing (CS) method by defining prior grids and resulting in the off-grid problem. Recently emerged atomic norm minimization (ANM) methods achieved gridless SMV line spectrum estimation, but its computational complexity is extremely high; thus it is practically infeasible in real applications with large problem scales. Our proposed GDLS method reformulates the line spectrum estimations problem into a least squares (LS) estimation problem and solves the corresponding objective function via gradient descent algorithm in an iterative fashion with efficiency. The convergence guarantee, computational complexity, as well as performance analysis are discussed in this paper. Numerical simulations and real data experiments show that the proposed GDLS algorithm outperforms the state-of-the-art methods e.g., CS and ANM, in terms of estimation performances. It can completely avoid the off-grid problem, and its computational complexity is significantly lower than ANM. Our method has been tested in tomographic SAR (TomoSAR) imaging applications via simulated and real experiment data. Results show great potential of the proposed method in terms of better cloud point performance and eliminating the gridding effect.
△ Less
Submitted 27 April, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021
Authors:
Keke Wang,
Xudong Mao,
Hao Wu,
Chen Ding,
Chuxiang Shang,
Rui Xia,
Yuxuan Wang
Abstract:
This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). The VoxSRC-21 provides both the dev set and test set of VoxConverse for use in validation and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distribution of the VoxConve…
▽ More
This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). The VoxSRC-21 provides both the dev set and test set of VoxConverse for use in validation and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distribution of the VoxConverse's test set and the VoxSRC-21's test set is more closer. Our system consists of voice active detection (VAD), speaker embedding extraction, spectral clustering followed by a re-clustering step based on agglomerative hierarchical clustering (AHC) and overlapped speech detection and handling. Finally, we integrate systems with different time scales using DOVER-Lap. Our best system achieves 5.15\% of the diarization error rate (DER) on evaluation set, ranking the second at the diarization track of the challenge.
△ Less
Submitted 5 September, 2021;
originally announced September 2021.
-
Log-Spectral Matching GAN: PPG-based Atrial Fibrillation Detection can be Enhanced by GAN-based Data Augmentation with Integration of Spectral Loss
Authors:
Cheng Ding,
Ran Xiao,
Duc Do,
David Scott Lee,
Shadi Kalantarian,
Randall J Lee,
Xiao Hu
Abstract:
Photoplethysmography (PPG) is a ubiquitous physiological measurement that detects beat-to-beat pulsatile blood volume changes and hence has a potential for monitoring cardiovascular conditions, particularly in ambulatory settings. A PPG dataset that is created for a particular use case is often imbalanced, due to a low prevalence of the pathological condition it targets to predict and the paroxysm…
▽ More
Photoplethysmography (PPG) is a ubiquitous physiological measurement that detects beat-to-beat pulsatile blood volume changes and hence has a potential for monitoring cardiovascular conditions, particularly in ambulatory settings. A PPG dataset that is created for a particular use case is often imbalanced, due to a low prevalence of the pathological condition it targets to predict and the paroxysmal nature of the condition as well. To tackle this problem, we propose log-spectral matching GAN (LSM-GAN), a generative model that can be used as a data augmentation technique to alleviate the class imbalance in a PPG dataset to train a classifier. LSM-GAN utilizes a novel generator that generates a synthetic signal without a up-sampling process of input white noises, as well as adds the mismatch between real and synthetic signals in frequency domain to the conventional adversarial loss. In this study, experiments are designed focusing on examining how the influence of LSM-GAN as a data augmentation technique on one specific classification task - atrial fibrillation (AF) detection using PPG. We show that by taking spectral information into consideration, LSM-GAN as a data augmentation solution can generate more realistic PPG signals. The code of LSM-GAN is available at https://github.com/chengding0713/Log-Spectral-matching-GAN.
△ Less
Submitted 31 January, 2022; v1 submitted 11 August, 2021;
originally announced August 2021.
-
A Multi-Agent Reinforcement Learning Approach For Safe and Efficient Behavior Planning Of Connected Autonomous Vehicles
Authors:
Songyang Han,
Shanglin Zhou,
Jiangwei Wang,
Lynn Pepin,
Caiwen Ding,
Jie Fu,
Fei Miao
Abstract:
The recent advancements in wireless technology enable connected autonomous vehicles (CAVs) to gather information about their environment by vehicle-to-vehicle (V2V) communication. In this work, we design an information-sharing-based multi-agent reinforcement learning (MARL) framework for CAVs, to take advantage of the extra information when making decisions to improve traffic efficiency and safety…
▽ More
The recent advancements in wireless technology enable connected autonomous vehicles (CAVs) to gather information about their environment by vehicle-to-vehicle (V2V) communication. In this work, we design an information-sharing-based multi-agent reinforcement learning (MARL) framework for CAVs, to take advantage of the extra information when making decisions to improve traffic efficiency and safety. The safe actor-critic algorithm we propose has two new techniques: the truncated Q-function and safe action map**. The truncated Q-function utilizes the shared information from neighboring CAVs such that the joint state and action spaces of the Q-function do not grow in our algorithm for a large-scale CAV system. We prove the bound of the approximation error between the truncated-Q and global Q-functions. The safe action map** provides a provable safety guarantee for both the training and execution based on control barrier functions. Using the CARLA simulator for experiments, we show that our approach can improve the CAV system's efficiency in terms of average velocity and comfort under different CAV ratios and different traffic densities. We also show that our approach avoids the execution of unsafe actions and always maintains a safe distance from other vehicles. We construct an obstacle-at-corner scenario to show that the shared vision can help CAVs to observe obstacles earlier and take action to avoid traffic jams.
△ Less
Submitted 3 September, 2022; v1 submitted 9 March, 2020;
originally announced March 2020.
-
Joint Beamforming and Computation Offloading for Multi-user Mobile-Edge Computing
Authors:
Changfeng Ding,
Jun-Bo Wang,
Ming Cheng,
Chuanwen Chang,
**-Yuan Wang,
Min Lin
Abstract:
Mobile edge computing (MEC) is considered as an efficient method to relieve the computation burden of mobile devices. In order to reduce the energy consumption and time delay of mobile devices (MDs) in MEC, multiple users multiple input and multiple output (MU-MIMO) communications is considered to be applied to the MEC system. The purpose of this paper is to minimize the weighted sum of energy con…
▽ More
Mobile edge computing (MEC) is considered as an efficient method to relieve the computation burden of mobile devices. In order to reduce the energy consumption and time delay of mobile devices (MDs) in MEC, multiple users multiple input and multiple output (MU-MIMO) communications is considered to be applied to the MEC system. The purpose of this paper is to minimize the weighted sum of energy consumption and time delay of MDs by jointly considering the offloading decision and MU-MIMO beamforming problems. And the resulting optimization problem is a mixed-integer non-linear programming problem, which is NP-hard. To solve the optimization problem, a semidefinite relaxation based algorithm is proposed to solve the offloading decision problem. Then, the MU-MIMO beamforming design problem is handled with a newly proposed fractional programming method. Simulation results show that the proposed algorithms can effectively reduce the energy consumption and time delay of the computation offloading.
△ Less
Submitted 4 January, 2020;
originally announced January 2020.
-
A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation
Authors:
Geng Yuan,
Xiaolong Ma,
Sheng Lin,
Zhengang Li,
Caiwen Ding
Abstract:
The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption are required. To address these challenges, spin orbit torque magnetic random-access memory (SOT-MRAM)…
▽ More
The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption are required. To address these challenges, spin orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs since SOT-MRAM has the characteristic of near-zero standby power, high density, none-volatile. However, the drawbacks of SOT-MRAM based PIM engines such as high writing latency and requiring low bit-width data decrease its popularity as a favorable energy efficient DNN accelerator. To mitigate these drawbacks, we propose an ultra energy efficient framework by using model compression techniques including weight pruning and quantization from the software level considering the architecture of SOT-MRAM PIM. And we incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee the solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced, while increasing the overall system throughput at the meantime, making our proposed ADMM-based SOT-MRAM PIM more energy efficiency and suitable for embedded systems or IoT devices. Our experimental results show the accuracy and compression rate of our proposed framework is consistently outperforming the reference works, while the efficiency (area \& power) and throughput of SOT-MRAM PIM engine is significantly improved.
△ Less
Submitted 24 November, 2019;
originally announced December 2019.
-
Algorithmic Design and Implementation of Unobtrusive Multistatic Serial LiDAR Image
Authors:
Chi Ding,
Zheng Cao,
Matthew S. Emigh,
Jose C. Principe,
Bing Ouyang,
Anni Vuorenkoski,
Fraser Dalgleish,
Brian Ramos,
Yanjun Li
Abstract:
To fully understand interactions between marine hydrokinetic (MHK) equipment and marine animals, a fast and effective monitoring system is required to capture relevant information whenever underwater animals appear. A new automated underwater imaging system composed of LiDAR (Light Detection and Ranging) imaging hardware and a scene understanding software module named Unobtrusive Multistatic Seria…
▽ More
To fully understand interactions between marine hydrokinetic (MHK) equipment and marine animals, a fast and effective monitoring system is required to capture relevant information whenever underwater animals appear. A new automated underwater imaging system composed of LiDAR (Light Detection and Ranging) imaging hardware and a scene understanding software module named Unobtrusive Multistatic Serial LiDAR Imager (UMSLI) to supervise the presence of animals near turbines. UMSLI integrates the front end LiDAR hardware and a series of software modules to achieve image preprocessing, detection, tracking, segmentation and classification in a hierarchical manner.
△ Less
Submitted 8 November, 2019;
originally announced November 2019.
-
Deep Compressed Pneumonia Detection for Low-Power Embedded Devices
Authors:
Hongjia Li,
Sheng Lin,
Ning Liu,
Caiwen Ding,
Yanzhi Wang
Abstract:
Deep neural networks (DNNs) have been expanded into medical fields and triggered the revolution of some medical applications by extracting complex features and achieving high accuracy and performance, etc. On the contrast, the large-scale network brings high requirements of both memory storage and computation resource, especially for portable medical devices and other embedded systems. In this wor…
▽ More
Deep neural networks (DNNs) have been expanded into medical fields and triggered the revolution of some medical applications by extracting complex features and achieving high accuracy and performance, etc. On the contrast, the large-scale network brings high requirements of both memory storage and computation resource, especially for portable medical devices and other embedded systems. In this work, we first train a DNN for pneumonia detection using the dataset provided by RSNA Pneumonia Detection Challenge. To overcome hardware limitation for implementing large-scale networks, we develop a systematic structured weight pruning method with filter sparsity, column sparsity and combined sparsity. Experiments show that we can achieve up to 36x compression ratio compared to the original model with 106 layers, while maintaining no accuracy degradation. We evaluate the proposed methods on an embedded low-power device, Jetson TX2, and achieve low power usage and high energy efficiency.
△ Less
Submitted 4 November, 2019;
originally announced November 2019.
-
Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation
Authors:
Xiaolong Ma,
Geng Yuan,
Sheng Lin,
Caiwen Ding,
Fuxun Yu,
Tao Liu,
Wujie Wen,
Xiang Chen,
Yanzhi Wang
Abstract:
The state-of-art DNN structures involve intensive computation and high memory storage. To mitigate the challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework for DNN applications. However, the high accuracy solution for extreme model compression on memristor crossbar array architecture is still waiting for unravelin…
▽ More
The state-of-art DNN structures involve intensive computation and high memory storage. To mitigate the challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework for DNN applications. However, the high accuracy solution for extreme model compression on memristor crossbar array architecture is still waiting for unraveling. In this paper, we propose a memristor-based DNN framework which combines both structured weight pruning and quantization by incorporating alternating direction method of multipliers (ADMM) algorithm for better pruning and quantization performance. We also discover the non-optimality of the ADMM solution in weight pruning and the unused data path in a structured pruned model. Motivated by these discoveries, we design a software-hardware co-optimization framework which contains the first proposed Network Purification and Unused Path Removal algorithms targeting on post-processing a structured pruned model after ADMM steps. By taking memristor hardware constraints into our whole framework, we achieve extreme high compression ratio on the state-of-art neural network structures with minimum accuracy loss. For quantizing structured pruned model, our framework achieves nearly no accuracy loss after quantizing weights to 8-bit memristor weight representation. We share our models at anonymous link https://bit.ly/2VnMUy0.
△ Less
Submitted 27 August, 2019;
originally announced August 2019.
-
A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron SuperconductingTechnology
Authors:
Ruizhe Cai,
Ao Ren,
Olivia Chen,
Ning Liu,
Caiwen Ding,
Xuehai Qian,
Jie Han,
Wenhui Luo,
Nobuyuki Yoshikawa,
Yanzhi Wang
Abstract:
The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs have demonstrated the scalability and potential of implem…
▽ More
The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs have demonstrated the scalability and potential of implementing large-scale systems using AQFP. As a result, it will be promising for AQFP in high-performance computing and deep space applications, with Deep Neural Network (DNN) inference acceleration as an important example. Besides ultra-high energy efficiency, AQFP exhibits two unique characteristics: the deep pipelining nature since each AQFP logic gate is connected with an AC clock signal, which increases the difficulty to avoid RAW hazards; the second is the unique opportunity of true random number generation (RNG) using a single AQFP buffer, far more efficient than RNG in CMOS. We point out that these two characteristics make AQFP especially compatible with the \emph{stochastic computing} (SC) technique, which uses a time-independent bit sequence for value representation, and is compatible with the deep pipelining nature. Further, the application of SC has been investigated in DNNs in prior work, and the suitability has been illustrated as SC is more compatible with approximate computations. This work is the first to develop an SC-based DNN acceleration framework using AQFP technology.
△ Less
Submitted 21 July, 2019;
originally announced July 2019.
-
One-pass Multi-task Networks with Cross-task Guided Attention for Brain Tumor Segmentation
Authors:
Chenhong Zhou,
Changxing Ding,
Xinchao Wang,
Zhentai Lu,
Dacheng Tao
Abstract:
Class imbalance has emerged as one of the major challenges for medical image segmentation. The model cascade (MC) strategy significantly alleviates the class imbalance issue via running a set of individual deep models for coarse-to-fine segmentation. Despite its outstanding performance, however, this method leads to undesired system complexity and also ignores the correlation among the models. To…
▽ More
Class imbalance has emerged as one of the major challenges for medical image segmentation. The model cascade (MC) strategy significantly alleviates the class imbalance issue via running a set of individual deep models for coarse-to-fine segmentation. Despite its outstanding performance, however, this method leads to undesired system complexity and also ignores the correlation among the models. To handle these flaws, we propose a light-weight deep model, i.e., the One-pass Multi-task Network (OM-Net) to solve class imbalance better than MC does, while requiring only one-pass computation. First, OM-Net integrates the separate segmentation tasks into one deep model, which consists of shared parameters to learn joint features, as well as task-specific parameters to learn discriminative features. Second, to more effectively optimize OM-Net, we take advantage of the correlation among tasks to design both an online training data transfer strategy and a curriculum learning-based training strategy. Third, we further propose sharing prediction results between tasks and design a cross-task guided attention (CGA) module which can adaptively recalibrate channel-wise feature responses based on the category-specific statistics. Finally, a simple yet effective post-processing method is introduced to refine the segmentation results. Extensive experiments are conducted to demonstrate the effectiveness of the proposed techniques. Most impressively, we achieve state-of-the-art performance on the BraTS 2015 testing set and BraTS 2017 online validation set. Using these proposed approaches, we also won joint third place in the BraTS 2018 challenge among 64 participating teams. The code is publicly available at https://github.com/chenhong-zhou/OM-Net.
△ Less
Submitted 7 February, 2020; v1 submitted 4 June, 2019;
originally announced June 2019.
-
E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs
Authors:
Zhe Li,
Caiwen Ding,
Siyue Wang,
Wujie Wen,
Youwei Zhuo,
Chang Liu,
Qinru Qiu,
Wenyao Xu,
Xue Lin,
Xuehai Qian,
Yanzhi Wang
Abstract:
Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications which require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. It is a challenging task to have real-time, efficient, and accurate hardware RNN implementations because of the high sensitivity to imprecision…
▽ More
Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications which require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. It is a challenging task to have real-time, efficient, and accurate hardware RNN implementations because of the high sensitivity to imprecision accumulation and the requirement of special activation function implementations.
A key limitation of the prior works is the lack of a systematic design optimization framework of RNN model and hardware implementations, especially when the block size (or compression ratio) should be jointly optimized with RNN type, layer size, etc. In this paper, we adopt the block-circulant matrix-based framework, and present the Efficient RNN (E-RNN) framework for FPGA implementations of the Automatic Speech Recognition (ASR) application. The overall goal is to improve performance/energy efficiency under accuracy requirement. We use the alternating direction method of multipliers (ADMM) technique for more accurate block-circulant training, and present two design explorations providing guidance on block size and reducing RNN training trials. Based on the two observations, we decompose E-RNN in two phases: Phase I on determining RNN model to reduce computation and storage subject to accuracy requirement, and Phase II on hardware implementations given RNN model, including processing element design/optimization, quantization, activation implementation, etc. Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4$\times$ compared with ESE, and more than 2$\times$ compared with C-LSTM, under the same accuracy.
△ Less
Submitted 12 December, 2018;
originally announced December 2018.
-
Audio-only Bird Species Automated Identification Method with Limited Training Data Based on Multi-Channel Deep Convolutional Neural Networks
Authors:
Jiang-jian Xie,
Chang-qing Ding,
Wen-bin Li,
Cheng-hao Cai
Abstract:
Based on the transfer learning, we design a bird species identification model that uses the VGG-16 model (pretrained on ImageNet) for feature extraction, then a classifier consisting of two fully-connected hidden layers and a Softmax layer is attached. We compare the performance of the proposed model with the original VGG16 model. The results show that the former has higher train efficiency, but l…
▽ More
Based on the transfer learning, we design a bird species identification model that uses the VGG-16 model (pretrained on ImageNet) for feature extraction, then a classifier consisting of two fully-connected hidden layers and a Softmax layer is attached. We compare the performance of the proposed model with the original VGG16 model. The results show that the former has higher train efficiency, but lower mean average precisions(MAP). To improve the MAP of the proposed model, we investigate the result fusion mode to form multi-channel identification model, the best MAP reaches 0.9998. The number of model parameters is 13110, which is only 0.0082% of the VGG16 model. Also, the size demand of sample is decreased.
△ Less
Submitted 3 March, 2018;
originally announced March 2018.
-
An Optimal Control Approach to the Persistent Monitoring Problem
Authors:
Christos. G. Cassandras,
Xuchao Lin,
Xu Chu Ding
Abstract:
We propose an optimal control framework for persistent monitoring problems where the objective is to control the movement of mobile nodes to minimize an uncertainty metric in a given mission space. For multi agent in a one-dimensional mission space, we show that the optimal solution is obtained in terms of a sequence of switching locations and waiting time on these switching points, thus reducing…
▽ More
We propose an optimal control framework for persistent monitoring problems where the objective is to control the movement of mobile nodes to minimize an uncertainty metric in a given mission space. For multi agent in a one-dimensional mission space, we show that the optimal solution is obtained in terms of a sequence of switching locations and waiting time on these switching points, thus reducing it to a parametric optimization problem. Using Infinitesimal Perturbation Analysis (IPA) we obtain a complete solution through a gradient-based algorithm. We also discuss a receding horizon controller which is capable of obtaining a near-optimal solution on-the-fly.
△ Less
Submitted 27 February, 2012;
originally announced February 2012.
-
Temporal Logic Motion Control using Actor-Critic Methods
Authors:
Xu Chu Ding,
**g Wang,
Morteza Lahijanian,
Ioannis Ch. Paschalidis,
Calin A. Belta
Abstract:
In this paper, we consider the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov Decision Process (MDP). The robot control problem becomes finding the…
▽ More
In this paper, we consider the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov Decision Process (MDP). The robot control problem becomes finding the control policy maximizing the probability of satisfying the temporal logic task on the MDP. For a large environment, obtaining transition probabilities for each state-action pair, as well as solving the necessary optimization problem for the optimal policy are usually not computationally feasible. To address these issues, we propose an approximate dynamic programming framework based on a least-square temporal difference learning method of the actor-critic type. This framework operates on sample paths of the robot and optimizes a randomized control policy with respect to a small set of parameters. The transition probabilities are obtained only when needed. Hardware-in-the-loop simulations confirm that convergence of the parameters translates to an approximately optimal policy.
△ Less
Submitted 23 February, 2012; v1 submitted 9 February, 2012;
originally announced February 2012.
-
Synthesis of Distributed Control and Communication Schemes from Global LTL Specifications
Authors:
Yushan Chen,
Xu Chu Ding,
Calin Belta
Abstract:
We introduce a technique for synthesis of control and communication strategies for a team of agents from a global task specification given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied by the agents. We consider a purely discrete scenario, in which the dynamics of each agent is modeled as a finite transition system. The proposed computational framework con…
▽ More
We introduce a technique for synthesis of control and communication strategies for a team of agents from a global task specification given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied by the agents. We consider a purely discrete scenario, in which the dynamics of each agent is modeled as a finite transition system. The proposed computational framework consists of two main steps. First, we extend results from concurrency theory to check whether the specification is distributable among the agents. Second, we generate individual control and communication strategies by using ideas from LTL model checking. We apply the method to automatically deploy a team of miniature cars in our Robotic Urban-Like Environment.
△ Less
Submitted 6 September, 2011;
originally announced September 2011.
-
Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control
Authors:
Reza Moazzez Estan**i,
Xu Chu Ding,
Morteza Lahijanian,
**g Wang,
Calin A. Belta,
Ioannis Ch. Paschalidis
Abstract:
We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into a…
▽ More
We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. This algorithm is of the actor-critic type and uses a least-square temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show its convergence to a policy corresponding to a stationary point in the parameters' space. Simulation results confirm the effectiveness of the proposed solution.
△ Less
Submitted 30 August, 2011; v1 submitted 23 August, 2011;
originally announced August 2011.
-
Multi-robot Deployment From LTL Specifications with Reduced Communication
Authors:
Marius Kloetzer,
Xu Chu Ding,
Calin Belta
Abstract:
In this paper, we develop a computational framework for fully automatic deployment of a team of unicycles from a global specification given as an LTL formula over some regions of interest. Our hierarchical approach consists of four steps: (i) the construction of finite abstractions for the motions of each robot, (ii) the parallel composition of the abstractions, (iii) the generation of a satisfyin…
▽ More
In this paper, we develop a computational framework for fully automatic deployment of a team of unicycles from a global specification given as an LTL formula over some regions of interest. Our hierarchical approach consists of four steps: (i) the construction of finite abstractions for the motions of each robot, (ii) the parallel composition of the abstractions, (iii) the generation of a satisfying motion of the team; (iv) map** this motion to individual robot control and communication strategies. The main result of the paper is an algorithm to reduce the amount of inter-robot communication during the fourth step of the procedure.
△ Less
Submitted 16 August, 2011;
originally announced August 2011.
-
An Optimal Control Approach for the Persistent Monitoring Problem
Authors:
Christos G. Cassandras,
Xu Chu Ding,
Xuchao Lin
Abstract:
We propose an optimal control framework for persistent monitoring problems where the objective is to control the movement of mobile agents to minimize an uncertainty metric in a given mission space. For a single agent in a one-dimensional space, we show that the optimal solution is obtained in terms of a sequence of switching locations, thus reducing it to a parametric optimization problem. Using…
▽ More
We propose an optimal control framework for persistent monitoring problems where the objective is to control the movement of mobile agents to minimize an uncertainty metric in a given mission space. For a single agent in a one-dimensional space, we show that the optimal solution is obtained in terms of a sequence of switching locations, thus reducing it to a parametric optimization problem. Using Infinitesimal Perturbation Analysis (IPA) we obtain a complete solution through a gradient-based algorithm. We also discuss a receding horizon controller which is capable of obtaining a near-optimal solution on-the-fly. We illustrate our approach with numerical examples.
△ Less
Submitted 5 October, 2011; v1 submitted 16 August, 2011;
originally announced August 2011.
-
LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees
Authors:
Xu Chu Ding,
Stephen L. Smith,
Calin Belta,
Daniela Rus
Abstract:
We present a method to generate a robot control strategy that maximizes the probability to accomplish a task. The task is given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the regions of a partitioned environment. We assume that the probabilities with which the properties are satisfied at the regions are known, and the robot can determine the truth va…
▽ More
We present a method to generate a robot control strategy that maximizes the probability to accomplish a task. The task is given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the regions of a partitioned environment. We assume that the probabilities with which the properties are satisfied at the regions are known, and the robot can determine the truth value of a proposition only at the current region. Motivated by several results on partitioned-based abstractions, we assume that the motion is performed on a graph. To account for noisy sensors and actuators, we assume that a control action enables several transitions with known probabilities. We show that this problem can be reduced to the problem of generating a control policy for a Markov Decision Process (MDP) such that the probability of satisfying an LTL formula over its states is maximized. We provide a complete solution for the latter problem that builds on existing results from probabilistic model checking. We include an illustrative case study.
△ Less
Submitted 7 April, 2011; v1 submitted 6 April, 2011;
originally announced April 2011.
-
MDP Optimal Control under Temporal Logic Constraints
Authors:
Xu Chu Ding,
Stephen L. Smith,
Calin Belta,
Daniela Rus
Abstract:
In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exi…
▽ More
In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exists. In addition, we designate an "optimizing proposition" to be repeatedly satisfied, and we formulate a novel optimization criterion in terms of minimizing the expected cost in between satisfactions of this proposition. We propose a sufficient condition for a policy to be optimal, and develop a dynamic programming algorithm that synthesizes a policy that is optimal under some conditions, and sub-optimal otherwise. This problem is motivated by robotic applications requiring persistent tasks, such as environmental monitoring or data gathering, to be performed.
△ Less
Submitted 23 March, 2011; v1 submitted 22 March, 2011;
originally announced March 2011.
-
Probabilistically Safe Vehicle Control in a Hostile Environment
Authors:
Igor Cizelj,
Xu Chu Ding,
Morteza Lahijanian,
Alessandro Pinto,
Calin Belta
Abstract:
In this paper we present an approach to control a vehicle in a hostile environment with static obstacles and moving adversaries. The vehicle is required to satisfy a mission objective expressed as a temporal logic specification over a set of properties satisfied at regions of a partitioned environment. We model the movements of adversaries in between regions of the environment as Poisson processes…
▽ More
In this paper we present an approach to control a vehicle in a hostile environment with static obstacles and moving adversaries. The vehicle is required to satisfy a mission objective expressed as a temporal logic specification over a set of properties satisfied at regions of a partitioned environment. We model the movements of adversaries in between regions of the environment as Poisson processes. Furthermore, we assume that the time it takes for the vehicle to traverse in between two facets of each region is exponentially distributed, and we obtain the rate of this exponential distribution from a simulator of the environment. We capture the motion of the vehicle and the vehicle updates of adversaries distributions as a Markov Decision Process. Using tools in Probabilistic Computational Tree Logic, we find a control strategy for the vehicle that maximizes the probability of accomplishing the mission objective. We demonstrate our approach with illustrative case studies.
△ Less
Submitted 24 March, 2011; v1 submitted 21 March, 2011;
originally announced March 2011.