-
Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets
Authors:
Da Li,
Guoqiang Zhao,
Houjun Sun,
Jiacheng Bao
Abstract:
Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous work typically relies on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar data. Consequently, imaging performance is limited, and acquiring full-aperture data for multi-baseline SAR is costly and sometimes impractical in real-world applications. In this paper, we propose a Cross-Modal Reconstruction Network (CMR-Net), which integrates differentiable rendering and cross-modal supervision with optical images to reconstruct highly sparse multi-baseline SAR 3D images of vehicle targets into visually structured and high-resolution images. We meticulously designed the network architecture and training strategies to enhance network generalization capability. Remarkably, CMR-Net, trained solely on simulated data, demonstrates high-resolution reconstruction capabilities on both publicly available simulation datasets and real measured datasets, outperforming traditional sparse reconstruction algorithms based on compressed sensing and other learning-based methods. Additionally, using optical images as supervision provides a cost-effective way to build training datasets, making the method easier to adopt. Our work showcases the broad prospects of deep learning in multi-baseline SAR 3D imaging and offers a novel path for researching radar imaging based on cross-modal learning theory.
Submitted 6 June, 2024;
originally announced June 2024.
-
Representation and De-interleaving of Mixtures of Hidden Markov Processes
Authors:
Jiadi Bao,
Mengtao Zhu,
Yunjie Li,
Shafei Wang
Abstract:
De-interleaving of mixtures of Hidden Markov Processes (HMPs) generally depends on the underlying representation model. Existing representation models consider mixtures of Markov chains rather than hidden Markov processes, resulting in a lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, existing de-interleaving methods rely on a search-based strategy, which is time-consuming. To address these issues, this paper proposes a novel representation model and corresponding de-interleaving methods for mixtures of HMPs. First, a generative model for representing mixtures of HMPs is designed, and the de-interleaving process is formulated as posterior inference for the generative model. Second, an exact inference method is developed to maximize the likelihood of the complete data, and two approximate inference methods are developed to maximize the evidence lower bound by creating tractable structures. Then, a theoretical lower bound on the error probability is derived using the likelihood ratio test, and the algorithms are shown to get reasonably close to the bound. Finally, simulation results demonstrate that the proposed methods are highly effective and robust in non-ideal situations, outperforming baseline methods on simulated and real-life data.
Submitted 1 June, 2024;
originally announced June 2024.
-
A Symmetric Regressor for MRI-Based Assessment of Striatal Dopamine Transporter Uptake in Parkinson's Disease
Authors:
Walid Abdullah Al,
Il Dong Yun,
Yun Jung Bae
Abstract:
Dopamine transporter (DAT) imaging is commonly used for monitoring Parkinson's disease (PD), where the striatal DAT uptake amount is computed to assess PD severity. However, DAT imaging is costly, carries the risk of radiation exposure, and is not available in general clinics. Recently, an MRI patch of the nigral region has been proposed as a safer and easier alternative. This paper proposes a symmetric regressor for predicting the DAT uptake amount from the nigral MRI patch. Acknowledging the symmetry between the right and left nigrae, the proposed regressor incorporates a paired input-output model that simultaneously predicts the DAT uptake amounts for both the right and left striata. Moreover, it employs a symmetric loss that imposes a constraint on the difference between the right and left predictions, reflecting the high correlation between DAT uptake amounts on the two lateral sides. Additionally, we propose a symmetric Monte-Carlo (MC) dropout method, which exploits the same symmetry, to provide a useful uncertainty estimate of the DAT uptake prediction. We evaluated the proposed approach on 734 nigral patches; the symmetric regressor performed significantly better than standard regressors while giving better explainability and feature representation. The symmetric MC dropout also gave precise uncertainty ranges with a high probability of including the true DAT uptake amounts within the range.
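The abstract does not give the exact form of the symmetric loss. As a minimal sketch of one plausible reading, the Python function below adds a penalty on the mismatch between the predicted and true right-minus-left differences to the per-side squared errors. The function name `symmetric_loss`, the `weight` parameter, and the specific penalty term are assumptions for illustration, not the authors' formulation.

```python
def symmetric_loss(pred_right, pred_left, true_right, true_left, weight=0.5):
    """Per-side squared error plus a penalty on the right-left difference.

    Hypothetical form: the penalty compares the predicted right-minus-left
    difference with the true one, so predictions that break the observed
    left/right relationship cost more.
    """
    err_right = (pred_right - true_right) ** 2
    err_left = (pred_left - true_left) ** 2
    diff_penalty = ((pred_right - pred_left) - (true_right - true_left)) ** 2
    return err_right + err_left + weight * diff_penalty


# Two predictions with the same per-side error: the asymmetric one is penalized more.
same_direction = symmetric_loss(3.0, 3.0, 2.0, 2.0)   # both sides over-predicted
opposite_direction = symmetric_loss(3.0, 1.0, 2.0, 2.0)  # sides pulled apart
```

In this sketch, both calls have identical per-side errors, but the second one also distorts the right-left difference, so its loss is larger.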
Submitted 18 April, 2024;
originally announced April 2024.
-
Connections between Reachability and Time Optimality
Authors:
Juho Bae,
Ji Hoon Bai,
Byung-Yoon Lee,
Jun-Yong Lee,
Chang-Hun Lee
Abstract:
This paper presents the concept of an equivalence relation on the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed from the solutions of time optimal problems. A more general equivalence theorem is also presented. The findings facilitate the use of solution structures from a certain class of optimal control problems to address problems in corresponding equivalent classes. As a byproduct, we state and prove the construction methods of the reachability sets of three-dimensional curves with a prescribed curvature bound. The findings are twofold: First, we prove that any boundary point of the reachability set, with the terminal direction taken into account, can be accessed via curves of H, CSC, CCC, or their respective subsegments, where H denotes a helicoidal arc, C a circular arc with maximum curvature, and S a straight segment. Second, we show that any boundary point of the reachability set, without considering the terminal direction, can be accessed by curves of CC, CS, or their respective subsegments. These findings extend the results in the literature on planar curves, or Dubins car dynamics, to spatial curves in $\mathbb{R}^3$. For higher dimensions, we confirm that the problem of identifying the reachability set of curvature bounded paths subsumes the well-known Markov-Dubins problem. These advancements in understanding the reachability of curvature bounded paths in $\mathbb{R}^3$ hold significant practical implications, particularly in the contexts of mission planning problems and time optimal guidance.
Submitted 27 March, 2024;
originally announced March 2024.
-
Constraint-Aware Mesh Refinement Method by Reachability Set Envelope of Curvature Bounded Paths
Authors:
Juho Bae,
Ji Hoon Bai,
Byung-Yoon Lee,
Jun-Yong Lee
Abstract:
This paper presents an enhanced direct-method-based approach for the real-time solution of optimal control problems with path constraints, such as obstacles. The principal contributions of this work are twofold: first, the existing methods for constructing reachability sets in the literature are extended to derive the envelope of these sets, which determines the region swept by all feasible trajectories between adjacent sample points. Second, we propose a novel mesh refinement method that guarantees the absence of constraint violations between discrete states in two dimensions. To illustrate the effectiveness of the proposed methodology, numerical simulations are conducted on real-time path planning for fixed-wing unmanned aerial vehicles.
Submitted 4 March, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Sound of Story: Multi-modal Storytelling with Audio
Authors:
Jaeyeon Bae,
Seokhoon Jeong,
Seokun Kang,
Namgi Han,
Jae-Yon Lee,
Hyounghun Kim,
Taehwan Kim
Abstract:
Storytelling is multi-modal in the real world. When one tells a story, one may use visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend the areas of story understanding and storytelling by establishing a new component called "background sound", which is story context-based audio without any linguistic information. For this purpose, we introduce a new dataset, called "Sound of Story (SoS)", which has paired image and text sequences with corresponding sound or background music for a story. To the best of our knowledge, this is the largest well-curated dataset for storytelling with sound. Our SoS dataset consists of 27,354 stories with 19.6 images per story and 984 hours of speech-decoupled audio such as background music and other sounds. As benchmark tasks for storytelling with sound and the dataset, we propose retrieval tasks between modalities and audio generation tasks from image-text sequences, introducing strong baselines for them. We believe the proposed dataset and tasks may shed light on the multi-modal understanding of storytelling in terms of sound. The dataset and the baseline code for each task will be released at: https://github.com/Sosdatasets/SoS_Dataset.
Submitted 30 October, 2023;
originally announced October 2023.
-
Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis
Authors:
Jae-Sung Bae,
Joun Yeop Lee,
Ji-Hyun Lee,
Seongkyu Mun,
Taehwa Kang,
Hoon-Young Cho,
Chanwoo Kim
Abstract:
Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems by enlarging the training data through crowd-sourcing or augmenting existing speech data. However, the use of low-quality data has led to a decline in the overall system performance. To avoid such degradation, instead of directly augmenting the input data, we propose a latent filling (LF) method that adopts simple but effective latent space data augmentation in the speaker embedding space of the ZS-TTS system. By incorporating a consistency loss, LF can be seamlessly integrated into existing ZS-TTS systems without the need for additional training stages. Experimental results show that LF significantly improves speaker similarity while preserving speech quality.
Submitted 22 January, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Ensemble Kalman Filters with Resampling
Authors:
Omar Al Ghattas,
Jiajun Bao,
Daniel Sanz-Alonso
Abstract:
Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state of the system is high dimensional, ensemble Kalman filters are often the method of choice. These algorithms rely on an ensemble of interacting particles to sequentially estimate the state as new observations become available. Despite the practical success of ensemble Kalman filters, theoretical understanding is hindered by the intricate dependence structure of the interacting particles. This paper investigates ensemble Kalman filters that incorporate an additional resampling step to break the dependency between particles. The new algorithm is amenable to a theoretical analysis that extends and improves upon those available for filters without resampling, while also performing well in numerical examples.
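To make the idea concrete, here is a minimal Python sketch of a single ensemble Kalman analysis step followed by a resampling step, for a scalar state that is observed directly. The function name, the perturbed-observation update, and the choice to resample from a Gaussian matched to the updated ensemble's mean and spread are all assumptions for illustration; the abstract does not specify the paper's resampling scheme.

```python
import random
import statistics


def enkf_step_with_resampling(ensemble, observation, obs_noise_var, rng):
    """One scalar analysis step followed by Gaussian resampling.

    Assumes the state is observed directly (observation operator H = 1).
    """
    size = len(ensemble)
    prior_var = statistics.variance(ensemble)
    gain = prior_var / (prior_var + obs_noise_var)  # scalar Kalman gain

    # Perturbed-observation update: each particle sees its own noisy copy
    # of the observation.
    obs_std = obs_noise_var ** 0.5
    updated = [
        x + gain * (observation + rng.gauss(0.0, obs_std) - x)
        for x in ensemble
    ]

    # Resampling (assumed form): draw a fresh, independent ensemble from a
    # Gaussian matched to the updated ensemble's mean and spread.
    post_mean = statistics.fmean(updated)
    post_std = statistics.stdev(updated)
    return [rng.gauss(post_mean, post_std) for _ in range(size)]


rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(2000)]
posterior = enkf_step_with_resampling(
    prior, observation=10.0, obs_noise_var=1.0, rng=rng
)
```

With a unit-variance prior centred at 0 and a unit-variance observation at 10, the gain is close to 0.5, so the resampled ensemble should centre near the midpoint between the prior mean and the observation.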
Submitted 16 August, 2023;
originally announced August 2023.
-
FlexDTI: Flexible diffusion gradient encoding scheme-based highly efficient diffusion tensor imaging using deep learning
Authors:
Zejun Wu,
Jiechao Wang,
Zunquan Chen,
Qinqin Yang,
Zhen Xing,
Dairong Cao,
Jianfeng Bao,
Taishan Kang,
Jianzhong Lin,
Shuhui Cai,
Zhong Chen,
Congbo Cai
Abstract:
Objective: Most deep neural network-based diffusion tensor imaging methods require the number and directions of the diffusion gradients in the data to be reconstructed to match those in the training data. This work aims to develop and evaluate a novel dynamic-convolution-based method called FlexDTI for highly efficient diffusion tensor reconstruction with a flexible diffusion encoding gradient scheme. Approach: FlexDTI was developed to achieve high-quality DTI parametric mapping with a flexible number and directions of diffusion encoding gradients. The method used dynamic convolution kernels to embed diffusion gradient direction information into feature maps of the corresponding diffusion signal. Furthermore, it generalized to a flexible number of diffusion gradient directions by setting the maximum number of input channels of the network. The network was trained and tested using datasets from the Human Connectome Project and local hospitals. Results from FlexDTI and other advanced tensor parameter estimation methods were compared. Main results: Compared to other methods, FlexDTI successfully achieves high-quality diffusion tensor-derived parameters even if the number and directions of diffusion encoding gradients change. It reduces normalized root mean squared error (NRMSE) by about 50% on fractional anisotropy (FA) and 15% on mean diffusivity (MD), compared with the state-of-the-art deep learning method with a flexible diffusion encoding gradient scheme. Significance: FlexDTI learns diffusion gradient direction information well enough to achieve generalized DTI reconstruction with a flexible diffusion gradient scheme. The network accommodates both flexibility and reconstruction quality.
Submitted 21 December, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
BMAD: Benchmarks for Medical Anomaly Detection
Authors:
Jinan Bao,
Hanshi Sun,
Hanqiu Deng,
Yinsheng He,
Zhaoxiang Zhang,
Xingyu Li
Abstract:
Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions. However, there is a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain. To bridge this gap, we introduce a comprehensive evaluation benchmark for assessing anomaly detection methods on medical images. This benchmark encompasses six reorganized datasets from five medical domains (i.e. brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology) and three key evaluation metrics, and includes a total of fourteen state-of-the-art AD algorithms. This standardized and well-curated medical benchmark with the well-structured codebase enables comprehensive comparisons among recently proposed anomaly detection methods. It will facilitate the community to conduct a fair comparison and advance the field of AD on medical imaging. More information on BMAD is available in our GitHub repository: https://github.com/DorisBao/BMAD
Submitted 27 April, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Distributed Data-driven Predictive Control via Dissipative Behavior Synthesis
Authors:
Yitao Yan,
Jie Bao,
Biao Huang
Abstract:
This paper presents a distributed data-driven predictive control (DDPC) approach using the behavioral framework. It aims to design a network of controllers for an interconnected system with linear time-invariant (LTI) subsystems such that a given global (network-wide) cost function is minimized while desired control performance (e.g., network stability and disturbance rejection) is achieved using dissipativity in the quadratic difference form (QdF). By viewing dissipativity as a behavior and integrating it into the control design as a virtual dynamical system, the proposed approach carries out the entire design process in a unified framework with a set-theoretic viewpoint. This leads to an effective data-driven distributed control design, where the global design goal can be achieved by distributed optimization based on the local QdF conditions. The approach is illustrated by an example throughout the paper.
Submitted 1 March, 2023;
originally announced March 2023.
-
Bayesian Non-parametric Hidden Markov Model for Agile Radar Pulse Sequences Streaming Analysis
Authors:
Jiadi Bao,
Yunjie Li,
Mengtao Zhu,
Shafei Wang
Abstract:
Multi-function radars (MFRs) are sophisticated sensors capable of complex agile inter-pulse modulation and dynamic work mode scheduling. The developments in MFRs pose great challenges to modern electronic reconnaissance systems and radar warning receivers for recognition and inference of MFR work modes. To address this issue, this paper proposes an online processing framework for parameter estimation and change point detection of MFR work modes. First, this paper designs a fully-conjugate Bayesian non-parametric hidden Markov model with a purpose-built prior distribution (agile BNP-HMM) to represent the MFR pulse agility characteristics. The proposed model allows fully-variational Bayesian inference. Then, the proposed framework is constructed from two main parts. The first part is the agile BNP-HMM model for automatically inferring the number of HMM hidden states and the emission distribution of each hidden state. A lower bound on the estimation error is derived, and the proposed algorithm is shown to be close to the bound. The second part utilizes streaming Bayesian updating to facilitate computation and provides an online work mode change detection framework based upon a weighted sequential probability ratio test. We demonstrate that the proposed framework is consistently more effective and robust than baseline methods on diverse simulated datasets.
Submitted 22 August, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Gene-SGAN: a method for discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering
Authors:
Zhijian Yang,
Junhao Wen,
Ahmed Abdulkadir,
Yuhan Cui,
Guray Erus,
Elizabeth Mamourian,
Randa Melhem,
Dhivya Srinivasan,
Sindhuja T. Govindarajan,
Jiong Chen,
Mohamad Habes,
Colin L. Masters,
Paul Maruff,
Jurgen Fripp,
Luigi Ferrucci,
Marilyn S. Albert,
Sterling C. Johnson,
John C. Morris,
Pamela LaMontagne,
Daniel S. Marcus,
Tammie L. S. Benzinger,
David A. Wolk,
Li Shen,
Jingxuan Bao,
Susan M. Resnick
, et al. (3 additional authors not shown)
Abstract:
Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limited if the derived subtypes are not associated with genetic drivers or susceptibility factors. Herein, we describe Gene-SGAN - a multi-view, weakly-supervised deep clustering method - which dissects disease heterogeneity by jointly considering phenotypic and genetic data, thereby conferring genetic correlations to the disease subtypes and associated endophenotypic signatures. We first validate the generalizability, interpretability, and robustness of Gene-SGAN in semi-synthetic experiments. We then demonstrate its application to real multi-site datasets from 28,858 individuals, deriving subtypes of Alzheimer's disease and brain endophenotypes associated with hypertension, from MRI and SNP data. Derived brain phenotypes displayed significant differences in neuroanatomical patterns, genetic determinants, biological and clinical biomarkers, indicating potentially distinct underlying neuropathologic processes, genetic drivers, and susceptibility factors. Overall, Gene-SGAN is broadly applicable to disease subtyping and endophenotype discovery, and is herein tested on disease-related, genetically-driven neuroimaging phenotypes.
Submitted 25 January, 2023;
originally announced January 2023.
-
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Authors:
Jihwan Lee,
Jae-Sung Bae,
Seongkyu Mun,
Heejin Choi,
Joun Yeop Lee,
Hoon-Young Cho,
Chanwoo Kim
Abstract:
With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. Vowel space analysis, which is often utilized to explore various aspects of language including L2 accents, is a great alternative analysis tool. In this study, we apply the vowel space analysis method to explore L2 accents of cross-lingual TTS systems. Through the vowel space analysis, we make the following three observations: a) a parallel architecture (Glow-TTS) is less L2-accented than an auto-regressive one (Tacotron); b) L2 accents are more dominant in non-shared vowels in a language pair; and c) L2 accents of cross-lingual TTS systems share some phenomena with those of human L2 learners. Our findings imply that it is necessary for TTS systems to handle each language pair differently, depending on linguistic characteristics such as non-shared vowels. They also hint that we can further incorporate linguistic knowledge in developing cross-lingual TTS systems.
Submitted 6 November, 2022;
originally announced November 2022.
-
Channel Modeling for UAV-to-Ground Communications with Posture Variation and Fuselage Scattering Effect
Authors:
Boyu Hua,
Haoran Ni,
Qiuming Zhu,
Cheng-Xiang Wang,
Tongtong Zhou,
Kai Mao,
Junwei Bao,
Xiaofei Zhang
Abstract:
Unmanned aerial vehicle (UAV)-to-ground (U2G) channel models play a pivotal role in reliable communications between a UAV and a ground terminal. This paper proposes a three-dimensional (3D) non-stationary hybrid model including both large-scale and small-scale fading for U2G multiple-input-multiple-output (MIMO) channels. Distinctive channel characteristics under U2G scenarios, i.e., 3D trajectory and posture of the UAV, fuselage scattering effect (FSE), and posture variation fading (PVF), are incorporated into the proposed model. The channel parameters, i.e., path loss (PL), shadow fading (SF), path delay, and path angle, are generated by combining machine learning (ML) and ray tracing (RT) techniques to capture the structure-related characteristics. In order to guarantee the physical continuity of channel parameters such as Doppler phase and path power, time evolution methods for inter- and intra-stationary intervals are proposed. Key statistical properties, i.e., temporal autocorrelation function (ACF), power delay profile (PDP), level crossing rate (LCR), average fading duration (AFD), and stationary interval (SI), are given, and the impact of fuselage and posture variation is analyzed. It is demonstrated that both posture variation and fuselage scattering have crucial effects on channel characteristics. The validity and practicability of the proposed model are verified by comparing the simulation results with the measured ones.
Submitted 13 October, 2022; v1 submitted 5 October, 2022;
originally announced October 2022.
-
A Realistic 3D Non-Stationary Channel Model for UAV-to-Vehicle Communications Incorporating Fuselage Posture
Authors:
Boyu Hua,
Tongtong Zhou,
Qiuming Zhu,
Kai Mao,
Junwei Bao,
Weizhi Zhong,
Naeem Ahmed
Abstract:
Considering the unmanned aerial vehicle (UAV) three-dimensional (3D) posture, a novel 3D non-stationary geometry-based stochastic model (GBSM) is proposed for multiple-input multiple-output (MIMO) UAV-to-vehicle (U2V) channels. It consists of line-of-sight (LoS) and non-line-of-sight (NLoS) components. The factor of fuselage posture is considered by introducing a time-variant 3D posture matrix. Some important statistical properties, i.e., the temporal autocorrelation function (ACF) and spatial cross-correlation function (CCF), are derived and investigated. Simulation results show that the fuselage posture has a significant impact on the U2V channel characteristics and aggravates the non-stationarity. The agreement between analytical, simulated, and measured results verifies the correctness of the proposed model and derivations. Moreover, it is demonstrated that the proposed model is also compatible with the existing GBSM that does not consider fuselage posture.
Submitted 19 September, 2022;
originally announced September 2022.
-
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
Authors:
Taejun Bak,
Junmo Lee,
Hanbin Bae,
Jinhyeok Yang,
Jae-Sung Bae,
Young-Sun Joo
Abstract:
Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we discovered that the multi-scale analysis which focuses on the low-frequency bands causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the synthesized speech waveform quality. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based vocoders and propose a GAN-based vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate speech waveforms in various perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also utilize a pseudo quadrature mirror filter bank to obtain downsampled multi-band speech waveforms while avoiding aliasing. According to experimental results, Avocodo outperforms baseline GAN-based vocoders, both objectively and subjectively, while reproducing speech with fewer artifacts.
Submitted 3 January, 2023; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Fault Diagnosis of Inter-turn Short Circuit in Permanent Magnet Synchronous Motors with Current Signal Imaging and Unsupervised Learning
Authors:
W. Jung,
S. H. Yun,
Y. S. Lim,
S. Cheong,
J. Bae,
Y. H. Park
Abstract:
This paper proposes machine-independent feature engineering for winding inter-turn short circuit faults that uses electrical current signals. The electrical current signal collected from a permanent magnet synchronous motor (PMSM) is subject to different environmental and operational conditions. To address this variability, a robust current signal imaging method and a deep learning-based feature extraction method are developed. The overall procedure includes the following three key steps: (1) transforming a time-series current signal into a two-dimensional image, (2) extracting features using convolutional neural networks, and (3) calculating a health indicator using the Mahalanobis distance. The transformation of the time-series signal is based on recurrence plots (RP). The proposed RP method is developed from feature engineering that provides the dominant fault feature representations in a robust way. The proposed RP is designed to maximize the features of inter-turn short faults and minimize the effect of noise from systems with various capacities. To demonstrate the validity of the proposed method, two case studies are conducted using an artificial fault-seeded testbed with motors of two different capacities. Because the feature is calculated using only the electrical current signal of the motor, without parameters related to the capacity of the motor, it can be applied to motors with different capacities while maintaining the same performance.
Submitted 9 June, 2022;
originally announced June 2022.
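As a rough illustration of steps (1) and (3) in the abstract above, the following Python sketch builds a plain thresholded recurrence plot and computes a Mahalanobis-distance health indicator against a set of healthy-state feature vectors. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the distance threshold, and the use of a pseudo-inverse covariance are all hypothetical choices, and the CNN feature extractor of step (2) is omitted, so the feature vectors are simply taken as given.

```python
import numpy as np


def recurrence_plot(signal, threshold):
    """Binary recurrence plot: R[i, j] = 1 when |x_i - x_j| <= threshold.

    A plain thresholded RP, used here as an assumption; the paper's
    proposed RP variant is not specified in the abstract.
    """
    x = np.asarray(signal, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= threshold).astype(np.uint8)


def mahalanobis_health_indicator(features, healthy_features):
    """Mahalanobis distance of each feature vector from the healthy set."""
    healthy = np.asarray(healthy_features, dtype=float)
    mean = healthy.mean(axis=0)
    cov = np.cov(healthy, rowvar=False)
    # Pseudo-inverse keeps the sketch usable when the covariance is singular.
    cov_inv = np.linalg.pinv(cov)
    diff = np.atleast_2d(np.asarray(features, dtype=float)) - mean
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(squared, 0.0))
```

A larger indicator value means the feature vector lies further from the healthy-state distribution; where to place a fault threshold on that value is left open here.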
-
A Contraction-constrained Model Predictive Control for Multi-timescale Nonlinear Processes
Authors:
Ryan McCloy,
Lai Wei,
Jie Bao
Abstract:
Many chemical processes exhibit diverse timescale dynamics with a strong coupling between timescale sensitive variables. Model predictive control with a non-uniformly spaced optimisation horizon is an effective approach to multi-timescale control and offers opportunities for reduced computational complexity. In such an approach the fast, moderate and slow dynamics can be included in the optimisation problem by implementing smaller time intervals earlier in the prediction horizon and increasingly larger intervals towards the end of the prediction. In this paper, a reference-flexible condition is developed based on the contraction theory to provide a stability guarantee for a nonlinear system under non-uniform prediction horizons.
Submitted 9 May, 2022;
originally announced May 2022.
-
A Contraction-constrained Model Predictive Control for Nonlinear Processes using Disturbance Forecasts
Authors:
Ryan McCloy,
Lai Wei,
Jie Bao
Abstract:
Model predictive control (MPC) has become the most widely used advanced control method in the process industry. In many cases, forecasts of the disturbances are available, e.g., predicted renewable power generation based on weather forecasts. While the predictions of disturbances may not be accurate, utilizing the information can significantly improve the control performance in response to the disturbances. By exploiting process and disturbance models, future system behaviour can be predicted and used to optimise control actions via minimisation of an economic cost function which incorporates these predictions. However, guaranteeing stability of the resulting closed-loop system is often difficult in this approach when the processes are nonlinear. This article proposes a contraction-constrained predictive controller which optimises process economy whilst ensuring stabilisation to operating targets subject to disturbance measurements and forecasts.
Submitted 6 June, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Authors:
Jae-Sung Bae,
Jinhyeok Yang,
Tae-Jun Bak,
Young-Sun Joo
Abstract:
This paper proposes a hierarchical and multi-scale variational autoencoder-based non-autoregressive text-to-speech model (HiMuV-TTS) to generate natural speech with diverse speaking styles. Recent advances in non-autoregressive TTS (NAR-TTS) models have significantly improved the inference speed and robustness of synthesized speech. However, the diversity of speaking styles and the naturalness of the speech still need to be improved. To solve this problem, we propose the HiMuV-TTS model that first determines the global-scale prosody and then determines the local-scale prosody via conditioning on the global-scale prosody and the learned text representation. In addition, we improve the quality of speech by adopting the adversarial training technique. Experimental results verify that the proposed HiMuV-TTS model can generate more diverse and natural speech as compared to TTS models with single-scale variational autoencoders, and can represent different prosody information in each scale.
Submitted 15 August, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Into-TTS : Intonation Template Based Prosody Control System
Authors:
Jihwan Lee,
Joun Yeop Lee,
Heejin Choi,
Seongkyu Mun,
Sangjun Park,
Jae-Sung Bae,
Chanwoo Kim
Abstract:
Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to TTS model training, speech data are grouped into intonation templates in an unsupervised manner. Two proposed modules are added to the end-to-end TTS framework: an intonation predictor and an intonation encoder. The intonation predictor recommends a suitable intonation template to the given text. The intonation encoder, attached to the text encoder output, synthesizes speech abiding by the requested intonation template. The main contributions of our paper are: (a) an easy-to-use intonation control system covering a wide range of users; (b) better performance in wrapping speech in a requested intonation with improved objective and subjective evaluation; and (c) incorporating a pre-trained language model for intonation modelling. Audio samples are available at https://srtts.github.io/IntoTTS.
Submitted 6 November, 2022; v1 submitted 4 April, 2022;
originally announced April 2022.
-
Impact Intensity Estimation of a Quadruped Robot without Using a Force Sensor
Authors:
Ba-Phuc Huynh,
Joonbum Bae
Abstract:
Estimating the impact intensity is one of the significant tasks of the legged robot. Accurate feedback of the impact may support the robot to plan a suitable and efficient trajectory to adapt to unknown complex terrains. Ordinarily, this task is performed by a force sensor in the robot's foot. In this letter, an impact intensity estimation without using a force sensor is proposed. An artificial neural network model is designed to predict the motor torques of the legs in an instantaneous position in the trajectory without utilizing the complex kinematic and dynamic models of motion. An unscented Kalman filter is used during the trajectory to smooth and stabilize the measurement. Based on the difference between the predicted information and the filtered value, the state and intensity of the robot foot's impact with the obstacles are estimated. The simulation and experiment on a quadruped robot are carried out to verify the effectiveness of the proposed method.
Submitted 3 April, 2022;
originally announced April 2022.
-
Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation
Authors:
Lei Zhou,
Huidong Liu,
Joseph Bae,
Junjun He,
Dimitris Samaras,
Prateek Prasanna
Abstract:
Masked Autoencoder (MAE) has recently been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis. By reconstructing full images from partially masked inputs, a ViT encoder aggregates contextual information to infer masked image regions. We believe that this context aggregation ability is particularly essential to the medical image domain where each anatomical structure is functionally and mechanically connected to other structures and regions. Because there is no ImageNet-scale medical image dataset for pre-training, we investigate a self pre-training paradigm with MAE for medical image analysis tasks. Our method pre-trains a ViT on the training set of the target data instead of another dataset. Thus, self pre-training can benefit more scenarios where pre-training data is hard to acquire. Our experimental results show that MAE self pre-training markedly improves diverse medical image tasks including chest X-ray disease classification, abdominal CT multi-organ segmentation, and MRI brain tumor segmentation. Code is available at https://github.com/cvlab-stonybrook/SelfMedMAE
Submitted 21 April, 2023; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
Authors:
Aishik Konwer,
Xuan Xu,
Joseph Bae,
Chao Chen,
Prateek Prasanna
Abstract:
Clinical outcome or severity prediction from medical images has largely focused on learning representations from single-timepoint or snapshot scans. It has been shown that disease progression can be better characterized by temporal imaging. We therefore hypothesized that outcome predictions can be improved by utilizing the disease progression information from sequential images. We present a deep learning approach that leverages temporal progression information to improve clinical outcome predictions from single-timepoint images. In our method, a self-attention based Temporal Convolutional Network (TCN) is used to learn a representation that is most reflective of the disease trajectory. Meanwhile, a Vision Transformer is pretrained in a self-supervised fashion to extract features from single-timepoint images. The key contribution is to design a recalibration module that employs maximum mean discrepancy loss (MMD) to align distributions of the above two contextual representations. We train our system to predict clinical outcomes and severity grades from single-timepoint images. Experiments on chest and osteoarthritis radiography datasets demonstrate that our approach outperforms other state-of-the-art techniques.
Submitted 30 March, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
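The abstract above aligns two representation distributions with a maximum mean discrepancy (MMD) loss. As a minimal sketch of what such a loss measures, the Python below computes the standard biased estimate of squared MMD between two sample sets. The Gaussian kernel, the bandwidth value, and the function names are assumptions made for illustration; the abstract does not state which kernel or estimator the authors use.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Pairwise Gaussian kernel matrix between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip small negatives from round-off before exponentiating.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x and y.

    x and y are (num_samples, feature_dim) arrays; the result is near
    zero when the two sets follow the same distribution.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

Minimizing this quantity with respect to the parameters that produce one of the two sample sets pulls the two feature distributions together, which is the role an MMD term plays in an alignment module.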
-
Adaptive Contraction-based Control of Uncertain Nonlinear Processes using Neural Networks
Authors:
Lai Wei,
Ryan McCloy,
Jie Bao
Abstract:
Driven by the flexible manufacturing trend in the process control industry and the uncertain nature of chemical process models, this article aims to achieve offset-free tracking for a family of uncertain nonlinear systems (e.g., using process models with parametric uncertainties) with adaptable performance. The proposed adaptive control approach incorporates into the control loop an adaptive neural network embedded contraction-based controller (to ensure convergence to time-varying references) and an online parameter identification module coupled with reference generation (to ensure modelled parameters converge to those of the physical system). The integrated learning and control approach involves training a state and parameter dependent neural network to learn a contraction metric parameterized by the uncertain parameter and a differential feedback gain. This neural network is then embedded in an adaptive contraction-based control law which is updated by parameter estimates online. As uncertain parameter estimates converge to the corresponding physical values, offset-free tracking, simultaneously with improved convergence rates, can be achieved, resulting in a flexible, efficient and less conservative approach to the reference tracking control of uncertain nonlinear processes. An illustrative example is included to demonstrate the overall approach.
Submitted 9 May, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Electrolyte Flow Rate Control for Vanadium Redox Flow Batteries using the Linear Parameter Varying Framework
Authors:
Ryan McCloy,
Yifeng Li,
Jie Bao,
Maria Skyllas-Kazacos
Abstract:
In this article, an electrolyte flow rate control approach is developed for an all-vanadium redox flow battery (VRB) system based on the linear parameter varying (LPV) framework. The electrolyte flow rate is regulated to provide a trade-off between stack voltage efficiency and pumping energy losses, so as to achieve optimal battery energy efficiency. The nonlinear process model is embedded in a linear parameter varying state-space description and a set of state feedback controllers are designed to handle fluctuations in current during both charging and discharging. Simulation studies have been conducted under different operating conditions to demonstrate the performance of the proposed approach. This control approach was further implemented on a laboratory scale VRB system.
Submitted 9 May, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Lung Swapping Autoencoder: Learning a Disentangled Structure-texture Representation of Chest Radiographs
Authors:
Lei Zhou,
Joseph Bae,
Huidong Liu,
Gagandeep Singh,
Jeremy Green,
Amit Gupta,
Dimitris Samaras,
Prateek Prasanna
Abstract:
Well-labeled datasets of chest radiographs (CXRs) are difficult to acquire due to the high cost of annotation. Thus, it is desirable to learn a robust and transferable representation in an unsupervised manner to benefit tasks that lack labeled data. Unlike natural images, medical images have their own domain prior; e.g., we observe that many pulmonary diseases, such as COVID-19, manifest as changes in the lung tissue texture rather than the anatomical structure. Therefore, we hypothesize that studying only the texture without the influence of structure variations would be advantageous for downstream prognostic and predictive modeling tasks. In this paper, we propose a generative framework, the Lung Swapping Autoencoder (LSAE), that learns factorized representations of a CXR to disentangle the texture factor from the structure factor. Specifically, by adversarial training, the LSAE is optimized to generate a hybrid image that preserves the lung shape in one image but inherits the lung texture of another. To demonstrate the effectiveness of the disentangled texture representation, we evaluate the texture encoder $Enc^t$ in LSAE on ChestX-ray14 (N=112,120), and our own multi-institutional COVID-19 outcome prediction dataset, COVOC (N=340 (Subset-1) + 53 (Subset-2)). On both datasets, we reach or surpass the state-of-the-art by finetuning $Enc^t$ in LSAE that is 77% smaller than a baseline Inception v3. Additionally, in semi- and self-supervised settings with a similar model budget, $Enc^t$ in LSAE is also competitive with the state-of-the-art MoCo. By "re-mixing" the texture and shape factors, we generate meaningful hybrid images that can augment the training set. This data augmentation method can further improve COVOC prediction performance. The improvement is consistent even when we directly evaluate the Subset-1 trained model on Subset-2 without any fine-tuning.
Submitted 18 January, 2022;
originally announced January 2022.
-
Contraction Analysis and Control Synthesis for Discrete-time Nonlinear Processes
Authors:
Lai Wei,
Ryan McCloy,
Jie Bao
Abstract:
Shifting away from the traditional mass production approach, the process industry is moving towards more agile, cost-effective and dynamic process operation (next-generation smart plants). This warrants the development of control systems for nonlinear chemical processes to be capable of tracking time-varying setpoints to produce products with different specifications as per market demand and deal with variations in the raw materials and utility (e.g., energy). This article presents a systematic approach to the implementation of contraction-based control for discrete-time nonlinear processes. Through the differential dynamic system framework, the contraction conditions to ensure the exponential convergence to feasible time-varying references are derived. The discrete-time differential dissipativity condition is further developed, which can be used for control designs for disturbance rejection. Computationally tractable equivalent conditions are then derived and additionally transformed into an SOS programming problem, such that a discrete-time control contraction metric and stabilising feedback controller can be jointly obtained. Synthesis and implementation details are provided and demonstrated through a numerical case study.
Submitted 9 May, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images
Authors:
Sang-gil Lee,
Eunji Kim,
Jae Seok Bae,
Jung Hoon Kim,
Sungroh Yoon
Abstract:
The computer-aided diagnosis of focal liver lesions (FLLs) can help improve workflow and enable correct diagnoses; FLL detection is the first step in such a computer-aided diagnosis. Despite the recent success of deep-learning-based approaches in detecting FLLs, current methods are not sufficiently robust for assessing misaligned multiphase data. By introducing an attention-guided multiphase alignment in feature space, this study presents a fully automated, end-to-end learning framework for detecting FLLs from multiphase computed tomography (CT) images. Our method is robust to misaligned multiphase images owing to its complete learning-based approach, which reduces the sensitivity of the model's performance to the quality of registration and enables a standalone deployment of the model in clinical practice. Evaluation on a large-scale dataset with 280 patients confirmed that our method outperformed previous state-of-the-art methods and significantly reduced the performance degradation for detecting FLLs using misaligned multiphase CT images. The robustness of the proposed method can enhance the clinical adoption of the deep-learning-based computer-aided detection system.
Submitted 16 December, 2021; v1 submitted 1 December, 2021;
originally announced December 2021.
-
ProductAE: Towards Training Larger Channel Codes based on Neural Product Codes
Authors:
Mohammad Vahid Jamali,
Hamid Saber,
Homayoon Hatami,
Jung Hyun Bae
Abstract:
There have been significant research activities in recent years to automate the design of channel encoders and decoders via deep learning. Due to the dimensionality challenge in channel coding, it is prohibitively complex to design and train relatively large neural channel codes via deep learning techniques. Consequently, most of the results in the literature are limited to relatively short codes having fewer than 100 information bits. In this paper, we construct ProductAEs, a computationally efficient family of deep-learning driven (encoder, decoder) pairs, that aim at enabling the training of relatively large channel codes (both encoders and decoders) with a manageable training complexity. We build upon the ideas from classical product codes, and propose constructing large neural codes using smaller code components. More specifically, instead of directly training the encoder and decoder for a large neural code of dimension $k$ and blocklength $n$, we provide a framework that requires training neural encoders and decoders for the code parameters $(n_1,k_1)$ and $(n_2,k_2)$ such that $n_1 n_2=n$ and $k_1 k_2=k$. Our training results show significant gains, over all ranges of signal-to-noise ratio (SNR), for a code of parameters $(225,100)$ and a moderate-length code of parameters $(441,196)$, over polar codes under successive cancellation (SC) decoder. Moreover, our results demonstrate meaningful gains over Turbo Autoencoder (TurboAE) and state-of-the-art classical codes. This is the first work to design product autoencoders and a pioneering work on training large channel codes.
Submitted 10 September, 2022; v1 submitted 9 October, 2021;
originally announced October 2021.
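The product-code constraint quoted in the abstract above, $n_1 n_2 = n$ and $k_1 k_2 = k$, can be checked with a few lines of arithmetic. The Python sketch below enumerates the component-code pairs that satisfy it for a given $(n, k)$. The function name is hypothetical, and the abstract does not say which factorization the authors actually trained; the square factorizations shown in the usage lines are simply candidates consistent with the stated parameters.

```python
from fractions import Fraction


def product_factorizations(n, k):
    """List ((n1, k1), (n2, k2)) with n1*n2 == n, k1*k2 == k, k_i <= n_i, n1 <= n2."""

    def divisor_pairs(m):
        return [(d, m // d) for d in range(1, m + 1) if m % d == 0]

    result = []
    for n1, n2 in divisor_pairs(n):
        if n1 > n2:
            continue  # skip mirrored blocklength pairs
        for k1, k2 in divisor_pairs(k):
            # A component code cannot carry more information bits than its length.
            if k1 <= n1 and k2 <= n2:
                result.append(((n1, k1), (n2, k2)))
    return result


# Candidate component codes for the two parameter sets named in the abstract.
small = product_factorizations(225, 100)
moderate = product_factorizations(441, 196)
rate_small = Fraction(100, 225)
rate_moderate = Fraction(196, 441)
```

Both parameter sets have the same code rate, 4/9, and each admits a square factorization: (15, 10) with (15, 10) for (225, 100), and (21, 14) with (21, 14) for (441, 196).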
-
Acoustic Signal based Non-Contact Ball Bearing Fault Diagnosis Using Adaptive Wavelet Denoising
Authors:
Wonho Jung,
Jaewoong Bae,
Yong-Hwa Park
Abstract:
This paper presents a non-contact fault diagnostic method for ball bearings using adaptive wavelet denoising, statistical-spectral acoustic features, and one-dimensional (1D) convolutional neural networks (CNN). The health conditions of the ball bearing are monitored by a microphone under noisy conditions. To eliminate noise, an adaptive wavelet denoising method based on the kurtosis-entropy (KE) index is proposed. Multiple acoustic features are extracted based on expert knowledge. A 1D ResNet is used to classify the health conditions of the bearings. A case study is presented to examine the proposed method's capability to monitor the condition of ball bearings. The fault diagnosis results are compared with and without the adaptive wavelet denoising. The results show the effectiveness of the proposed fault diagnostic method using acoustic signals.
Submitted 7 October, 2021;
originally announced October 2021.
-
Model-based Synthetic Data-driven Learning (MOST-DL): Application in Single-shot T2 Mapping with Severe Head Motion Using Overlapping-echo Acquisition
Authors:
Qinqin Yang,
Yanhong Lin,
Jiechao Wang,
Jianfeng Bao,
Xiaoyin Wang,
Lingceng Ma,
Zihan Zhou,
Qizhi Yang,
Shuhui Cai,
Hongjian He,
Congbo Cai,
Jiyang Dong,
Jingliang Cheng,
Zhong Chen,
Jianhui Zhong
Abstract:
Use of synthetic data has provided a potential solution for addressing unavailable or insufficient training samples in deep learning-based magnetic resonance imaging (MRI). However, the challenge brought by domain gap between synthetic and real data is usually encountered, especially under complex experimental conditions. In this study, by combining Bloch simulation and general MRI models, we propose a framework for addressing the lack of training data in supervised learning scenarios, termed MOST-DL. A challenging application is demonstrated to verify the proposed framework and achieve motion-robust T2 mapping using single-shot overlapping-echo acquisition. We decompose the process into two main steps: (1) calibrationless parallel reconstruction for ultra-fast pulse sequence and (2) intra-shot motion correction for T2 mapping. To bridge the domain gap, realistic textures from a public database and various imperfection simulations were explored. The neural network was first trained with pure synthetic data and then evaluated with in vivo human brain. Both simulation and in vivo experiments show that the MOST-DL method significantly reduces ghosting and motion artifacts in T2 maps in the presence of unpredictable subject movement and has the potential to be applied to motion-prone patients in the clinic.
Submitted 29 May, 2022; v1 submitted 30 July, 2021;
originally announced July 2021.
-
Attention-based Multi-scale Gated Recurrent Encoder with Novel Correlation Loss for COVID-19 Progression Prediction
Authors:
Aishik Konwer,
Joseph Bae,
Gagandeep Singh,
Rishabh Gattu,
Syed Ali,
Jeremy Green,
Tej Phatak,
Prateek Prasanna
Abstract:
COVID-19 image analysis has mostly focused on diagnostic tasks using single timepoint scans acquired upon disease presentation or admission. We present a deep learning-based approach to predict lung infiltrate progression from serial chest radiographs (CXRs) of COVID-19 patients. Our method first utilizes convolutional neural networks (CNNs) for feature extraction from patches within the concerned lung zone, and also from neighboring and remote boundary regions. The framework further incorporates a multi-scale Gated Recurrent Unit (GRU) with a correlation module for effective predictions. The GRU accepts CNN feature vectors from three different areas as input and generates a fused representation. The correlation module attempts to minimize the correlation loss between hidden representations of concerned and neighboring area feature vectors, while maximizing the loss between the same from concerned and remote regions. Further, we employ an attention module over the output hidden states of each encoder timepoint to generate a context vector. This vector is used as an input to a decoder module to predict patch severity grades at a future timepoint. Finally, we ensemble the patch classification scores to calculate patient-wise grades. Specifically, our framework predicts zone-wise disease severity for a patient on a given day by learning representations from the previous temporal CXRs. Our novel multi-institutional dataset comprises sequential CXR scans from N=93 patients. Our approach outperforms transfer learning and radiomic feature-based baseline approaches on this dataset.
Submitted 17 July, 2021;
originally announced July 2021.
-
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Authors:
Jinhyeok Yang,
Jae-Sung Bae,
Taejun Bak,
Youngik Kim,
Hoon-Young Cho
Abstract:
Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the generation of reasonably good speech quality with a single model and made it possible to synthesize the speech of a speaker with limited training data. Fine-tuning the multi-speaker model on the target speaker data can achieve better quality; however, there still exists a gap compared to real speech samples, and the resulting model depends on the speaker. In this work, we propose GANSpeech, a high-fidelity multi-speaker TTS model that applies adversarial training to a non-autoregressive multi-speaker TTS model. In addition, we propose simple but efficient automatic scaling methods for the feature matching loss used in adversarial training. In subjective listening tests, GANSpeech significantly outperformed the baseline multi-speaker FastSpeech and FastSpeech2 models and achieved a better MOS than the speaker-specific fine-tuned FastSpeech2.
Submitted 29 June, 2021;
originally announced June 2021.
-
Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech
Authors:
Jae-Sung Bae,
Tae-Jun Bak,
Young-Sun Joo,
Hoon-Young Cho
Abstract:
In this paper, we propose methods for improving the modeling performance of a Transformer-based non-autoregressive text-to-speech (TNA-TTS) model. Although the text encoder and audio decoder handle different types and lengths of data (i.e., text and audio), the TNA-TTS models are not designed considering these variations. Therefore, to improve the modeling performance of the TNA-TTS model we propose a hierarchical Transformer structure-based text encoder and audio decoder that are designed to accommodate the characteristics of each module. For the text encoder, we constrain each self-attention layer so the encoder focuses on a text sequence from the local to the global scope. Conversely, the audio decoder constrains its self-attention layers to focus in the reverse direction, i.e., from global to local scope. Additionally, we further improve the pitch modeling accuracy of the audio decoder by providing sentence and word-level pitch as conditions. Various objective and subjective evaluations verified that the proposed method outperformed the baseline TNA-TTS.
Submitted 29 June, 2021;
originally announced June 2021.
-
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Authors:
Taejun Bak,
Jae-Sung Bae,
Hanbin Bae,
Young-Ik Kim,
Hoon-Young Cho
Abstract:
Methods for modeling and controlling prosody with acoustic features have been proposed for neural text-to-speech (TTS) models. Prosodic speech can be generated by conditioning acoustic features. However, synthesized speech with a large pitch-shift scale suffers from audio quality degradation, and speaker characteristics deformation. To address this problem, we propose a feed-forward Transformer based TTS model that is designed based on the source-filter theory. This model, called FastPitchFormant, has a unique structure that handles text and acoustic features in parallel. With modeling each feature separately, the tendency that the model learns the relationship between two features can be mitigated.
Submitted 29 June, 2021;
originally announced June 2021.
-
Discrete-time Contraction-based Control of Nonlinear Systems with Parametric Uncertainties using Neural Networks
Authors:
Lai Wei,
Ryan McCloy,
Jie Bao
Abstract:
In response to the continuously changing feedstock supply and market demand for products with different specifications, the processes need to be operated at time-varying operating conditions and targets (e.g., setpoints) to improve the process economy, in contrast to traditional process operations around predetermined equilibriums. In this paper, a contraction theory-based control approach using neural networks is developed for nonlinear chemical processes to achieve time-varying reference tracking. This approach leverages the universal approximation characteristics of neural networks with discrete-time contraction analysis and control. It involves training a neural network to learn a contraction metric and differential feedback gain, that is embedded in a contraction-based controller. A second, separate neural network is also incorporated into the control-loop to perform online learning of uncertain system model parameters. The resulting control scheme is capable of achieving efficient offset-free tracking of time-varying references, with a full range of model uncertainty, without the need for controller structure redesign as the reference changes. This is a robust approach that can deal with bounded parametric uncertainties in the process model, which are commonly encountered in industrial (chemical) processes. This approach also ensures the process stability during online simultaneous learning and control. Simulation examples are provided to illustrate the above approach.
Submitted 20 June, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Control Contraction Metric Synthesis for Discrete-time Nonlinear Systems
Authors:
Lai Wei,
Ryan McCloy,
Jie Bao
Abstract:
Flexible manufacturing has become the trend in modern chemical processing. One of the essential characteristics of flexible manufacturing is tracking time-varying target trajectories (e.g., diversity and quantity of products). A possible tool to achieve time-varying targets is contraction theory. However, contraction theory was developed for continuous-time systems, and analysis and synthesis tools for discrete-time systems are lacking. This article develops a systematic approach to discrete-time contraction analysis and control synthesis using Discrete-time Control Contraction Metrics (DCCM), which can be implemented using Sum of Squares (SOS) programming. The proposed approach is demonstrated by an illustrative example.
Submitted 12 May, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
Behavioural Approach to Distributed Control of Interconnected Systems
Authors:
Yitao Yan,
Jie Bao,
Biao Huang
Abstract:
This paper formulates a framework for the analysis and distributed control of interconnected systems from the behavioural perspective. The discussions are carried out from the viewpoint of set theory and the results are completely representation-free. The core of a dynamical system can be represented as the set of all trajectories admissible through the system and interconnections are interpreted as constraints on the choice of trajectories. We develop a structure in which the interconnected behaviour can be directly built from the behaviours of the subsystems in an explicit way without any presumed forms of representations. We show that the interconnected behaviour can also be fully obtained from local observations of the subsystem. Furthermore, we develop the necessary and sufficient conditions for the existence of distributed controller behaviours and their explicit construction. Due to the entirely representation-free nature of this framework, it unites various representations and descriptions of features of dynamical systems (e.g. models, dissipativity, data, etc.) as behaviours, allowing for the formation of a unified platform for the analysis and distributed control for interconnected systems.
Submitted 18 March, 2021;
originally announced March 2021.
-
A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music
Authors:
Hanbin Bae,
Jae-Sung Bae,
Young-Sun Joo,
Young-Ik Kim,
Hoon-Young Cho
Abstract:
Recently, it has become easier to obtain speech data from various media such as the internet or YouTube, but directly utilizing them to train a neural text-to-speech (TTS) model is difficult. The proportion of clean speech is insufficient and the remainder includes background music. Even with the global style token (GST). Therefore, we propose the following method to successfully train an end-to-end TTS model with limited broadcast data. First, the background music is removed from the speech by introducing a music filter. Second, the GST-TTS model with an auxiliary quality classifier is trained with the filtered speech and a small amount of clean speech. In particular, the quality classifier makes the embedding vector of the GST layer focus on representing the speech quality (filtered or clean) of the input speech. The experimental results verified that the proposed method synthesized much more high-quality speech than conventional methods.
Submitted 4 March, 2021;
originally announced March 2021.
-
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Authors:
Jungil Kong,
Jaehyeon Kim,
Jaekyoung Bae
Abstract:
Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) of a single speaker dataset indicates that our proposed method demonstrates similarity to human quality while generating 22.05 kHz high-fidelity audio 167.9 times faster than real-time on a single V100 GPU. We further show the generality of HiFi-GAN to the mel-spectrogram inversion of unseen speakers and end-to-end speech synthesis. Finally, a small footprint version of HiFi-GAN generates samples 13.4 times faster than real-time on CPU with comparable quality to an autoregressive counterpart.
Submitted 23 October, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning
Authors:
Jae-Sung Bae,
Hanbin Bae,
Young-Sun Joo,
Junmo Lee,
Gyeong-Hoon Lee,
Hoon-Young Cho
Abstract:
This paper proposes a controllable end-to-end text-to-speech (TTS) system to control the speaking speed (speed-controllable TTS; SCTTS) of synthesized speech with sentence-level speaking-rate value as an additional input. The speaking-rate value, the ratio of the number of input phonemes to the length of input speech, is adopted in the proposed system to control the speaking speed. Furthermore, the proposed SCTTS system can control the speaking speed while retaining other speech attributes, such as the pitch, by adopting the global style token-based style encoder. The proposed SCTTS does not require any additional well-trained model or an external speech database to extract phoneme-level duration information and can be trained in an end-to-end manner. In addition, our listening tests on fast-, normal-, and slow-speed speech showed that the SCTTS can generate more natural speech than other phoneme duration control approaches which increase or decrease duration at the same rate for the entire sentence, especially in the case of slow-speed speech.
Submitted 13 August, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
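The speaking-rate value described in the abstract above (the ratio of the number of input phonemes to the length of the input speech) can be illustrated with a minimal sketch. The function name `speaking_rate`, the choice of seconds as the length unit, and the example numbers are assumptions made for illustration; the abstract does not specify the unit or any implementation.

```python
def speaking_rate(num_phonemes: int, speech_length_s: float) -> float:
    """Return phonemes per unit length for one sentence.

    Assumption: speech length is given in seconds. The abstract only
    says "length of input speech", so frames would work equally well.
    """
    if speech_length_s <= 0:
        raise ValueError("speech length must be positive")
    return num_phonemes / speech_length_s


# Hypothetical example: the same 30-phoneme sentence spoken slowly and quickly.
slow = speaking_rate(30, 3.0)   # longer utterance, lower rate
fast = speaking_rate(30, 2.0)   # shorter utterance, higher rate
```

A higher value corresponds to faster speech, which is why a single sentence-level scalar of this kind can serve as a conditioning input.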
-
Predicting Clinical Outcomes in COVID-19 using Radiomics and Deep Learning on Chest Radiographs: A Multi-Institutional Study
Authors:
Joseph Bae,
Saarthak Kapse,
Gagandeep Singh,
Rishabh Gattu,
Syed Ali,
Neal Shah,
Colin Marshall,
Jonathan Pierce,
Tej Phatak,
Amit Gupta,
Jeremy Green,
Nikhil Madan,
Prateek Prasanna
Abstract:
We predict mechanical ventilation requirement and mortality using computational modeling of chest radiographs (CXRs) for coronavirus disease 2019 (COVID-19) patients. This two-center, retrospective study analyzed 530 deidentified CXRs from 515 COVID-19 patients treated at Stony Brook University Hospital and Newark Beth Israel Medical Center between March and August 2020. DL and machine learning classifiers to predict mechanical ventilation requirement and mortality were trained and evaluated using patient CXRs. A novel radiomic embedding framework was also explored for outcome prediction. All results are compared against radiologist grading of CXRs (zone-wise expert severity scores). Radiomic and DL classification models had mAUCs of 0.78+/-0.02 and 0.81+/-0.04, compared with expert scores mAUCs of 0.75+/-0.02 and 0.79+/-0.05 for mechanical ventilation requirement and mortality prediction, respectively. Combined classifiers using both radiomics and expert severity scores resulted in mAUCs of 0.79+/-0.04 and 0.83+/-0.04 for each prediction task, demonstrating improvement over either artificial intelligence or radiologist interpretation alone. Our results also suggest instances where inclusion of radiomic features in DL improves model predictions, something that might be explored in other pathologies. The models proposed in this study and the prognostic information they provide might aid physician decision making and resource allocation during the COVID-19 pandemic.
Submitted 1 July, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
PriorGAN: Real Data Prior for Generative Adversarial Nets
Authors:
Shuyang Gu,
Jianmin Bao,
Dong Chen,
Fang Wen
Abstract:
Generative adversarial networks (GANs) have achieved rapid progress in learning rich data distributions. However, we argue about two main issues in existing techniques. First, the low quality problem where the learned distribution has massive low quality samples. Second, the missing modes problem where the learned distribution misses some certain regions of the real data distribution. To address these two issues, we propose a novel prior that captures the whole real data distribution for GANs, which are called PriorGANs. To be specific, we adopt a simple yet elegant Gaussian Mixture Model (GMM) to build an explicit probability distribution on the feature level for the whole real data. By maximizing the probability of generated data, we can push the low quality samples to high quality. Meanwhile, equipped with the prior, we can estimate the missing modes in the learned distribution and design a sampling strategy on the real data to solve the problem. The proposed real data prior can generalize to various training settings of GANs, such as LSGAN, WGAN-GP, SNGAN, and even the StyleGAN. Our experiments demonstrate that PriorGANs outperform the state-of-the-art on the CIFAR-10, FFHQ, LSUN-cat, and LSUN-bird datasets by large margins.
Submitted 30 June, 2020;
originally announced June 2020.
-
Evaluation of Sampling Methods for Robotic Sediment Sampling Systems
Authors:
Jun Han Bae,
Wonse Jo,
Jee Hwan Park,
Richard M. Voyles,
Sara K. McMillan,
Byung-Cheol Min
Abstract:
Analysis of sediments from rivers, lakes, reservoirs, wetlands and other constructed surface water impoundments is an important tool to characterize the function and health of these systems, but is generally carried out manually. This is costly and can be hazardous and difficult for humans due to inaccessibility, contamination, or availability of required equipment. Robotic sampling systems can ease these burdens, but little work has examined the efficiency of such sampling means and no prior work has investigated the quality of the resulting samples. This paper presents an experimental study that evaluates and optimizes sediment sampling patterns applied to a robot sediment sampling system that allows collection of minimally-disturbed sediment cores from natural and man-made water bodies for various sediment types. To meet this need, we developed and tested a robotic sampling platform in the laboratory to test functionality under a range of sediment types and operating conditions. Specifically, we focused on three patterns by which a cylindrical coring device was driven into the sediment (linear, helical, and zig-zag) for three sediment types (coarse sand, medium sand, and silt). The results show that the optimal sampling pattern varies depending on the type of sediment and can be optimized based on the sampling objective. We examined two sampling objectives: maximizing the mass of minimally disturbed sediment and minimizing the power per mass of sample. This study provides valuable data to aid in the selection of optimal sediment coring methods for various applications and builds a solid foundation for future field testing under a range of environmental conditions.
Submitted 23 June, 2020;
originally announced June 2020.
-
GIQA: Generated Image Quality Assessment
Authors:
Shuyang Gu,
Jianmin Bao,
Dong Chen,
Fang Wen
Abstract:
Generative adversarial networks (GANs) have achieved impressive results today, but not all generated images are perfect. A number of quantitative criteria have recently emerged for generative models, but none of them is designed for a single generated image. In this paper, we propose a new research topic, Generated Image Quality Assessment (GIQA), which quantitatively evaluates the quality of each generated image. We introduce three GIQA algorithms from two perspectives: learning-based and data-based. We evaluate a number of images generated by various recent GAN models on different datasets and demonstrate that they are consistent with human assessments. Furthermore, GIQA is applicable to many tasks, such as separately evaluating the realism and diversity of generative models, and enabling online hard negative mining (OHEM) in the training of GANs to improve the results.
Submitted 14 July, 2020; v1 submitted 19 March, 2020;
originally announced March 2020.
-
Phase-Aware Speech Enhancement with a Recurrent Two Stage Network
Authors:
Juntae Kim,
Jaesung Bae
Abstract:
We propose a neural network-based speech enhancement (SE) method called the phase-aware recurrent two stage network (rTSN). The rTSN is an extension of our previously proposed two stage network (TSN) framework. This TSN framework was equipped with a boosting strategy (BS) that initially estimates the multiple base predictions (MBPs) from a prior neural network (pri-NN) and then the MBPs are aggregated by a posterior neural network (post-NN) to obtain the final prediction. The TSN outperformed various state-of-the-art methods; however, it adopted the simple deep neural network as pri-NN. We have found that the pri-NN affects the performance (in perceptual quality), more than post-NN; therefore we adopted the long short-term memory recurrent neural network (LSTM-RNN) as pri-NN to boost the context information usage within speech signals. Further, the TSN framework did not consider the phase reconstruction, though phase information affected the perceptual quality. Therefore, we proposed to adopt the phase reconstruction method based on the Griffin-Lim algorithm. Finally, we evaluated rTSN with baselines such as TSN in perceptual quality related metrics as well as the phone recognition error rate.
Submitted 27 January, 2020;
originally announced January 2020.
-
End-Point Detection with State Transition Model based on Chunk-Wise Classification
Authors:
Juntae Kim,
Jaesung Bae,
Minsoo Hahn
Abstract:
A state transition model (STM) based on chunk-wise classification was proposed for end-point detection (EPD). In general, EPD is developed using frame-wise voice activity detection (VAD) with additional STM, in which the state transition is conducted based on VAD's frame-level decision (speech or non-speech). However, VAD errors frequently occur in noisy environments, even though we use state-of-the-art deep neural network based VAD, which causes the undesired state transition of STM. In this work, to build robust STM, a state transition is conducted based on chunk-wise classification as EPD does not need to be conducted in frame-level. The chunk consists of multiple frames and the classification of chunk between speech and non-speech is done by aggregating the decisions of VAD for multiple frames, so that some undesired VAD errors in a chunk can be smoothed by other correct VAD decisions. Finally, the model was evaluated in both qualitative and quantitative measures including phone error rate.
Submitted 22 December, 2019;
originally announced December 2019.
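The chunk-wise classification idea in the abstract above, where frame-level VAD decisions within a chunk are aggregated so that isolated VAD errors are smoothed out, can be sketched as follows. The abstract does not say how the decisions are aggregated; the speech-ratio threshold used here (a simple majority vote by default) and the names `classify_chunks`, `chunk_size`, and `threshold` are assumptions for illustration only.

```python
from typing import List


def classify_chunks(frame_decisions: List[int],
                    chunk_size: int,
                    threshold: float = 0.5) -> List[int]:
    """Aggregate frame-level VAD decisions (1 = speech, 0 = non-speech)
    into one decision per chunk.

    Assumption: a chunk is labeled speech when the fraction of speech
    frames reaches `threshold`. The paper's actual rule may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk_labels = []
    for start in range(0, len(frame_decisions), chunk_size):
        chunk = frame_decisions[start:start + chunk_size]
        speech_ratio = sum(chunk) / len(chunk)
        chunk_labels.append(1 if speech_ratio >= threshold else 0)
    return chunk_labels


# Hypothetical frame decisions with one spurious error in each of the
# first two chunks; the chunk-level labels absorb both errors.
frames = [1, 1, 0, 1,  0, 0, 1, 0,  1, 1, 1, 1]
labels = classify_chunks(frames, chunk_size=4)
```

A state transition model driven by `labels` rather than `frames` would see fewer spurious speech/non-speech flips, which is the robustness argument the abstract makes.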
-
Electronics of Time-of-flight Measurement for Back-n at CSNS
Authors:
T. Yu,
P. Cao,
X. Y. Ji,
L. K. Xie,
X. R. Huang,
Q. An,
H. Y. Bai,
J. Bao,
Y. H. Chen,
P. J. Cheng,
Z. Q. Cui,
R. R. Fan,
C. Q. Feng,
M. H. Gu,
Z. J. Han,
G. Z. He,
Y. C. He,
Y. F. He,
H. X. Huang,
W. L. Huang,
X. L. Ji,
H. Y. Jiang,
W. Jiang,
H. Y. Jing,
L. Kang
, et al. (46 additional authors not shown)
Abstract:
Back-n is a white neutron experimental facility at the China Spallation Neutron Source (CSNS). The time structure of the primary proton beam makes the time-of-flight (TOF) method well suited to measuring neutron energy. We implement the TOF measurement electronics on the general-purpose readout electronics designed for all seven detectors in Back-n. The electronics are based on the PXIe (Peripheral Component Interconnect Express eXtensions for Instrumentation) platform and are composed of FDM (Field Digitizer Modules), TCM (Trigger and Clock Module), and SCM (Signal Conditioning Module). The T0 signal, synchronous with the CSNS accelerator, represents the neutron emission from the target and serves as the start time stamp. The TCM receives, synchronizes, and distributes the T0 signal to each FDM over the PXIe backplane bus. Meanwhile, the conditioned detector signals are fed into the FDMs for waveform digitizing. The first sample point of the signal is the stop time stamp. The total TOF is obtained from the start and stop time stamps and the time at which the signal crosses the threshold. An FPGA-based (Field Programmable Gate Array) TDC on the TCM accurately measures the time interval between the asynchronous T0 signal and the global synchronous clock phase. A second FPGA-based TDC on the FDM accurately measures the time interval between T0 arriving at the FDM and the first sample point of the detector signal; the over-threshold time of the signal is obtained offline. This TOF measurement method is efficient and requires no additional modules. Test results show that the TOF accuracy is sub-nanosecond, which meets the requirements of Back-n at CSNS.
Submitted 24 June, 2018;
originally announced June 2018.