Showing 1–50 of 51 results for author: Bao, J

Searching in archive eess.
  1. arXiv:2406.04158  [pdf, other]

    cs.CV eess.IV

    Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets

    Authors: Da Li, Guoqiang Zhao, Houjun Sun, Jiacheng Bao

    Abstract: Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous work typically relies on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar dat…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2406.00416  [pdf, other]

    stat.ML cs.LG eess.SP

    Representation and De-interleaving of Mixtures of Hidden Markov Processes

    Authors: Jiadi Bao, Mengtao Zhu, Yunjie Li, Shafei Wang

    Abstract: De-interleaving of the mixtures of Hidden Markov Processes (HMPs) generally depends on its representation model. Existing representation models consider Markov chain mixtures rather than hidden Markov, resulting in a lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, de-interleaving methods utilize a search-based strategy, which is time-consumi…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures, submitted to IEEE transactions on Signal Processing

  3. arXiv:2404.11929  [pdf, other]

    eess.IV cs.AI cs.CV

    A Symmetric Regressor for MRI-Based Assessment of Striatal Dopamine Transporter Uptake in Parkinson's Disease

    Authors: Walid Abdullah Al, Il Dong Yun, Yun Jung Bae

    Abstract: Dopamine transporter (DAT) imaging is commonly used for monitoring Parkinson's disease (PD), where the striatal DAT uptake amount is computed to assess PD severity. However, DAT imaging has a high cost, carries the risk of radiation exposure, and is not available in general clinics. Recently, an MRI patch of the nigral region has been proposed as a safer and easier alternative. This paper proposes a symmetric r…

    Submitted 18 April, 2024; originally announced April 2024.

  4. arXiv:2403.18707  [pdf, other]

    math.OC eess.SY

    Connections between Reachability and Time Optimality

    Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee, Chang-Hun Lee

    Abstract: This paper presents the concept of an equivalence relation between the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed by the solutions of time optimal problems. A more generalized equivalence theorem is also presented. The findings facilitate the use of solution structures from a certain class of opti…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Submitted to Automatica

  5. arXiv:2401.14304  [pdf, other]

    eess.SY

    Constraint-Aware Mesh Refinement Method by Reachability Set Envelope of Curvature Bounded Paths

    Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee

    Abstract: This paper presents an enhanced direct-method-based approach for the real-time solution of optimal control problems to handle path constraints, such as obstacles. The principal contributions of this work are twofold: first, the existing methods for constructing reachability sets in the literature are extended to derive the envelope of these sets, which determines the region swept by all feasible t…

    Submitted 4 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Preprint submitted to Automatica

  6. arXiv:2310.19264  [pdf, other]

    cs.MM cs.SD eess.AS

    Sound of Story: Multi-modal Storytelling with Audio

    Authors: Jaeyeon Bae, Seokhoon Jeong, Seokun Kang, Namgi Han, Jae-Yon Lee, Hyounghun Kim, Taehwan Kim

    Abstract: Storytelling is multi-modal in the real world. When one tells a story, one may use all of the visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling areas by establishing a new…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023, project: https://github.com/Sosdatasets/SoS_Dataset/

  7. arXiv:2310.03538  [pdf, other]

    eess.AS

    Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis

    Authors: Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim

    Abstract: Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems by enlarging the training data through crowd-sourcing or augmenting existing speech data. However, the use of low-quality data has led to a decline in the overall system performance. To avoid such degradation, instead of directly augmenting the input data, we propose a latent filling (LF) method that adopts s…

    Submitted 22 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to ICASSP 2024

  8. arXiv:2308.08751  [pdf, other]

    eess.SY math.NA math.ST

    Ensemble Kalman Filters with Resampling

    Authors: Omar Al Ghattas, Jiajun Bao, Daniel Sanz-Alonso

    Abstract: Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state of the system is high dimensional, ensemble Kalman filters are often the method of choice. These algorithms rely on an ensemble of interacting particles to sequentially estimate the state as new observations become available. Despite the practical su…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 32 pages, 5 figures

  9. arXiv:2308.01173  [pdf, other]

    eess.IV

    FlexDTI: Flexible diffusion gradient encoding scheme-based highly efficient diffusion tensor imaging using deep learning

    Authors: Zejun Wu, Jiechao Wang, Zunquan Chen, Qinqin Yang, Zhen Xing, Dairong Cao, Jianfeng Bao, Taishan Kang, Jianzhong Lin, Shuhui Cai, Zhong Chen, Congbo Cai

    Abstract: Objective: Most deep neural network-based diffusion tensor imaging methods require the diffusion gradients' number and directions in the data to be reconstructed to match those in the training data. This work aims to develop and evaluate a novel dynamic-convolution-based method called FlexDTI for highly efficient diffusion tensor reconstruction with a flexible diffusion encoding gradient scheme. App…

    Submitted 21 December, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 24 pages, 9 figures, 3 tables

  10. arXiv:2306.11876  [pdf, other]

    eess.IV cs.CV

    BMAD: Benchmarks for Medical Anomaly Detection

    Authors: Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li

    Abstract: Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions. However, there is a lack of a universal and fair benchmark for evaluating AD…

    Submitted 27 April, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

  11. Distributed Data-driven Predictive Control via Dissipative Behavior Synthesis

    Authors: Yitao Yan, Jie Bao, Biao Huang

    Abstract: This paper presents a distributed data-driven predictive control (DDPC) approach using the behavioral framework. It aims to design a network of controllers for an interconnected system with linear time-invariant (LTI) subsystems such that a given global (network-wide) cost function is minimized while desired control performance (e.g., network stability and disturbance rejection) is achieved using…

    Submitted 1 March, 2023; originally announced March 2023.

    Journal ref: IEEE Transactions on Automatic Control, 2023

  12. arXiv:2302.04407  [pdf, other]

    eess.SP

    Bayesian Non-parametric Hidden Markov Model for Agile Radar Pulse Sequences Streaming Analysis

    Authors: Jiadi Bao, Yunjie Li, Mengtao Zhu, Shafei Wang

    Abstract: Multi-function radars (MFRs) are sophisticated types of sensors with the capabilities of complex agile inter-pulse modulation implementation and dynamic work mode scheduling. The developments in MFRs pose great challenges to modern electronic reconnaissance systems or radar warning receivers for recognition and inference of MFR work modes. To address this issue, this paper proposes an online proce…

    Submitted 22 August, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 15 pages, 10 figures, submitted to IEEE transactions on signal processing

  13. arXiv:2301.10772  [pdf]

    q-bio.QM cs.LG eess.IV

    Gene-SGAN: a method for discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering

    Authors: Zhijian Yang, Junhao Wen, Ahmed Abdulkadir, Yuhan Cui, Guray Erus, Elizabeth Mamourian, Randa Melhem, Dhivya Srinivasan, Sindhuja T. Govindarajan, Jiong Chen, Mohamad Habes, Colin L. Masters, Paul Maruff, Jurgen Fripp, Luigi Ferrucci, Marilyn S. Albert, Sterling C. Johnson, John C. Morris, Pamela LaMontagne, Daniel S. Marcus, Tammie L. S. Benzinger, David A. Wolk, Li Shen, Jingxuan Bao, Susan M. Resnick, et al. (3 additional authors not shown)

    Abstract: Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limite…

    Submitted 25 January, 2023; originally announced January 2023.

  14. arXiv:2211.03078  [pdf, other]

    eess.AS cs.SD

    An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

    Authors: Jihwan Lee, Jae-Sung Bae, Seongkyu Mun, Heejin Choi, Joun Yeop Lee, Hoon-Young Cho, Chanwoo Kim

    Abstract: With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. The vowel space analysis, which is often utilized to explore various aspects of language including L2 accents, is a great alternative analysis tool. In this study, we apply th…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  15. arXiv:2210.02245  [pdf, other]

    eess.SP eess.IV

    Channel Modeling for UAV-to-Ground Communications with Posture Variation and Fuselage Scattering Effect

    Authors: Boyu Hua, Haoran Ni, Qiuming Zhu, Cheng-Xiang Wang, Tongtong Zhou, Kai Mao, Junwei Bao, Xiaofei Zhang

    Abstract: Unmanned aerial vehicle (UAV)-to-ground (U2G) channel models play a pivotal role in reliable communications between UAVs and ground terminals. This paper proposes a three-dimensional (3D) non-stationary hybrid model including both large-scale and small-scale fading for U2G multiple-input-multiple-output (MIMO) channels. Distinctive channel characteristics under U2G scenarios, i.e., 3D trajectory an…

    Submitted 13 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

  16. arXiv:2209.08800  [pdf, ps, other]

    eess.SP

    A Realistic 3D Non-Stationary Channel Model for UAV-to-Vehicle Communications Incorporating Fuselage Posture

    Authors: Boyu Hua, Tongtong Zhou, Qiuming Zhu, Kai Mao, Junwei Bao, Weizhi Zhong, Naeem Ahmed

    Abstract: Considering the unmanned aerial vehicle (UAV) three-dimensional (3D) posture, a novel 3D non-stationary geometry-based stochastic model (GBSM) is proposed for multiple-input multiple-output (MIMO) UAV-to-vehicle (U2V) channels. It consists of line-of-sight (LoS) and non-line-of-sight (NLoS) components. The factor of fuselage posture is considered by introducing a time-variant 3D posture matrix.…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: 12 pages, 8 figures, CNCOM

  17. arXiv:2206.13404  [pdf, other]

    eess.AS cs.AI cs.SD

    Avocodo: Generative Adversarial Network for Artifact-free Vocoder

    Authors: Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo

    Abstract: Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech wavef…

    Submitted 3 January, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in the 37th AAAI conference on artificial intelligence (AAAI 2023)

  18. arXiv:2206.07651  [pdf]

    eess.SP

    Fault Diagnosis of Inter-turn Short Circuit in Permanent Magnet Synchronous Motors with Current Signal Imaging and Unsupervised Learning

    Authors: W. Jung, S. H. Yun, Y. S. Lim, S. Cheong, J. Bae, Y. H. Park

    Abstract: This paper proposes machine-independent feature engineering for winding inter-turn short circuit faults that uses electrical current signals. Electrical current signals collected from a permanent magnet synchronous motor (PMSM) are subject to different environmental and operational conditions. To solve these problems, a robust current signal imaging method and deep learning-based feature extraction met…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: submitted to IECON 2022

  19. arXiv:2205.04465  [pdf, other]

    eess.SY math.OC

    A Contraction-constrained Model Predictive Control for Multi-timescale Nonlinear Processes

    Authors: Ryan McCloy, Lai Wei, Jie Bao

    Abstract: Many chemical processes exhibit diverse timescale dynamics with a strong coupling between timescale sensitive variables. Model predictive control with a non-uniformly spaced optimisation horizon is an effective approach to multi-timescale control and offers opportunities for reduced computational complexity. In such an approach the fast, moderate and slow dynamics can be included in the optimisati…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: Submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:2205.04033

  20. arXiv:2205.04033  [pdf, other]

    eess.SY math.OC

    A Contraction-constrained Model Predictive Control for Nonlinear Processes using Disturbance Forecasts

    Authors: Ryan McCloy, Lai Wei, Jie Bao

    Abstract: Model predictive control (MPC) has become the most widely used advanced control method in the process industry. In many cases, forecasts of the disturbances are available, e.g., predicted renewable power generation based on weather forecast. While the predictions of disturbances may not be accurate, utilizing the information can significantly improve the control performance in response to the disturba…

    Submitted 6 June, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Accepted for presentation at 7th International Symposium on Advanced Control of Industrial Processes (AdCONIP 2022)

  21. arXiv:2204.04004  [pdf, other]

    eess.AS cs.SD

    Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech

    Authors: Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo

    Abstract: This paper proposes a hierarchical and multi-scale variational autoencoder-based non-autoregressive text-to-speech model (HiMuV-TTS) to generate natural speech with diverse speaking styles. Recent advances in non-autoregressive TTS (NAR-TTS) models have significantly improved the inference speed and robustness of synthesized speech. However, the diversity of speaking styles and naturalness are nee…

    Submitted 15 August, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH 2022

  22. arXiv:2204.01271  [pdf, other]

    eess.AS cs.LG cs.SD

    Into-TTS : Intonation Template Based Prosody Control System

    Authors: Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim

    Abstract: Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to TTS model training, speech data are grouped into intonation templates in an unsupervi…

    Submitted 6 November, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2023

  23. arXiv:2204.01003  [pdf]

    cs.RO eess.SY

    Impact Intensity Estimation of a Quadruped Robot without Using a Force Sensor

    Authors: Ba-Phuc Huynh, Joonbum Bae

    Abstract: Estimating the impact intensity is one of the significant tasks of a legged robot. Accurate feedback of the impact may help the robot plan a suitable and efficient trajectory to adapt to unknown complex terrains. Ordinarily, this task is performed by a force sensor in the robot's foot. In this letter, an impact intensity estimation without using a force sensor is proposed. An artificial ne…

    Submitted 3 April, 2022; originally announced April 2022.

    Comments: 11 pages, 8 figures, 2 tables. The video is available at https://www.vinabot.com/2022/04/impact-intensity-estimation-of.html

  24. arXiv:2203.05573  [pdf, other]

    eess.IV cs.CV cs.LG

    Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation

    Authors: Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna

    Abstract: Masked Autoencoder (MAE) has recently been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis. By reconstructing full images from partially masked inputs, a ViT encoder aggregates contextual information to infer masked image regions. We believe that this context aggregation ability is particularly essential to the medical image domain where each anatomical s…

    Submitted 21 April, 2023; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: ISBI2023 camera-ready version (no substantial difference from v1); Code is available at https://github.com/cvlab-stonybrook/SelfMedMAE

  25. arXiv:2203.01933  [pdf, other]

    eess.IV cs.CV

    Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations

    Authors: Aishik Konwer, Xuan Xu, Joseph Bae, Chao Chen, Prateek Prasanna

    Abstract: Clinical outcome or severity prediction from medical images has largely focused on learning representations from single-timepoint or snapshot scans. It has been shown that disease progression can be better characterized by temporal imaging. We therefore hypothesized that outcome predictions can be improved by utilizing the disease progression information from sequential images. We present a deep l…

    Submitted 30 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: Accepted in CVPR 2022 (ORAL)

  26. arXiv:2201.12816  [pdf, other]

    eess.SY

    Adaptive Contraction-based Control of Uncertain Nonlinear Processes using Neural Networks

    Authors: Lai Wei, Ryan McCloy, Jie Bao

    Abstract: Driven by the flexible manufacturing trend in the process control industry and the uncertain nature of chemical process models, this article aims to achieve offset-free tracking for a family of uncertain nonlinear systems (e.g., using process models with parametric uncertainties) with adaptable performance. The proposed adaptive control approach incorporates into the control loop an adaptive neura…

    Submitted 9 May, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: Accepted for presentation at 13th IFAC Symposium on Dynamics and Control of Process Systems, including Biosystems (DYCOPS 2022)

  27. arXiv:2201.12812  [pdf, other]

    eess.SY

    Electrolyte Flow Rate Control for Vanadium Redox Flow Batteries using the Linear Parameter Varying Framework

    Authors: Ryan McCloy, Yifeng Li, Jie Bao, Maria Skyllas-Kazacos

    Abstract: In this article, an electrolyte flow rate control approach is developed for an all-vanadium redox flow battery (VRB) system based on the linear parameter varying (LPV) framework. The electrolyte flow rate is regulated to provide a trade-off between stack voltage efficiency and pumping energy losses, so as to achieve optimal battery energy efficiency. The nonlinear process model is embedded in a li…

    Submitted 9 May, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: Accepted to Journal of Process Control (29/Apr/2022)

  28. arXiv:2201.07344  [pdf, other]

    eess.IV cs.CV

    Lung Swapping Autoencoder: Learning a Disentangled Structure-texture Representation of Chest Radiographs

    Authors: Lei Zhou, Joseph Bae, Huidong Liu, Gagandeep Singh, Jeremy Green, Amit Gupta, Dimitris Samaras, Prateek Prasanna

    Abstract: Well-labeled datasets of chest radiographs (CXRs) are difficult to acquire due to the high cost of annotation. Thus, it is desirable to learn a robust and transferable representation in an unsupervised manner to benefit tasks that lack labeled data. Unlike natural images, medical images have their own domain prior; e.g., we observe that many pulmonary diseases, such as COVID-19, manifest as ch…

    Submitted 18 January, 2022; originally announced January 2022.

    Comments: Extended version of the MICCAI 2021 paper https://link.springer.com/chapter/10.1007/978-3-030-87234-2_33 The code is available at https://github.com/cvlab-stonybrook/LSAE

  29. arXiv:2112.04699  [pdf, other]

    eess.SY

    Contraction Analysis and Control Synthesis for Discrete-time Nonlinear Processes

    Authors: Lai Wei, Ryan McCloy, Jie Bao

    Abstract: Shifting away from the traditional mass production approach, the process industry is moving towards more agile, cost-effective and dynamic process operation (next-generation smart plants). This warrants the development of control systems for nonlinear chemical processes to be capable of tracking time-varying setpoints to produce products with different specifications as per market demand and deal…

    Submitted 9 May, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted to Journal of Process Control (27/Apr/2022). arXiv admin note: text overlap with arXiv:2105.05432, arXiv:2104.10352

  30. arXiv:2112.01535  [pdf, other]

    eess.IV cs.AI cs.LG

    Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images

    Authors: Sang-gil Lee, Eunji Kim, Jae Seok Bae, Jung Hoon Kim, Sungroh Yoon

    Abstract: The computer-aided diagnosis of focal liver lesions (FLLs) can help improve workflow and enable correct diagnoses; FLL detection is the first step in such a computer-aided diagnosis. Despite the recent success of deep-learning-based approaches in detecting FLLs, current methods are not sufficiently robust for assessing misaligned multiphase data. By introducing an attention-guided multiphase align…

    Submitted 16 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: IEEE TETCI. 14 pages, 8 figures, 5 tables

  31. arXiv:2110.04466  [pdf, other]

    cs.IT cs.LG eess.SY

    ProductAE: Towards Training Larger Channel Codes based on Neural Product Codes

    Authors: Mohammad Vahid Jamali, Hamid Saber, Homayoon Hatami, Jung Hyun Bae

    Abstract: There have been significant research activities in recent years to automate the design of channel encoders and decoders via deep learning. Due to the dimensionality challenge in channel coding, it is prohibitively complex to design and train relatively large neural channel codes via deep learning techniques. Consequently, most of the results in the literature are limited to relatively short codes hav…

    Submitted 10 September, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  32. arXiv:2110.03348  [pdf]

    stat.AP eess.SP

    Acoustic Signal based Non-Contact Ball Bearing Fault Diagnosis Using Adaptive Wavelet Denoising

    Authors: Wonho Jung, Jaewoong Bae, Yong-Hwa Park

    Abstract: This paper presents a non-contact fault diagnostic method for ball bearings using adaptive wavelet denoising, statistical-spectral acoustic features, and one-dimensional (1D) convolutional neural networks (CNN). The health conditions of the ball bearing are monitored by a microphone under noisy conditions. To eliminate noise, an adaptive wavelet denoising method based on the kurtosis-entropy (KE) index is p…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  33. arXiv:2107.14521  [pdf, other]

    eess.IV

    Model-based Synthetic Data-driven Learning (MOST-DL): Application in Single-shot T2 Mapping with Severe Head Motion Using Overlapping-echo Acquisition

    Authors: Qinqin Yang, Yanhong Lin, Jiechao Wang, Jianfeng Bao, Xiaoyin Wang, Lingceng Ma, Zihan Zhou, Qizhi Yang, Shuhui Cai, Hongjian He, Congbo Cai, Jiyang Dong, Jingliang Cheng, Zhong Chen, Jianhui Zhong

    Abstract: Use of synthetic data has provided a potential solution for addressing unavailable or insufficient training samples in deep learning-based magnetic resonance imaging (MRI). However, the challenge brought by the domain gap between synthetic and real data is usually encountered, especially under complex experimental conditions. In this study, by combining Bloch simulation and general MRI models, we prop…

    Submitted 29 May, 2022; v1 submitted 30 July, 2021; originally announced July 2021.

    Comments: 15 pages, 13 figures

  34. arXiv:2107.08330  [pdf, other]

    eess.IV cs.CV

    Attention-based Multi-scale Gated Recurrent Encoder with Novel Correlation Loss for COVID-19 Progression Prediction

    Authors: Aishik Konwer, Joseph Bae, Gagandeep Singh, Rishabh Gattu, Syed Ali, Jeremy Green, Tej Phatak, Prateek Prasanna

    Abstract: COVID-19 image analysis has mostly focused on diagnostic tasks using single timepoint scans acquired upon disease presentation or admission. We present a deep learning-based approach to predict lung infiltrate progression from serial chest radiographs (CXRs) of COVID-19 patients. Our method first utilizes convolutional neural networks (CNNs) for feature extraction from patches within the concerned…

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: The paper is early accepted to MICCAI 2021

  35. arXiv:2106.15153  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis

    Authors: Jinhyeok Yang, Jae-Sung Bae, Taejun Bak, Youngik Kim, Hoon-Young Cho

    Abstract: Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the generation of reasonably good speech quality with a single model and made it possible to synthesize the speech of a speaker with limited training data. Fine-tuning to the target speaker data with the multi-speaker model can achieve better quality; however, there still exists a gap compared to the real speech sampl…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  36. arXiv:2106.15144  [pdf, other]

    eess.AS

    Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech

    Authors: Jae-Sung Bae, Tae-Jun Bak, Young-Sun Joo, Hoon-Young Cho

    Abstract: In this paper, we propose methods for improving the modeling performance of a Transformer-based non-autoregressive text-to-speech (TNA-TTS) model. Although the text encoder and audio decoder handle different types and lengths of data (i.e., text and audio), the TNA-TTS models are not designed considering these variations. Therefore, to improve the modeling performance of the TNA-TTS model we propo…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  37. arXiv:2106.15123  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

    Authors: Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho

    Abstract: Methods for modeling and controlling prosody with acoustic features have been proposed for neural text-to-speech (TTS) models. Prosodic speech can be generated by conditioning acoustic features. However, synthesized speech with a large pitch-shift scale suffers from audio quality degradation and speaker characteristics deformation. To address this problem, we propose a feed-forward Transformer ba…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  38. arXiv:2105.05432  [pdf, other]

    eess.SY cs.LG math.OC

    Discrete-time Contraction-based Control of Nonlinear Systems with Parametric Uncertainties using Neural Networks

    Authors: Lai Wei, Ryan McCloy, Jie Bao

    Abstract: In response to the continuously changing feedstock supply and market demand for products with different specifications, the processes need to be operated at time-varying operating conditions and targets (e.g., setpoints) to improve the process economy, in contrast to traditional process operations around predetermined equilibriums. In this paper, a contraction theory-based control approach using n…

    Submitted 20 June, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: This work has been submitted to Computers & Chemical Engineering for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  39. arXiv:2104.10352  [pdf, other]

    eess.SY math.OC

    Control Contraction Metric Synthesis for Discrete-time Nonlinear Systems

    Authors: Lai Wei, Ryan McCloy, Jie Bao

    Abstract: Flexible manufacturing has become the trend in modern chemical processes. One of the essential characteristics of flexible manufacturing is to track time-varying target trajectories (e.g. diversity and quantity of products). A possible tool to achieve time-varying targets is contraction theory. However, the contraction theory was developed for continuous time systems and there…

    Submitted 12 May, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: This work is accepted by 11th IFAC SYMPOSIUM on Advanced Control of Chemical Processes

  40. arXiv:2103.10063  [pdf, other]

    eess.SY

    Behavioural Approach to Distributed Control of Interconnected Systems

    Authors: Yitao Yan, Jie Bao, Biao Huang

    Abstract: This paper formulates a framework for the analysis and distributed control of interconnected systems from the behavioural perspective. The discussions are carried out from the viewpoint of set theory and the results are completely representation-free. The core of a dynamical system can be represented as the set of all trajectories admissible through the system and interconnections are interpreted…

    Submitted 18 March, 2021; originally announced March 2021.

  41. arXiv:2103.03049  [pdf, other]

    eess.AS cs.LG cs.SD

    A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music

    Authors: Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, Hoon-Young Cho

    Abstract: Recently, it has become easier to obtain speech data from various media such as the internet or YouTube, but directly utilizing them to train a neural text-to-speech (TTS) model is difficult. The proportion of clean speech is insufficient and the remainder includes background music. Even with the global style token (GST). Therefore, we propose the following method to successfully train an end-to-e…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted at ICASSP 2021

  42. arXiv:2010.05646  [pdf, other]

    cs.SD cs.LG eess.AS

    HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

    Authors: Jungil Kong, Jaehyeon Kim, Jaekyoung Bae

    Abstract: Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech…

    Submitted 23 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/jik876/hifi-gan

  43. arXiv:2007.15281  [pdf, other]

    eess.AS cs.SD

    Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning

    Authors: Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho

    Abstract: This paper proposes a controllable end-to-end text-to-speech (TTS) system to control the speaking speed (speed-controllable TTS; SCTTS) of synthesized speech with sentence-level speaking-rate value as an additional input. The speaking-rate value, the ratio of the number of input phonemes to the length of input speech, is adopted in the proposed system to control the speaking speed. Furthermore, th…

    Submitted 13 August, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted to INTERSPEECH 2020

  44. arXiv:2007.08028  [pdf]

    q-bio.QM cs.CV cs.LG eess.IV

    Predicting Clinical Outcomes in COVID-19 using Radiomics and Deep Learning on Chest Radiographs: A Multi-Institutional Study

    Authors: Joseph Bae, Saarthak Kapse, Gagandeep Singh, Rishabh Gattu, Syed Ali, Neal Shah, Colin Marshall, Jonathan Pierce, Tej Phatak, Amit Gupta, Jeremy Green, Nikhil Madan, Prateek Prasanna

    Abstract: We predict mechanical ventilation requirement and mortality using computational modeling of chest radiographs (CXRs) for coronavirus disease 2019 (COVID-19) patients. This two-center, retrospective study analyzed 530 deidentified CXRs from 515 COVID-19 patients treated at Stony Brook University Hospital and Newark Beth Israel Medical Center between March and August 2020. DL and machine learning cl…

    Submitted 1 July, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

    Comments: Joseph Bae and Saarthak Kapse have contributed equally to this work

    ACM Class: J.3; I.2.6

  45. arXiv:2006.16990  [pdf, other]

    cs.CV eess.IV

    PriorGAN: Real Data Prior for Generative Adversarial Nets

    Authors: Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

    Abstract: Generative adversarial networks (GANs) have achieved rapid progress in learning rich data distributions. However, we identify two main issues in existing techniques. First, the low-quality problem, where the learned distribution contains many low-quality samples. Second, the missing-modes problem, where the learned distribution misses certain regions of the real data distribution. To address t…

    Submitted 30 June, 2020; originally announced June 2020.

  46. arXiv:2006.13360  [pdf, other]

    cs.RO eess.SY

    Evaluation of Sampling Methods for Robotic Sediment Sampling Systems

    Authors: Jun Han Bae, Wonse Jo, Jee Hwan Park, Richard M. Voyles, Sara K. McMillan, Byung-Cheol Min

    Abstract: Analysis of sediments from rivers, lakes, reservoirs, wetlands and other constructed surface water impoundments is an important tool to characterize the function and health of these systems, but is generally carried out manually. This is costly and can be hazardous and difficult for humans due to inaccessibility, contamination, or availability of required equipment. Robotic sampling systems can ea…

    Submitted 23 June, 2020; originally announced June 2020.

  47. arXiv:2003.08932  [pdf, other]

    eess.IV cs.CV

    GIQA: Generated Image Quality Assessment

    Authors: Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

    Abstract: Generative adversarial networks (GANs) have achieved impressive results, but not all generated images are perfect. A number of quantitative criteria have recently emerged for generative models, but none of them is designed for a single generated image. In this paper, we propose a new research topic, Generated Image Quality Assessment (GIQA), which quantitatively evaluates the quality of each…

    Submitted 14 July, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

    Comments: ECCV2020

  48. arXiv:2001.09772  [pdf]

    eess.AS

    Phase-Aware Speech Enhancement with a Recurrent Two Stage Network

    Authors: Juntae Kim, Jaesung Bae

    Abstract: We propose a neural network-based speech enhancement (SE) method called the phase-aware recurrent two stage network (rTSN). The rTSN is an extension of our previously proposed two stage network (TSN) framework. This TSN framework was equipped with a boosting strategy (BS) that initially estimates the multiple base predictions (MBPs) from a prior neural network (pri-NN) and then the MBPs are aggreg…

    Submitted 27 January, 2020; originally announced January 2020.

  49. arXiv:1912.10442  [pdf]

    eess.AS cs.SD

    End-Point Detection with State Transition Model based on Chunk-Wise Classification

    Authors: Juntae Kim, Jaesung Bae, Minsoo Hahn

    Abstract: A state transition model (STM) based on chunk-wise classification was proposed for end-point detection (EPD). In general, EPD is developed using frame-wise voice activity detection (VAD) with additional STM, in which the state transition is conducted based on VAD's frame-level decision (speech or non-speech). However, VAD errors frequently occur in noisy environments, even though we use state-of-t…

    Submitted 22 December, 2019; originally announced December 2019.

  50. arXiv:1806.09250  [pdf]

    physics.ins-det eess.SP

    Electronics of Time-of-flight Measurement for Back-n at CSNS

    Authors: T. Yu, P. Cao, X. Y. Ji, L. K. Xie, X. R. Huang, Q. An, H. Y. Bai, J. Bao, Y. H. Chen, P. J. Cheng, Z. Q. Cui, R. R. Fan, C. Q. Feng, M. H. Gu, Z. J. Han, G. Z. He, Y. C. He, Y. F. He, H. X. Huang, W. L. Huang, X. L. Ji, H. Y. Jiang, W. Jiang, H. Y. Jing, L. Kang, et al. (46 additional authors not shown)

    Abstract: Back-n is a white neutron experimental facility at the China Spallation Neutron Source (CSNS). The time structure of the primary proton beam makes the TOF (time-of-flight) method well suited to neutron energy measurement. We implement the electronics of TOF measurement on the general-purpose readout electronics designed for all of the seven detectors in Back-n. The electronics is based on PXI…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: 4 pages, 13 figures, 21st IEEE Real Time Conference