Showing 1–12 of 12 results for author: Xie, F

Searching in archive eess.
  1. arXiv:2406.02940

    cs.SD eess.AS

    Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

    Authors: Haohan Guo, Fenglong Xie, Dongchao Yang, Hui Lu, Xixin Wu, Helen Meng

    Abstract: VQ-VAE, as a mainstream approach of speech tokenizer, has been troubled by "index collapse", where only a small number of codewords are activated in large codebooks. This work proposes product-quantized (PQ) VAE with more codebooks but fewer codewords to address this problem and build large-codebook speech tokenizers. It encodes speech features into multiple VQ subspaces and composes them into c…

    Submitted 5 June, 2024; originally announced June 2024.

  2. arXiv:2309.00126

    cs.SD cs.CL eess.AS

    QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

    Authors: Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng

    Abstract: This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements via Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL) utilizing more unlabeled speech audio. This framework comprises two VQ-S3R learners: first, the principal learner aims to provide a generative Multi-Stage Multi-Codebook (MSMC) VQ-S3R via the…

    Submitted 31 August, 2023; originally announced September 2023.

  3. arXiv:2303.00460

    cs.RO eess.SY

    Multi-Arm Robot Task Planning for Fruit Harvesting Using Multi-Agent Reinforcement Learning

    Authors: Tao Li, Feng Xie, Ya Xiong, Qingchun Feng

    Abstract: The emergence of harvesting robotics offers a promising solution to the issue of limited agricultural labor resources and the increasing demand for fruits. Despite notable advancements in the field of harvesting robotics, the utilization of such technology in orchards is still limited. The key challenge is to improve operational efficiency. Taking into account inner-arm conflicts, couplings of DoF…

    Submitted 1 March, 2023; originally announced March 2023.

  4. arXiv:2210.15131

    cs.SD cs.CL eess.AS

    Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

    Authors: Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng

    Abstract: This paper aims to enhance low-resource TTS by reducing training data requirements using compact speech representations. A Multi-Stage Multi-Codebook (MSMC) VQ-GAN is trained to learn the representation, MSMCR, and decode it to waveforms. Subsequently, we train the multi-stage predictor to predict MSMCRs from the text for TTS synthesis. Moreover, we optimize the training strategy by leveraging mor…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  5. arXiv:2209.10887

    cs.SD cs.CL eess.AS

    A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

    Authors: Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

    Abstract: We propose a Multi-Stage, Multi-Codebook (MSMC) approach to high-performance neural TTS synthesis. A vector-quantized, variational autoencoder (VQ-VAE) based feature analyzer is used to encode Mel spectrograms of speech training data by down-sampling progressively in multiple stages into MSMC Representations (MSMCRs) with different time resolutions, and quantizing them with multiple VQ codebooks,…

    Submitted 22 September, 2022; originally announced September 2022.

  6. arXiv:2206.11458

    eess.IV cs.CV q-bio.QM

    Weighted Concordance Index Loss-based Multimodal Survival Modeling for Radiation Encephalopathy Assessment in Nasopharyngeal Carcinoma Radiotherapy

    Authors: Jiansheng Fang, Anwei Li, Pu-Yun OuYang, Jiajian Li, Jingwen Wang, Hongbo Liu, Fang-Yun Xie, Jiang Liu

    Abstract: Radiation encephalopathy (REP) is the most common complication for nasopharyngeal carcinoma (NPC) radiotherapy. It is highly desirable to assist clinicians in optimizing the NPC radiotherapy regimen to reduce radiotherapy-induced temporal lobe injury (RTLI) according to the probability of REP onset. To the best of our knowledge, it is the first exploration of predicting radiotherapy-induced REP by…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: 11 pages, 3 figures, MICCAI2022

  7. arXiv:2109.13673

    cs.CL cs.SD eess.AS

    Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS

    Authors: Shilun Lin, Wenchao Su, Li Meng, Fenglong Xie, Xinhui Li, Li Lu

    Abstract: This paper presents Nana-HDR, a new non-attentive non-autoregressive model with hybrid Transformer-based Dense-fuse encoder and RNN-based decoder for TTS. It mainly consists of three parts: Firstly, a novel Dense-fuse encoder with dense connections between basic Transformer blocks for coarse feature fusion and a multi-head attention layer for fine feature fusion. Secondly, a single-layer non-autor…

    Submitted 28 September, 2021; originally announced September 2021.

  8. GNSS Radio Occultation on Aerial Platforms with Commercial Off-The-Shelf Receivers

    Authors: Bryan C. Chan, Ashish Goel, Jonathan Kosh, Tyler G. R. Reid, Corey R. Snyder, Paul M. Tarantino, Saraswati Soedarmadji, Widyadewi Soedarmadji, Kevin Nelson, Feiqin Xie, Michael Vergalla

    Abstract: In recent decades, GNSS Radio Occultation soundings have proven an invaluable input to global weather forecasting. The success of government-sponsored programs such as COSMIC is now complemented by commercial low-cost cubesat implementations. The result is access to more than 10,000 soundings per day and improved weather forecasting accuracy. This movement towards commercialization has been suppor…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: 16 pages, 25 figures, ION GNSS+ 2021

    Journal ref: Proceedings of the 34th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2021)

  9. arXiv:2102.00247

    cs.CL eess.AS

    Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet

    Authors: Shilun Lin, Fenglong Xie, Li Meng, Xinhui Li, Li Lu

    Abstract: In this work, a robust and efficient text-to-speech (TTS) synthesis system named Triple M is proposed for large-scale online application. The key components of Triple M are: 1) A sequence-to-sequence model adopts a novel multi-guidance attention to transfer complementary advantages from guiding attention mechanisms to the basic attention mechanism without in-domain performance loss and online serv…

    Submitted 7 April, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

  10. arXiv:2011.11952

    eess.IV cs.CV

    Alleviating Class-wise Gradient Imbalance for Pulmonary Airway Segmentation

    Authors: Hao Zheng, Yulei Qin, Yun Gu, Fangfang Xie, Jie Yang, Jiayuan Sun, Guang-zhong Yang

    Abstract: Automated airway segmentation is a prerequisite for pre-operative diagnosis and intra-operative navigation for pulmonary intervention. Due to the small size and scattered spatial distribution of peripheral bronchi, this is hampered by severe class imbalance between foreground and background regions, which makes it challenging for CNN-based methods to parse distal small airways. In this paper, we d…

    Submitted 29 April, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

  11. Diesel Generator Model Parameterization for Microgrid Simulation Using Hybrid Box-Constrained Levenberg-Marquardt Algorithm

    Authors: Qian Long, Hui Yu, Fuhong Xie, Ning Lu, David Lubkeman

    Abstract: Existing generator parameterization methods, typically developed for large turbine generator units, are difficult to apply to small kW-level diesel generators in microgrid applications. This paper presents a model parameterization method that estimates a complete set of kW-level diesel generator parameters simultaneously using only load-step-change tests with limited measurement points. This metho…

    Submitted 25 September, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 9 pages, 9 figures, accepted by IEEE Transactions on Smart Grids

  12. arXiv:2002.07257

    eess.SY

    An Networked HIL Simulation System for Modeling Large-scale Power Systems

    Authors: Fuhong Xie, Catie McEntee, Mingzhi Zhang, Ning Lu, Xinda Ke, Mallikarjuna R. Vallem, Nader Samaan

    Abstract: This paper presents a network hardware-in-the-loop (HIL) simulation system for modeling large-scale power systems. Researchers have developed many HIL test systems for power systems in recent years. Those test systems can model both microsecond-level dynamic responses of power electronic systems and millisecond-level transients of transmission and distribution grids. By integrating individual HIL…

    Submitted 17 February, 2020; originally announced February 2020.