Showing 1–14 of 14 results for author: Hong, Z

Searching in archive eess.
  1. arXiv:2406.02429  [pdf, other]

    eess.AS cs.SD

    Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion

    Authors: Ruiqi Li, Rongjie Huang, Yongqi Wang, Zhiqing Hong, Zhou Zhao

    Abstract: Speech-to-singing voice conversion (STS) task always suffers from data scarcity, because it requires paired speech and singing data. Compounding this issue are the challenges of content-pitch alignment and the suboptimal quality of generated outputs, presenting significant hurdles in STS research. This paper presents SVPT, an STS approach boosted by a self-supervised singing voice pre-training mod…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 13 pages

  2. arXiv:2405.09940  [pdf, other]

    eess.AS cs.SD

    Robust Singing Voice Transcription Serves Synthesis

    Authors: Ruiqi Li, Yu Zhang, Yongqi Wang, Zhiqing Hong, Rongjie Huang, Zhou Zhao

    Abstract: Note-level Automatic Singing Voice Transcription (AST) converts singing recordings into note sequences, facilitating the automatic annotation of singing datasets for Singing Voice Synthesis (SVS) applications. Current AST methods, however, struggle with accuracy and robustness when used for practical annotation. This paper presents ROSVOT, the first robust AST model that serves SVS, incorporating…

    Submitted 3 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: ACL 2024

  3. arXiv:2404.17064  [pdf, other]

    eess.IV cs.CV

    Detection of Peri-Pancreatic Edema using Deep Learning and Radiomics Techniques

    Authors: Ziliang Hong, Debesh Jha, Koushik Biswas, Zheyuan Zhang, Yury Velichko, Cemal Yazici, Temel Tirkes, Amir Borhani, Baris Turkbey, Alpay Medetalibeyoglu, Gorkem Durak, Ulas Bagci

    Abstract: Identifying peri-pancreatic edema is a pivotal indicator for identifying disease progression and prognosis, emphasizing the critical need for accurate detection and assessment in pancreatitis diagnosis and management. This study introduces a novel CT dataset sourced from 255 patients with pancreatic diseases, featuring annotated pancreas segmentation masks and corresponding diagnostic labe…

    Submitted 25 April, 2024; originally announced April 2024.

  4. arXiv:2404.09313  [pdf, other]

    eess.AS cs.AI

    Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

    Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

    Abstract: A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention has been paid to exploring song synthesis. In this work, we propose a novel task called text-to-song synthesis, which incorporates both vocal and accompaniment generation. We develop Melodist, a two-stage text-to-song method that consi…

    Submitted 20 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Main

  5. arXiv:2403.11780  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

    Authors: Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

    Abstract: Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL 2024 (main conference)

  6. arXiv:2309.07566  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

    Authors: Yongqi Wang, Jionghao Bai, Rongjie Huang, Ruiqi Li, Zhiqing Hong, Zhou Zhao

    Abstract: Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy, but is unable to preserve the speaker timbre of the source speech during translation. Meanwhile, the scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer between source and target speech. We propose an S2ST framework with an acoustic lan…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, 1 figure. Submitted to ICASSP 2024

  7. arXiv:2304.12995  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

    Authors: Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe

    Abstract: Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Despite the recent success, current LLMs are not capable of processing complex audio information or conducting spoken conversations (like Siri or Alexa). In this work, we propose a multi-modal AI system named AudioGPT, which complements…

    Submitted 25 April, 2023; originally announced April 2023.

  8. arXiv:2303.09248  [pdf, other]

    cs.CV eess.IV

    Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video

    Authors: Ziyang Hong, C. Patrick Yue

    Abstract: We present a novel real-time capable learning method that jointly perceives a 3D scene's geometry structure and semantic labels. Recent approaches to real-time 3D scene reconstruction mostly adopt a volumetric scheme, where a Truncated Signed Distance Function (TSDF) is directly regressed. However, these volumetric approaches tend to focus on the global coherence of their reconstructions, which le…

    Submitted 10 September, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023 Workshops. Project page: https://hafred.github.io/cdrnet/

  9. arXiv:2207.14174  [pdf, other]

    eess.SP cs.AI

    Bayesian Optimization-Based Beam Alignment for MmWave MIMO Communication Systems

    Authors: Songjie Yang, Baojuan Liu, Zhiqin Hong, Zhongpei Zhang

    Abstract: Due to the very narrow beam used in millimeter wave communication (mmWave), beam alignment (BA) is a critical issue. In this work, we investigate the issue of mmWave BA and present a novel beam alignment scheme on the basis of a machine learning strategy, Bayesian optimization (BO). In this context, we consider the beam alignment issue to be a black box function and then use BO to find the possibl…

    Submitted 28 July, 2022; originally announced July 2022.

  10. Low-complexity Sparse Array Synthesis Based on Off-grid Compressive Sensing

    Authors: Songjie Yang, Baojuan Liu, Zhiqin Hong, Zhongpei Zhang

    Abstract: A novel sparse array synthesis method for non-uniform planar arrays is proposed, which belongs to compressive sensing (CS)-based synthesis. Particularly, we propose an off-grid refinement technique to simultaneously optimize the antenna element positions and excitations with a low complexity, in response to the antenna position optimization problem that is difficult for standard CS. More important…

    Submitted 28 July, 2022; originally announced July 2022.

  11. arXiv:2205.13249  [pdf, other]

    cs.SD cs.LG eess.AS

    DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

    Authors: Nan Zhang, Jianzong Wang, Zhenhou Hong, Chendong Zhao, Xiaoyang Qu, Jing Xiao

    Abstract: Speaker verification (SV) aims to determine whether the speaker's identity of a test utterance is the same as the reference speech. In the past few years, extracting speaker embeddings using deep neural networks for SV systems has gone mainstream. Recently, different attention mechanisms and Transformer networks have been explored widely in SV fields. However, utilizing the original Transformer in…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted by IJCNN2022 (The 2022 International Joint Conference on Neural Networks)

  12. arXiv:2107.05162  [pdf, other]

    eess.SP

    Board-level Code-Modulated Embedded Test and Calibration of an X-band Phased-Array Transceiver

    Authors: Zhangjie Hong, Simon Schönherr, Vikas Chauhan, Brian Floyd

    Abstract: We present methods for built-in test and calibration of phased arrays using code-modulated embedded test (CoMET). Our approach employs Cartesian modulation of test signals within each element using existing phase shifters, combining of these signals into an aggregate code-multiplexed response, downconversion and creation of code-modulated element-to-element "interference products" using a built-in…

    Submitted 11 July, 2021; originally announced July 2021.

  13. EfficientTDNN: Efficient Architecture Search for Speaker Recognition

    Authors: Rui Wang, Zhihua Wei, Haoran Duan, Shouling Ji, Yang Long, Zhen Hong

    Abstract: Convolutional neural networks (CNNs), such as the time-delay neural network (TDNN), have shown their remarkable capability in learning speaker embedding. However, they meanwhile bring a huge computational cost in storage size, processing, and memory. Discovering the specialized CNN that meets a specific constraint requires a substantial effort of human experts. Compared with hand-designed approach…

    Submitted 18 June, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 13 pages, 12 figures, accepted to TASLP

  14. arXiv:1802.00285  [pdf, other]

    cs.CV cs.RO eess.SY

    Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

    Authors: Zhang-Wei Hong, Chen Yu-Ming, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Hsuan-Kung Yang, Brian Hsi-Lin Ho, Chih-Chieh Tu, Yueh-Chuan Chang, Tsu-Ching Hsiao, Hsin-Wei Hsiao, Sih-Pin Lai, Chun-Yi Lee

    Abstract: Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of the models trained in virtual worlds to the real world. This paper proposes a modular…

    Submitted 28 October, 2018; v1 submitted 1 February, 2018; originally announced February 2018.

    Comments: 7 pages, accepted by IJCAI-18