Skip to main content

Showing 1–15 of 15 results for author: Ning, Z

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.07846  [pdf, other

    eess.AS

    DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

    Authors: Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of automatic speech recognition (ASR) model with short chunks makes… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2405.10786  [pdf, other

    eess.AS

    Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix

    Authors: Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  3. A Bearing-Angle Approach for Unknown Target Motion Analysis Based on Visual Measurements

    Authors: Zian Ning, Yin Zhang, Jianan Li, Zhang Chen, Shiyu Zhao

    Abstract: Vision-based estimation of the motion of a moving target is usually formulated as a bearing-only estimation problem where the visual measurement is modeled as a bearing vector. Although the bearing-only approach has been studied for decades, a fundamental limitation of this approach is that it requires extra lateral motion of the observer to enhance the target's observability. Unfortunately, the e… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

    Journal ref: The International Journal of Robotics Research (2024) 1-20

  4. arXiv:2312.16850  [pdf, other

    cs.SD eess.AS

    Accent-VITS:accent transfer for end-to-end TTS

    Authors: Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie

    Abstract: Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent which are entangled in speech. This paper presents a VITS-based end-to-end accent transfer model named Accent-VITS.Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable… ▽ More

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by NCMMSC2023

  5. arXiv:2312.04786  [pdf, other

    cs.IT cs.LG eess.SP

    Joint User Association, Interference Cancellation and Power Control for Multi-IRS Assisted UAV Communications

    Authors: Zhaolong Ning, Hao Hu, Xiaojie Wang, Qingqing Wu, Chau Yuen, F. Richard Yu, Yan Zhang

    Abstract: Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way. Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs, whereas it is extremely challenging for joint multi-IRS multi-user association in UAV communications with const… ▽ More

    Submitted 7 December, 2023; originally announced December 2023.

  6. arXiv:2310.20242  [pdf, other

    cs.NI eess.SP

    Intelligent-Reflecting-Surface-Assisted UAV Communications for 6G Networks

    Authors: Zhaolong Ning, Tengfeng Li, Yu Wu, Xiaojie Wang, Qingqing Wu, Fei Richard Yu, Song Guo

    Abstract: In 6th-Generation (6G) mobile networks, Intelligent Reflective Surfaces (IRSs) and Unmanned Aerial Vehicles (UAVs) have emerged as promising technologies to address the coverage difficulties and resource constraints faced by terrestrial networks. UAVs, with their mobility and low costs, offer diverse connectivity options for mobile users and a novel deployment paradigm for 6G networks. However, th… ▽ More

    Submitted 31 October, 2023; originally announced October 2023.

  7. arXiv:2310.02802  [pdf, other

    eess.AS

    VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

    Authors: Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

    Abstract: This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023. Following the recognition-synthesis framework, our singing conversion model is based on VITS, incorporating four key modules: a prior encoder, a posterior encoder, a decoder, and a parallel bank of transposed convolutions (PBTC) module. We particularly leverage Whisper, a powerful pre-trained ASR… ▽ More

    Submitted 4 October, 2023; originally announced October 2023.

  8. arXiv:2309.15496  [pdf, other

    eess.AS cs.SD

    DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC… ▽ More

    Submitted 18 January, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP2024

  9. arXiv:2309.09262  [pdf, other

    eess.AS cs.SD

    PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

    Authors: Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, **g**g Yin, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to control the conversion process, which leads to limitations in style diversity or falls short in terms of the intuitive and interpretability of style representation… ▽ More

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  10. arXiv:2308.00947  [pdf, other

    eess.IV cs.CV cs.LG

    Decomposing and Coupling Saliency Map for Lesion Segmentation in Ultrasound Images

    Authors: Zhenyuan Ning, Yixiao Mao, Qian** Feng, Shengzhou Zhong, Yu Zhang

    Abstract: Complex scenario of ultrasound image, in which adjacent tissues (i.e., background) share similar intensity with and even contain richer texture patterns than lesion region (i.e., foreground), brings a unique challenge for accurate lesion segmentation. This work presents a decomposition-coupling network, called DC-Net, to deal with this challenge in a (foreground-background) saliency map disentangl… ▽ More

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 18 pages, 18 figures

  11. arXiv:2305.12425  [pdf, other

    eess.AS cs.SD

    DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities. Unlike typical (non-streaming) voice conversion, which can leverage the entire utterance as full context, streaming voice conversion faces significant challenges due to the missing future information, resulting in degraded intelligibility,… ▽ More

    Submitted 30 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

  12. arXiv:2211.04710  [pdf, other

    eess.AS cs.SD

    Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

    Authors: Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion for highly expressive speech is challenging. Current approaches struggle with the balancing between speaker similarity, intelligibility and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conversion framework that leverages advantages from both neural bottleneck feature (BNF) approach and information perturbation approach. Specifically,… ▽ More

    Submitted 9 November, 2022; originally announced November 2022.

  13. arXiv:2211.03036  [pdf, other

    eess.AS cs.SD

    Preserving background sound in noise-robust voice conversion via multi-task learning

    Authors: Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie

    Abstract: Background sound is an informative form of art that is helpful in providing a more immersive experience in real-application voice conversion (VC) scenarios. However, prior research about VC, mainly focusing on clean voices, pay rare attention to VC with background sound. The critical problem for preserving background sound in VC is inevitable speech distortion by the neural separation model and th… ▽ More

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  14. arXiv:2205.05675  [pdf, other

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, **gyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, **shan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang , et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e… ▽ More

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  15. arXiv:2003.10144  [pdf

    eess.IV cs.CV

    CF2-Net: Coarse-to-Fine Fusion Convolutional Network for Breast Ultrasound Image Segmentation

    Authors: Zhenyuan Ning, Ke Wang, Shengzhou Zhong, Qian** Feng, Yu Zhang

    Abstract: Breast ultrasound (BUS) image segmentation plays a crucial role in a computer-aided diagnosis system, which is regarded as a useful tool to help increase the accuracy of breast cancer diagnosis. Recently, many deep learning methods have been developed for segmentation of BUS image and show some advantages compared with conventional region-, model-, and traditional learning-based methods. However,… ▽ More

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: 8 pages, 6 figures

    MSC Class: 68T07(Primary) 68T45(Secondary) ACM Class: I.4.6