Skip to main content

Showing 1–10 of 10 results for author: Zang, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2407.01530  [pdf, other

    eess.IV cs.CV

    xLSTM-UNet can be an Effective 2D \& 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart

    Authors: Tianrun Chen, Chaotao Ding, Lanyun Zhu, Tao Xu, Deyi Ji, Ying Zang, Zejian Li

    Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet structured deep learning neural network that leverages Vision-LSTM (… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.02438  [pdf, other

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, **g Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2405.13428  [pdf, other

    cs.SD eess.AS

    Ambisonizer: Neural Upmixing as Spherical Harmonics Generation

    Authors: Yongyi Zang, Yifan Wang, Minglun Lee

    Abstract: Neural upmixing, the task of generating immersive music with an increased number of channels from fewer input channels, has been an active research area, with mono-to-stereo and stereo-to-surround upmixing treated as separate problems. In this paper, we propose a unified approach to neural upmixing by formulating it as spherical harmonics - more specifically, Ambisonic generation. We explicitly fo… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Under review

  4. arXiv:2405.05244  [pdf, other

    eess.AS cs.AI cs.MM cs.SD

    SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

    Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan

    Abstract: The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specializ… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Evaluation plan of the SVDD Challenge @ SLT 2024

  5. arXiv:2309.09085  [pdf, other

    cs.SD cs.IR cs.MM eess.AS eess.SP

    SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

    Authors: Yongyi Zang, Yi Zhong, Frank Cwitkowitz, Zhiyao Duan

    Abstract: Guitar tablature is a form of music notation widely used among guitarists. It captures not only the musical content of a piece, but also its implementation and ornamentation on the instrument. Guitar Tablature Transcription (GTT) is an important task with broad applications in music education, composition, and entertainment. Existing GTT datasets are quite limited in size and scope, rendering mode… ▽ More

    Submitted 24 January, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  6. arXiv:2309.07525  [pdf, other

    cs.SD cs.AI eess.AS

    SingFake: Singing Voice Deepfake Detection

    Authors: Yongyi Zang, You Zhang, Mojtaba Heydari, Zhiyao Duan

    Abstract: The rise of singing voice synthesis presents critical challenges to artists and industry stakeholders over unauthorized voice usage. Unlike synthesized speech, synthesized singing voices are typically released in songs containing strong background music that may hide synthesis artifacts. Additionally, singing voices present different acoustic and linguistic characteristics from speech utterances.… ▽ More

    Submitted 21 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  7. Phase perturbation improves channel robustness for speech spoofing countermeasures

    Authors: Yongyi Zang, You Zhang, Zhiyao Duan

    Abstract: In this paper, we aim to address the problem of channel robustness in speech countermeasure (CM) systems, which are used to distinguish synthetic speech from human natural speech. On the basis of two hypotheses, we suggest an approach for perturbing phase information during the training of time-domain CM systems. Communication networks often employ lossy compression codec that encodes only magnitu… ▽ More

    Submitted 6 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 5 pages; Proceedings of Interspeech 2023

  8. Principle-driven Fiber Transmission Model based on PINN Neural Network

    Authors: Yubin Zang, Zhenming Yu, Kun Xu, Xingzeng Lan, Minghua Chen, Sigang Yang, Hongwei Chen

    Abstract: In this paper, a novel principle-driven fiber transmission model based on physical induced neural network (PINN) is proposed. Unlike data-driven models which regard fiber transmission problem as data regression tasks, this model views it as an equation solving problem. Instead of adopting input signals and output signals which are calculated by SSFM algorithm in advance before training, this princ… ▽ More

    Submitted 24 August, 2021; originally announced August 2021.

  9. arXiv:2107.07407  [pdf

    eess.SP math.AG

    A Method for Detecting Abnormal Data of Network Nodes Based on Convolutional Neural Network

    Authors: Yihao Zang, Xianhao Shen, Shaohua Niu

    Abstract: Abnormal data detection is an important step to ensure the accuracy and reliability of node data in wireless sensor networks. In this paper, a data classification method based on convolutional neural network is proposed to solve the problem of data anomaly detection in wireless sensor networks. First, Normal data and abnormal data generated after injection fault are normalized and mapped to gray i… ▽ More

    Submitted 28 June, 2021; originally announced July 2021.

  10. arXiv:1909.07788  [pdf

    eess.SP cs.AI physics.optics

    Electro-optical Neural Networks based on Time-stretch Method

    Authors: Yubin Zang, Minghua Chen, Sigang Yang, Hongwei Chen

    Abstract: In this paper, a novel architecture of electro-optical neural networks based on the time-stretch method is proposed and numerically simulated. By stretching time-domain ultrashort pulses, multiplications of large scale weight matrices and vectors can be implemented on light and multiple-layer of feedforward neural network operations can be easily implemented with fiber loops. Via simulation, the p… ▽ More

    Submitted 12 September, 2019; originally announced September 2019.