Showing 1–50 of 96 results for author: Yi, J

Searching in archive eess.
  1. arXiv:2406.16200 [pdf, other]

    cs.LG cs.CR cs.IT eess.SP

    Towards unlocking the mystery of adversarial fragility of neural networks

    Authors: Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Hui Xie, Weiyu Xu

    Abstract: In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that neural ne…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages

  2. arXiv:2406.09664 [pdf, other]

    cs.SD eess.AS

    Frequency-mix Knowledge Distillation for Fake Speech Detection

    Authors: Cunhang Fan, Shunbo Dong, Jun Xue, Yujie Chen, Jiangyan Yi, Zhao Lv

    Abstract: In telephony scenarios, the fake speech detection (FSD) task of combating speech spoofing attacks is challenging. Data augmentation (DA) methods are considered an effective means of addressing the FSD task in telephony scenarios and are typically divided into time-domain and frequency-domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.06086 [pdf, other]

    cs.SD eess.AS

    RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

    Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

    Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.04840 [pdf, other]

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2405.08596   

    cs.SD eess.AS

    EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark

    Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao

    Abstract: The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts…

    Submitted 15 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: This paper needs more modification

  6. arXiv:2404.16346 [pdf, other]

    eess.IV cs.AI cs.CV

    Light-weight Retinal Layer Segmentation with Global Reasoning

    Authors: Xiang He, Weiye Song, Yiming Wang, Fabio Poiesi, Ji Yi, Manishi Desai, Quanqing Xu, Kongzheng Yang, Yi Wan

    Abstract: Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood-flow noise present in the images. In addition, the algorithm should be light-weight enough to be deployed for practical clinical applications…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Instrumentation & Measurement

  7. arXiv:2403.06072   

    cs.IT eess.SP

    Channel Estimation Considerate Precoder Design for Multi-user Massive MIMO-OFDM Systems: The Concept and Fast Algorithms

    Authors: Liu Junkai, Jiang Yi

    Abstract: The sixth-generation (6G) communication networks target peak data rates exceeding 1 Tbps, requiring base stations (BS) to support up to 100 simultaneous data streams. However, sparse pilot allocation to accommodate such streams poses challenges for users' channel estimation. This paper presents Channel Estimation Considerate Precoding (CECP), where BS precoders prioritize facilitating channel e…

    Submitted 7 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: The work is supported by a cooperation with HUAWEI and is related to a current HUAWEI project. HUAWEI requires the paper to be withdrawn.

  8. Complete and Near-Optimal Robotic Crack Coverage and Filling in Civil Infrastructure

    Authors: Vishnu Veeraraghavan, Kyle Hunte, Jingang Yi, Kaiyan Yu

    Abstract: We present a simultaneous sensor-based inspection and footprint coverage (SIFC) planning and control design with applications to autonomous robotic crack mapping and filling. The main challenge of the SIFC problem lies in the coupling of complete sensing (for mapping) and robotic footprint (for filling) coverage tasks. Initially, we assume known target information (e.g., crack) and employ classic…

    Submitted 1 March, 2024; originally announced March 2024.

    Journal ref: in IEEE Transactions on Robotics, vol. 40, pp. 2850-2867, 2024

  9. arXiv:2402.10055 [pdf]

    eess.IV cs.AI cs.CV

    Robust semi-automatic vessel tracing in the human retinal image by an instance segmentation neural network

    Authors: Siyi Chen, Amir H. Kashani, Ji Yi

    Abstract: The morphology and hierarchy of the vascular systems are essential for perfusion in supporting metabolism. In the human retina, one of the most energy-demanding organs, retinal circulation nourishes the entire inner retina through an intricate vasculature emerging and remerging at the optic nerve head (ONH). Thus, tracing the vascular branching from ONH through the vascular tree can illustrate vascular hie…

    Submitted 15 February, 2024; originally announced February 2024.

  10. arXiv:2401.03650 [pdf, other]

    eess.AS cs.SD eess.SP

    DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper

    Authors: Jayeon Yi, Junghyun Koo, Kyogu Lee

    Abstract: Clipping is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range. This phenomenon undermines not only the perception of speech quality but also downstream processes utilizing the disrupted signal. Therefore, a real-time-capable, robust, and low-response-time method for speech declipping (SD) is desired. In this work, we introduce DDD…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: To appear, ICASSP 2024. Demo samples at https://stet-stet.github.io/DDD, repo at https://github.com/stet-stet/DDD

  11. arXiv:2401.03488 [pdf, other]

    cs.LG cs.CR eess.SP

    Data-Driven Subsampling in the Presence of an Adversarial Actor

    Authors: Abu Shafin Mohammad Mahdee Jameel, Ahmed P. Mohamed, Jinho Yi, Aly El Gamal, Akshay Malhotra

    Abstract: Deep learning-based automatic modulation classification (AMC) has received significant attention owing to its potential applications in both military and civilian use cases. Recently, data-driven subsampling techniques have been utilized to overcome the challenges associated with computational complexity and training time for AMC. Beyond these direct advantages of data-driven subsampling, these me…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted for publication at ICMLCN 2024

  12. arXiv:2312.10155 [pdf, ps, other]

    cs.RO eess.SY

    Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

    Authors: Feng Han, Jingang Yi

    Abstract: External and internal convertible (EIC) form-based motion control is an effective design for simultaneous trajectory tracking and balance of underactuated balance robots. Under certain conditions, however, the EIC-based control design leads to uncontrolled robot motion. We present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC mod…

    Submitted 15 December, 2023; originally announced December 2023.

  13. arXiv:2312.09651 [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

    Authors: Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, Jianhua Tao

    Abstract: The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms. Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types. To address this challenge, one of the…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to the main track of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  14. arXiv:2311.13687 [pdf, other]

    cs.LG cs.MM cs.SD eess.AS

    Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts

    Authors: Jayeon Yi, Sungho Lee, Kyogu Lee

    Abstract: At the heart of "rhythm games" (games in which players must perform actions in sync with a piece of music) are "charts", the directives given to players. We newly formulate chart generation as a sequence generation task and train a Transformer using a large dataset. We also introduce tempo-informed preprocessing and training procedures, some of which are suggested to be integral for a success…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: ISMIR 2023 LBD. Demo videos and code at stet-stet.github.io/goct

  15. arXiv:2311.07613 [pdf]

    eess.SY cs.LG math.DS

    A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements

    Authors: Mason Ma, Jiajie Wu, Chase Post, Tony Shi, Jingang Yi, Tony Schmitz, Hong Wang

    Abstract: This study presents a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements. Existing data-driven control methods that use machine learning for system identification cannot effectively cope with highly noisy measurements, resulting in unstable control performance. To address this challenge, the present study extends current physics-info…

    Submitted 11 November, 2023; originally announced November 2023.

  16. arXiv:2310.09999 [pdf, other]

    stat.ML cs.LG eess.SP

    Outlier Detection Using Generative Models with Theoretical Performance Guarantees

    Authors: Jirong Yi, Jingchao Gao, Tianming Wang, Xiaodong Wu, Weiyu Xu

    Abstract: This paper considers the problem of recovering signals modeled by generative models from linear measurements contaminated with sparse outliers. We propose an outlier detection approach for reconstructing the ground-truth signals modeled by generative models under sparse outliers. We establish theoretical recovery guarantees for reconstruction of signals using generative models in the presence of o…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:1810.11335

  17. Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection

    Authors: Cunhang Fan, Mingming Ding, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv

    Abstract: Most research in synthetic speech detection (SSD) focuses on improving performance on standard noise-free datasets. However, in actual situations, noise interference is usually present, causing significant performance degradation in SSD systems. To improve noise robustness, this paper proposes a dual-branch knowledge distillation synthetic speech detection (DKDSSD) method. Specifically, a parallel…

    Submitted 16 April, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  18. arXiv:2310.04010 [pdf, other]

    cs.CV cs.AI eess.IV

    Excision And Recovery: Visual Defect Obfuscation Based Self-Supervised Anomaly Detection Strategy

    Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Yeonho Lee, Hyeong Seok Kim, Juneho Yi

    Abstract: Due to the scarcity of anomalies in the early manufacturing stage, unsupervised anomaly detection (UAD), which uses only normal samples for training, is widely adopted. This approach assumes that the trained UAD model will accurately reconstruct normal patterns but struggle with unseen anomalous patterns. To enhance the UAD performance, reconstruction-by-inpainti…

    Submitted 9 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures, 5 tables

  19. arXiv:2310.00014 [pdf, other]

    cs.SD eess.AS

    Fewer-token Neural Speech Codec with Time-invariant Codes

    Authors: Yong Ren, Tao Wang, Jiangyan Yi, Le Xu, Jianhua Tao, Chuyuan Zhang, Junzuo Zhou

    Abstract: Language-model-based text-to-speech (TTS) models such as VALL-E have gained attention for their outstanding in-context learning capability in zero-shot scenarios. The neural speech codec, which converts speech into discrete token representations, is a critical component of these models. However, excessive token sequences from the codec may negatively affect prediction accuracy and restrict the progre…

    Submitted 10 March, 2024; v1 submitted 15 September, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024

  20. arXiv:2309.16720 [pdf, ps, other]

    cs.RO eess.SY

    Energy Efficient Foot-Shape Design for Bipedal Walkers on Granular Terrain

    Authors: Xunjie Chen, Jingang Yi, Hao Wang

    Abstract: It is important to understand how bipedal walkers balance and walk effectively on granular materials such as sand and loose dirt. This paper first presents a computational approach to analyzing the motion and energy of bipedal walkers on granular terrain and then discusses an optimization method for designing the robot foot-shape contour for energy-efficient walking. We first present t…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: The 3rd Modeling, Estimation and Control Conference (MECC 2023), Lake Tahoe, NV, Oct 2-5 2023

  21. arXiv:2309.15784 [pdf, other]

    cs.RO eess.SY

    Gaussian Process-Enhanced, External and Internal Convertible (EIC) Form-Based Control of Underactuated Balance Robots

    Authors: Feng Han, Jingang Yi

    Abstract: External and internal convertible (EIC) form-based motion control (i.e., EIC-based control) is an effective approach for underactuated balance robots. Through sequential controller design, trajectory tracking of the actuated subsystem and balance of the unactuated subsystem can be achieved simultaneously. However, under certain conditions, uncontrolled robot motion exists under the EIC…

    Submitted 27 September, 2023; originally announced September 2023.

  22. arXiv:2309.08166 [pdf, other]

    cs.SD eess.AS

    Controllable Residual Speaker Representation for Voice Conversion

    Authors: Le Xu, Jiangyan Yi, Jianhua Tao, Tao Wang, Yong Ren, Rongxiu Zhong

    Abstract: Recently, there have been significant advancements in voice conversion, resulting in high-quality performance. However, two critical challenges remain in this field. First, current voice conversion methods have limited robustness when encountering unseen speakers. Second, they have limited ability to control timbre representation. To address these challenges, this paper presents…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: submitted to ICASSP 2024

  23. arXiv:2309.07147 [pdf, other]

    eess.SP cs.HC cs.LG cs.MM cs.SD eess.AS

    DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

    Authors: Cunhang Fan, Hongyu Zhang, Wei Huang, Jun Xue, Jianhua Tao, Jiangyan Yi, Zhao Lv, Xiaopei Wu

    Abstract: Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean…

    Submitted 7 September, 2023; originally announced September 2023.

  24. arXiv:2309.06780 [pdf, other]

    cs.SD eess.AS

    Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

    Authors: Chu Yuan Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Xinrui Yan

    Abstract: Recent strides in neural speech synthesis technologies, while enjoying widespread applications, have nonetheless introduced a series of challenges, spurring interest in defence against misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area has certain limitations in scope. To add…

    Submitted 15 June, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by CCL 2024

  25. arXiv:2308.14970 [pdf, other]

    cs.SD eess.AS

    Audio Deepfake Detection: A Survey

    Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, Yan Zhao

    Abstract: Audio deepfake detection is an emerging and active topic. A growing body of literature has studied deepfake detection algorithms and achieved effective performance, yet the problem is far from solved. Although some reviews exist, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluati…

    Submitted 28 August, 2023; originally announced August 2023.

  26. arXiv:2308.14595 [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Neural Network Training Strategy to Enhance Anomaly Detection Performance: A Perspective on Reconstruction Loss Amplification

    Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeonho Jeong, Hyunkyu Park, Hyeong Seok Kim, Juneho Yi

    Abstract: Unsupervised anomaly detection (UAD) is a widely adopted approach in industry due to rare anomaly occurrences and data imbalance. A desirable characteristic of a UAD model is contained generalization ability, which excels at reconstructing seen normal patterns but struggles with unseen anomalies. Recent studies have sought to contain the generalization capability of their UAD models in rec…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 5 pages, 4 figures, 2 tables

  27. arXiv:2308.09944 [pdf, other]

    cs.SD eess.AS

    Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

    Authors: Cunhang Fan, Jun Xue, Jianhua Tao, Jiangyan Yi, Chenglong Wang, Chengshi Zheng, Zhao Lv

    Abstract: The rhythm of synthetic speech is usually too smooth, which causes the fundamental frequency (F0) of synthetic speech to differ significantly from that of real speech. The F0 feature is therefore expected to contain discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 subband so a…

    Submitted 19 August, 2023; originally announced August 2023.

  28. arXiv:2308.03300 [pdf, other]

    cs.SD cs.LG eess.AS

    Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

    Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Chuyuan Zhang

    Abstract: Current fake audio detection algorithms have achieved promising performance on most datasets. However, their performance may be significantly degraded when dealing with audio from a different dataset. The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets. To overcome this limitation, we propose a continual…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 40th International Conference on Machine Learning (ICML 2023)

  29. arXiv:2307.08323 [pdf, other]

    cs.SD eess.AS

    TST: Time-Sparse Transducer for Automatic Speech Recognition

    Authors: Xiaohui Zhang, Mangui Liang, Zhengkun Tian, Jiangyan Yi, Jianhua Tao

    Abstract: End-to-end models, especially the Recurrent Neural Network Transducer (RNN-T), have achieved great success in speech recognition. However, transducers require a large memory footprint and long computing time when processing a long decoding sequence. To solve this problem, we propose a model named time-sparse transducer, which introduces a time-sparse mechanism into the transducer. In this mechanism, we obtain th…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 10 pages

    Journal ref: International Conference on Artificial Intelligence (CICAI 2023)

  30. arXiv:2306.05617 [pdf, other]

    cs.SD cs.CL eess.AS

    Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

    Authors: Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, Jianhua Tao, Le Xu, Ruibo Fu

    Abstract: Self-supervised speech models are a rapidly developing research topic in fake audio detection. Many pre-trained models can serve as feature extractors, learning richer and higher-level speech features. However, when fine-tuning pre-trained models, training times are often excessively long and memory consumption high, and complete fine-tuning is also very expensive. To alleviate…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 6 pages

    Journal ref: IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis

  31. arXiv:2306.04956 [pdf, other]

    cs.SD cs.LG eess.AS

    Adaptive Fake Audio Detection with Low-Rank Model Squeezing

    Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenlong Wang, Le Xu, Ruibo Fu

    Abstract: The rapid advancement of spoofing algorithms necessitates the development of robust detection methods capable of accurately identifying emerging fake audio. Traditional approaches, such as fine-tuning on new datasets containing these novel spoofing algorithms, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types. To address these challenges, th…

    Submitted 8 June, 2023; originally announced June 2023.

    Journal ref: DADA workshop at IJCAI 2023

  32. arXiv:2305.13774 [pdf, other]

    cs.SD eess.AS

    ADD 2023: the Second Audio Deepfake Detection Challenge

    Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

    Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on s…

    Submitted 23 May, 2023; originally announced May 2023.

  33. arXiv:2305.13701 [pdf, other]

    cs.SD eess.AS

    TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection

    Authors: Chenglong Wang, Jiangyan Yi, Jianhua Tao, Chuyuan Zhang, Shuai Zhang, Ruibo Fu, Xun Chen

    Abstract: Current fake audio detection relies on hand-crafted features, which lose information during extraction. To overcome this, recent studies use direct feature extraction from raw audio signals. For example, RawNet is one of the representative works in end-to-end fake audio detection. However, existing work on RawNet does not optimize the parameters of the Sinc-conv during training, which limited its…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  34. arXiv:2305.13700 [pdf, other]

    cs.SD eess.AS

    Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features

    Authors: Chenglong Wang, Jiangyan Yi, Jianhua Tao, Chuyuan Zhang, Shuai Zhang, Xun Chen

    Abstract: Existing fake audio detection systems perform well in in-domain testing, but still face many challenges in out-of-domain testing. This is due to the mismatch between the training and test data, as well as the poor generalizability of features extracted from limited views. To address this, we propose multi-view features for fake audio detection, which aim to capture more generalized features from p…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  35. arXiv:2303.01211 [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

    Authors: Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv

    Abstract: In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve the performance of FSD without increasing the model complexity. For FSD, some fine-grained information is very important, such as spectrogram defects, mute segments, and so on, which are often perceived by shallow networks. However, shallow networks have much noise, which can…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  36. arXiv:2301.03801 [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

    Authors: Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Jianhua Tao

    Abstract: Text-to-speech (TTS) and voice conversion (VC) are two different tasks, both aiming to generate high-quality speech from different input modalities. Due to their similarity, this paper proposes UnifySpeech, which brings TTS and VC into a unified framework for the first time. The model is based on the assumption that speech can be decoupled into three independent components: content…

    Submitted 10 January, 2023; originally announced January 2023.

  37. arXiv:2212.10191 [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Emotion Selectable End-to-End Text-based Speech Editing

    Authors: Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, Chu Yuan Zhang

    Abstract: Text-based speech editing allows users to edit speech by intuitively cutting, copying, and pasting text to speed up the process of editing speech. In previous work, CampNet (context-aware mask prediction network) was proposed to realize text-based speech editing, significantly improving the quality of edited speech. This paper aims at a new task: adding emotional effect to the editing speech du…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Under review, 12 pages, 11 figures, demo page is available at https://hairuo55.github.io/Emo-CampNet/

  38. arXiv:2211.06073 [pdf, other]

    cs.SD cs.CL eess.AS

    SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

    Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

    Abstract: Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio. These datasets leave out a scenario, in which the acoustic scene of an original audio is manipulated with a forged one. It will pose a major threat to our society i…

    Submitted 4 April, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted by Pattern Recognition, 1 April 2024

  39. arXiv:2211.05363 [pdf, other]

    cs.SD eess.AS

    EmoFake: An Initial Dataset for Emotion Fake Audio Detection

    Authors: Yan Zhao, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Xiaohui Zhang, Yongfeng Dong

    Abstract: Many datasets have been designed to further the development of fake audio detection, such as datasets of the ASVspoof and ADD challenges. However, these datasets do not consider a situation in which the emotion of the audio has been changed from one emotion to another, while other information (e.g. speaker identity and content) remains the same. Changing the emotion of audio can lead to semantic changes. S…

    Submitted 14 September, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

  40. arXiv:2211.05295 [pdf, other]

    cs.CV cs.LG eess.IV

    Harmonizing output imbalance for defect segmentation on extremely-imbalanced photovoltaic module cells images

    Authors: Jianye Yi, Xiaopin Zhong, Weixiang Liu, Zongze Wu, Yuanlong Deng, Zhengguang Wu

    Abstract: The continuous development of the photovoltaic (PV) industry has placed high demands on the quality of monocrystalline PV module cells. When learning to segment defect regions in PV module cell images, Tiny Hidden Cracks (THC) lead to extremely-imbalanced samples. The ratio of defect pixels to normal pixels can be as low as 1:2000. This extreme imbalance makes it difficult to segment the…

    Submitted 24 October, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 19 pages, 16 figures, 3 appendices

  41. arXiv:2210.11429 [pdf]

    cs.SD eess.AS

    Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

    Authors: Chunyu Qiang, Jianhua Tao, Ruibo Fu, Zhengqi Wen, Jiangyan Yi, Tao Wang, Shiming Wang

    Abstract: Current end-to-end code-switching Text-to-Speech (TTS) systems can already generate high-quality speech in two languages within the same utterance when trained on single-speaker bilingual corpora. When the speakers of the bilingual corpora differ, the naturalness and consistency of code-switching TTS are poor. The cross-lingual embedding layers structure we proposed makes similar syllables in different langu…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted to ISCSLP 2021

  42. arXiv:2208.10489 [pdf, other]

    cs.SD cs.AI eess.AS

    System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

    Authors: Xinrui Yan, Jiangyan Yi, Chenglong Wang, Jianhua Tao, Junzuo Zhou, Hao Gu, Ruibo Fu

    Abstract: The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation. Therefore, many studies have emerged to detect the so-called deepfake audio. However, existing works focus on the binary detection of real audio and fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, it is needed to…

    Submitted 15 September, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

    Comments: 13 pages, 4 figures. Submitted to IEEE Transactions on Audio, Speech and Language Processing (TASLP). arXiv admin note: text overlap with arXiv:2208.09646

  43. arXiv:2208.09646  [pdf, other]

    cs.SD cs.AI eess.AS

    An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

    Authors: Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

    Abstract: Many effective attempts have been made for fake audio detection. However, they can only provide detection results, not countermeasures to curb this harm. For many related practical applications, knowing which model or algorithm generated the fake audio is also needed. Therefore, we propose a new problem for detecting vocoder fingerprints of fake audio. Experiments are conducted on the datasets synthesize…

    Submitted 20 August, 2022; originally announced August 2022.

    Comments: Accepted by ACM Multimedia 2022 Workshop: First International Workshop on Deepfake Detection for Audio Multimedia

  44. arXiv:2208.09618  [pdf, other]

    cs.SD cs.AI eess.AS

    Fully Automated End-to-End Fake Audio Detection

    Authors: Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

    Abstract: Existing fake audio detection systems often rely on expert experience to design the acoustic features or to manually set the hyperparameters of the network structure. However, manual adjustment of the parameters can have a noticeable influence on the results, and it is almost impossible to manually find the best set of parameters. Therefore, this paper proposes a fully automated end-toen…

    Submitted 20 August, 2022; originally announced August 2022.

  45. arXiv:2208.03633  [pdf, other]

    cs.MM cs.SD eess.AS

    Debiased Cross-modal Matching for Content-based Micro-video Background Music Recommendation

    Authors: **ng Yi, Zhenzhong Chen

    Abstract: Micro-video background music recommendation is a complicated task where the matching degree between videos and uploader-selected background music is a major issue. However, the selection of the user-generated content (UGC) is biased by each uploader's knowledge limitations and historical music preferences. In this paper, we propose a Debiased Cross-Modal (DebCM) matching model to all…

    Submitted 7 August, 2022; originally announced August 2022.

  46. arXiv:2208.01214  [pdf, other]

    cs.SD cs.LG eess.AS

    Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

    Authors: Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

    Abstract: Recently, pioneering research works have proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance and showing that different subbands contribute differently to audio deepfake detection. However, these works lack an explanation of the specific informatio…

    Submitted 1 August, 2022; originally announced August 2022.

  47. arXiv:2207.12308  [pdf, other]

    cs.SD eess.AS

    CFAD: A Chinese Dataset for Fake Audio Detection

    Authors: Haoxin Ma, Jiangyan Yi, Chenglong Wang, Xinrui Yan, Jianhua Tao, Tao Wang, Shiming Wang, Ruibo Fu

    Abstract: Fake audio detection is a growing concern, and some relevant datasets have been designed for research. However, there is no standard public Chinese dataset under complex conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (CFAD) for studying more generalized detection methods. Twelve mainstream speech-generation techniques are used to generate fake…

    Submitted 18 July, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: FAD renamed as CFAD

  48. arXiv:2204.01101  [pdf, ps, other]

    eess.SY

    Learning-Based Safe Motion Control of Vehicle Ski-Stunt Maneuvers

    Authors: Feng Han, Jingang Yi

    Abstract: This paper presents a safety-guaranteed control method for an autonomous vehicle ski-stunt maneuver, that is, a vehicle moving on the two wheels of one side. To capture the vehicle dynamics precisely, a Gaussian process model is used as an additional correction to the nominal model obtained from physical principles. We construct a probabilistic control barrier function (CBF) to guarantee the plana…

    Submitted 3 April, 2022; originally announced April 2022.

  49. arXiv:2203.11777  [pdf, other]

    cs.RO eess.SY

    Autonomous Bikebot Control for Crossing Obstacles with Assistive Leg Impulsive Actuation

    Authors: Feng Han, Xinyan Huang, Zenghao Wang, Jingang Yi, Tao Liu

    Abstract: As a single-track mobile platform, the bikebot (i.e., bicycle-based robot) has an attractive navigation capability to pass through narrow, off-road terrain with high speed and high energy efficiency. However, crossing step-like obstacles creates challenges for intrinsically unstable, underactuated bikebots. This paper presents a novel autonomous bikebot control with assistive leg actuation to nav…

    Submitted 22 March, 2022; originally announced March 2022.

  50. arXiv:2203.10210  [pdf, ps, other]

    cs.RO eess.SY

    Coordinated Pose Control of Mobile Manipulation with an Unstable Bikebot Platform

    Authors: Feng Han, Alborz Jelvani, Jingang Yi, Tao Liu

    Abstract: Bikebot manipulation combines the mobility of a single-track robot with the dexterity of a manipulator. We present a coordinated pose control of mobile manipulation with the stationary bikebot. The challenges of bikebot manipulation include the limited steering balance capability of the unstable bikebot and the kinematic redundancy of the manipulator. We first present the steering balance model to anal…

    Submitted 18 March, 2022; originally announced March 2022.