Showing 1–50 of 106 results for author: Zheng, S

Searching in archive eess.
  1. arXiv:2406.14485   

    cs.AI cs.HC cs.MM cs.SD eess.AS

    Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zi** Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

    Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.

    Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an…

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  3. arXiv:2406.10724  [pdf, other]

    eess.IV cs.CV cs.LG

    Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

    Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman, et al. (4 additional authors not shown)

    Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in 38th Annual Small Satellite Conference

  4. arXiv:2406.05647  [pdf, other]

    eess.SP cs.ET

    Sustainable Wireless Networks via Reconfigurable Intelligent Surfaces (RISs): Overview of the ETSI ISG RIS

    Authors: Ruiqi Liu, Shuang Zheng, Qingqing Wu, Yifan Jiang, Nan Zhang, Yuanwei Liu, Marco Di Renzo, and George C. Alexandropoulos

    Abstract: Reconfigurable Intelligent Surfaces (RISs) are a novel form of ultra-low power devices that are capable to increase the communication data rates as well as the cell coverage in a cost- and energy-efficient way. This is attributed to their programmable operation that enables them to dynamically manipulate the wireless propagation environment, a feature that has lately inspired numerous research inv…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, submitted to an IEEE Magazine

  5. arXiv:2406.02167  [pdf, other]

    eess.AS eess.SP

    ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li

    Abstract: Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteristics from short utterances. Constrained by the model's size, a robust backbone Enhanced Res2Net (ERes2Net) combining global and local feature fusion…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2406.01205  [pdf, other]

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and…

    Submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2406.00356  [pdf, other]

    eess.AS cs.SD

    AudioLCM: Text-to-Audio Generation with Latent Consistency Models

    Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

    Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficie…

    Submitted 1 June, 2024; originally announced June 2024.

  8. arXiv:2403.19971  [pdf, other]

    eess.AS eess.SP

    3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

    Abstract: This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization. It is designed for the needs of academic researchers and industrial practitioners. The 3D-Speaker-Toolkit adeptly leverages the combined strengths of acoustic, semantic, and visual data, seamlessly fusing these modalities to offer robust speaker recognition capabilities. The acous…

    Submitted 29 March, 2024; originally announced March 2024.

  9. Human Activity Recognition with Low-Resolution Infrared Array Sensor Using Semi-supervised Cross-domain Neural Networks for Indoor Environment

    Authors: Cunyi Yin, Xiren Miao, **g Chen, Hao Jiang, Deying Chen, Yixuan Tong, Shaocong Zheng

    Abstract: Low-resolution infrared-based human activity recognition (HAR) attracted enormous interests due to its low-cost and private. In this paper, a novel semi-supervised crossdomain neural network (SCDNN) based on 8 $\times$ 8 low-resolution infrared sensor is proposed for accurately identifying human activity despite changes in the environment at a low-cost. The SCDNN consists of feature extractor, dom…

    Submitted 4 March, 2024; originally announced March 2024.

  10. arXiv:2402.12208  [pdf, other]

    eess.AS cs.SD

    Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

    Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

    Abstract: In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serves as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs a…

    Submitted 27 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: We release a more powerful checkpoint in Language-Codec v3

  11. arXiv:2402.08846  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

    Authors: Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

    Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning f…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Working in progress and will open-source soon

  12. arXiv:2401.01553  [pdf, other]

    eess.IV cs.CV

    Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis

    Authors: Shichuan Zhang, Sunyi Zheng, Zhongyi Shui, Honglin Li, Lin Yang

    Abstract: Multi-modal Learning has attracted widespread attention in medical image analysis. Using multi-modal data, whole slide images (WSIs) and clinical information, can improve the performance of deep learning models in the diagnosis of axillary lymph node metastasis. However, clinical information is not easy to collect in clinical practice due to privacy concerns, limited resources, lack of interoperab…

    Submitted 3 January, 2024; originally announced January 2024.

  13. arXiv:2312.15993  [pdf]

    cs.AI cs.RO eess.SY

    Adaptive Kalman-based hybrid car following strategy using TD3 and CACC

    Authors: Yuqi Zheng, Ruidong Yan, Bin Jia, Rui Jiang, Adriana TAPUS, Xiao**g Chen, Shiteng Zheng, Ying Shang

    Abstract: In autonomous driving, the hybrid strategy of deep reinforcement learning and cooperative adaptive cruise control (CACC) can fully utilize the advantages of the two algorithms and significantly improve the performance of car following. However, it is challenging for the traditional hybrid strategy based on fixed coefficients to adapt to mixed traffic flow scenarios, which may decrease the performa…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 32pages,13figures

  14. arXiv:2312.01292  [pdf, ps, other]

    cs.NI eess.SP

    Joint Beam Scheduling and Power Optimization for Beam Hopping LEO Satellite Systems

    Authors: Shuang Zheng, Xing Zhang, Peng Wang, Wenbo Wang

    Abstract: Low earth orbit (LEO) satellite communications can provide ubiquitous and reliable services, making it an essential part of the Internet of Everything network. Beam hopping (BH) is an emerging technology for effectively addressing the issue of low resource utilization caused by the non-uniform spatio-temporal distribution of traffic demands. However, how to allocate multi-dimensional resources in…

    Submitted 3 December, 2023; originally announced December 2023.

  15. arXiv:2311.16155  [pdf, other]

    eess.SP cs.LG

    Deep Learning-Based Frequency Offset Estimation

    Authors: Tao Chen, Shilian Zheng, Jiawei Zhu, Qi Xuan, Xiaoniu Yang

    Abstract: In wireless communication systems, the asynchronization of the oscillators in the transmitter and the receiver along with the Doppler shift due to relative movement may lead to the presence of carrier frequency offset (CFO) in the received signals. Estimation of CFO is crucial for subsequent processing such as coherent demodulation. In this brief, we demonstrate the utilization of deep learning fo…

    Submitted 8 November, 2023; originally announced November 2023.

  16. arXiv:2311.10463  [pdf, other]

    eess.IV cs.CV

    Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

    Authors: Xiatian Zhang, Sisi Zheng, Hubert P. H. Shum, Haozheng Zhang, Nan Song, Mingkang Song, Hongxiao Jia

    Abstract: Resting-state fMRI (rs-fMRI) functional connectivity (FC) analysis provides valuable insights into the relationships between different brain regions and their potential implications for neurological or psychiatric disorders. However, specific design efforts to predict treatment response from rs-fMRI remain limited due to difficulties in understanding the current brain state and the underlying mech…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Proceedings of the 2023 International Conference on Neural Information Processing (ICONIP)

  17. arXiv:2311.04534  [pdf, other]

    cs.CL cs.SD eess.AS

    Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

    Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

    Abstract: Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Mask…

    Submitted 4 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 5 pages, accepted by ICASSP 2024

  18. arXiv:2311.03761  [pdf, other]

    cs.LG cs.AI eess.SP

    Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition

    Authors: Tao Chen, Shilian Zheng, Kunfeng Qiu, Luxin Zhang, Qi Xuan, Xiaoniu Yang

    Abstract: The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the d…

    Submitted 7 November, 2023; originally announced November 2023.

  19. arXiv:2310.17471  [pdf, other]

    cs.IT cs.DC cs.LG cs.NI eess.SP

    Foundation Model Based Native AI Framework in 6G with Cloud-Edge-End Collaboration

    Authors: Xiang Chen, Zhiheng Guo, Xijun Wang, Howard H. Yang, Chenyuan Feng, Junshen Su, Sihui Zheng, Tony Q. S. Quek

    Abstract: Future wireless communication networks are in a position to move beyond data-centric, device-oriented connectivity and offer intelligent, immersive experiences based on task-oriented connections, especially in the context of the thriving development of pre-trained foundation models (PFM) and the evolving vision of 6G native artificial intelligence (AI). Therefore, redefining modes of collaboration…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 8 pages, 4 figures, 1 table

  20. arXiv:2310.05052  [pdf, other]

    eess.SP cs.AI cs.LG

    Accurate battery lifetime prediction across diverse aging conditions with deep learning

    Authors: Han Zhang, Yuqi Li, Shun Zheng, Ziheng Lu, Xiaofan Gui, Wei Xu, Jiang Bian

    Abstract: Accurately predicting the lifetime of battery cells in early cycles holds tremendous value for battery research and development as well as numerous downstream applications. This task is rather challenging because diverse conditions, such as electrode materials, operating conditions, and working environments, collectively determine complex capacity-degradation behaviors. However, current prediction…

    Submitted 24 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  21. arXiv:2310.04673  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

    Authors: Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, ** Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

    Abstract: Generative Pre-trained Transformer (GPT) models have achieved remarkable performance on various natural language processing tasks. However, there has been limited research on applying similar frameworks to audio tasks. Previously proposed large language models for audio tasks either lack sufficient quantitative evaluations, or are limited to tasks for recognizing and understanding audio content, o…

    Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, under review

  22. arXiv:2309.15367  [pdf]

    cs.RO eess.SY

    Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range

    Authors: Xinran Li, Shuaikang Zheng, Pengcheng Zheng, Haifeng Zhang, Zhitian Li, Xudong Zou

    Abstract: Relative pose estimation is the foundational requirement for multi-robot system, while it is a challenging research topic in infrastructure-free scenes. In this study, we analyze the relative 6-DOF pose estimation error of multi-robot system in GNSS-denied and anchor-free environment. An analytical lower bound of position and orientation estimation error is given under the assumption that distance…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 7 pages, 9 figures

  23. arXiv:2309.10456  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

    Authors: Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

    Abstract: Speaker diarization has gained considerable attention within speech processing research community. Mainstream speaker diarization rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Considering the fact that speech signals can efficiently convey the content of a speech, it is of our interest to fully exploit th…

    Submitted 4 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  24. arXiv:2309.07405  [pdf, other]

    cs.SD cs.AI eess.AS

    FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

    Authors: Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

    Abstract: This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an extension of the open-source speech processing toolkit FunASR. FunCodec provides reproducible training recipes and inference scripts for the latest neural speech codec models, such as SoundStream and Encodec. Thanks to the unified design with FunASR, FunCodec can be easily integrated into downstream tasks, such as…

    Submitted 6 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures, submitted to ICASSP 2024

  25. arXiv:2309.04842  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    Leveraging Large Language Models for Exploiting ASR Uncertainty

    Authors: Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

    Abstract: While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality. This work focuses on the former scenario, where LLM's accuracy on SLU tasks is constrained by the…

    Submitted 12 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: Added references

  26. arXiv:2308.02774  [pdf, other]

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an…

    Submitted 26 June, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2211.04168 I submitted the updated paper for arXiv:2308.02774 with the revised version. As for arXiv:2406.11169, I mistakenly submitted this last time, so I withdrew arXiv:2406.11169 and merged the latest content into arXiv: 2308.02774

  27. arXiv:2308.02498  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning to Segment from Noisy Annotations: A Spatial Correction Approach

    Authors: Jiachen Yao, Yikai Zhang, Songzhu Zheng, Mayank Goswami, Prateek Prasanna, Chao Chen

    Abstract: Noisy labels can significantly affect the performance of deep neural networks (DNNs). In medical image segmentation tasks, annotations are error-prone due to the high demand in annotation time and in the annotators' expertise. Existing methods mostly assume noisy labels in different pixels are \textit{i.i.d}. However, segmentation label noise usually has strong spatial correlation and has prominen…

    Submitted 20 July, 2023; originally announced August 2023.

  28. arXiv:2306.15354  [pdf, other]

    cs.CL cs.SD eess.AS

    3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

    Authors: Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen

    Abstract: Disentangling uncorrelated information in speech utterances is a crucial research topic within speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the affects of other uncorrelated information. We present a large-scale speech corpus to facilitate the research of speech representation disentanglement. 3D-Speaker contains over 10,000…

    Submitted 24 September, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  29. arXiv:2306.14527  [pdf, other]

    eess.SY

    Computationally Enhanced Approach for Chance-Constrained OPF Considering Voltage Stability

    Authors: Yuanxi Wu, Zhi Wu, Yijun Xu, Huan Long, Wei Gu, Shu Zheng, **gtao Zhao

    Abstract: The effective management of stochastic characteristics of renewable power generations is vital for ensuring the stable and secure operation of power systems. This paper addresses the task of optimizing the chance-constrained voltage-stability-constrained optimal power flow (CC-VSC-OPF) problem, which is hindered by the implicit voltage stability index and intractable chance constraints. Leveraging…

    Submitted 3 January, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

  30. arXiv:2305.19005  [pdf, other]

    cs.IT eess.SP

    Hybrid Driven Learning for Channel Estimation in Intelligent Reflecting Surface Aided Millimeter Wave Communications

    Authors: Shuntian Zheng, Sheng Wu, Chunxiao Jiang, Wei Zhang, Xiaojun **g

    Abstract: Intelligent reflecting surfaces (IRS) have been proposed in millimeter wave (mmWave) and terahertz (THz) systems to achieve both coverage and capacity enhancement, where the design of hybrid precoders, combiners, and the IRS typically relies on channel state information. In this paper, we address the problem of uplink wideband channel estimation for IRS aided multiuser multiple-input single-output…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 30 pages, 8 figures, submitted to IEEE transactions on wireless communications on December 13, 2022

  31. arXiv:2305.12927  [pdf, other]

    cs.CL cs.SD eess.AS

    Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

    Authors: Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

    Abstract: Speaker diarization(SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which result in performance degradation when encountering adverse acoustic conditions. In this paper, we propose methods to extract speaker-related information from semantic c…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  32. arXiv:2305.12838  [pdf, other]

    eess.AS cs.SD

    An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Jiajun Qi

    Abstract: Effective fusion of multi-scale features is crucial for improving speaker verification performance. While most existing methods aggregate multi-scale features in a layer-wise manner via simple operations, such as summation or concatenation. This paper proposes a novel architecture called Enhanced Res2Net (ERes2Net), which incorporates both local and global feature fusion techniques to improve the…

    Submitted 3 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  33. arXiv:2304.04154  [pdf, other]

    astro-ph.IM eess.SY

    Review of X-ray pulsar spacecraft autonomous navigation

    Authors: Yidi Wang, Wei Zheng, Shuangnan Zhang, Minyu Ge, Liansheng Li, Kun Jiang, Xiaoqian Chen, Xiang Zhang, Shijie Zheng, Fangjun Lu

    Abstract: This article provides a review on X-ray pulsar-based navigation (XNAV). The review starts with the basic concept of XNAV, and briefly introduces the past, present and future projects concerning XNAV. This paper focuses on the advances of the key techniques supporting XNAV, including the navigation pulsar database, the X-ray detection system, and the pulse time of arrival estimation. Moreover, the…

    Submitted 9 April, 2023; originally announced April 2023.

    Comments: has been accepted by Chinese Journal of Aeronautics

    Journal ref: Chinese Journal of Aeronautics, 2023

  34. arXiv:2303.13336  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

    Authors: Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon

    Abstract: Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text to speech and speech enhancement. This work conducts a survey on audio diffusion model, which is complementary to existing surveys that either lack the r…

    Submitted 2 April, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: 18 pages

  35. arXiv:2303.06641  [pdf, other]

    cs.CV eess.IV

    Adaptive Local Adversarial Attacks on 3D Point Clouds for Augmented Reality

    Authors: Weiquan Liu, Shijun Zheng, Cheng Wang

    Abstract: As the key technology of augmented reality (AR), 3D recognition and tracking are always vulnerable to adversarial examples, which will cause serious security risks to AR systems. Adversarial examples are beneficial to improve the robustness of the 3D neural network model and enhance the stability of the AR system. At present, most 3D adversarial attack methods perturb the entire point cloud to gen…

    Submitted 12 March, 2023; originally announced March 2023.

  36. arXiv:2303.00332  [pdf, other]

    cs.SD eess.AS

    CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

    Authors: Hui Wang, Siqi Zheng, Yafeng Chen, Luyao Cheng, Qian Chen

    Abstract: Time delay neural network (TDNN) has been proven to be efficient for speaker verification. One of its successful variants, ECAPA-TDNN, achieved state-of-the-art performance at the cost of much higher computational complexity and slower inference speed. This makes it inadequate for scenarios with demanding inference rate and limited computational resources. We are thus interested in finding an arch…

    Submitted 16 June, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  37. Efficient Constrained Codes That Enable Page Separation in Modern Flash Memories

    Authors: Ahmed Hareedy, Simeng Zheng, Paul Siegel, Robert Calderbank

    Abstract: The pivotal storage density win achieved by solid-state devices over magnetic devices recently is a result of multiple innovations in physics, architecture, and signal processing. Constrained coding is used in Flash devices to increase reliability via mitigating inter-cell interference. Recently, capacity-achieving constrained codes were introduced to serve that purpose. While these codes result i…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: 30 pages (single column), 5 figures, submitted to the IEEE Transactions on Communications (TCOM). arXiv admin note: substantial text overlap with arXiv:2111.07415

  38. arXiv:2301.12129  [pdf, other]

    eess.SY

    Decentralized Energy Market Integrating Carbon Allowance Trade and Uncertainty Balance in Energy Communities

    Authors: Yuanxi Wu, Zhi Wu, Wei Gu, Zheng Xu, Shu Zheng, Qirun Sun

    Abstract: With the sustained attention on carbon neutrality, the personal carbon trading (PCT) scheme has been embraced as an auspicious paradigm for scaling down carbon emissions. To facilitate the simultaneous clearance of energy and carbon allowance inside the energy community while hedging against uncertainty, a joint trading framework is proposed in this article. The energy trading is implemented in a…

    Submitted 28 January, 2023; originally announced January 2023.

  39. arXiv:2212.07000  [pdf, other]

    eess.AS

    DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect

    Authors: **glin Liu, Zhenhui Ye, Qian Chen, Siqi Zheng, Wen Wang, Qinglin Zhang, Zhou Zhao

    Abstract: Recently, binaural audio synthesis (BAS) has emerged as a promising research field for its applications in augmented and virtual realities. Binaural audio helps users orient themselves and establish immersion by providing the brain with interaural time differences reflecting spatial information. However, existing BAS methods are limited in terms of phase estimation, which is crucial for spatial he…

    Submitted 1 June, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023 short paper; key words: binaural audio, stereophonic sound

  40. arXiv:2211.14548  [pdf, other]

    eess.AS cs.CL cs.LG cs.MM

    Contextual Expressive Text-to-Speech

    Authors: Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

    Abstract: The goal of expressive Text-to-speech (TTS) is to synthesize natural speech with desired content, prosody, emotion, or timbre, in high expressiveness. Most of previous studies attempt to generate speech from given labels of styles and emotions, which over-simplifies the problem by classifying styles and emotions into a fixed number of pre-defined categories. In this paper, we introduce a new task…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  41. arXiv:2211.10243  [pdf, other]

    cs.SD cs.MM eess.AS

    Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

    Authors: Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

    Abstract: Recently, hybrid systems of clustering and neural diarization models have been successfully applied in multi-party meeting analysis. However, current models always treat overlapped speaker diarization as a multi-label classification problem, where speaker dependency and overlaps are not well considered. To overcome the disadvantages, we reformulate overlapped speaker diarization task as a single-l…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted by EMNLP 2022

  42. arXiv:2211.04168  [pdf, other]

    eess.AS cs.SD

    Pushing the limits of self-supervised speaker verification using regularized distillation framework

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen

    Abstract: Training robust speaker verification systems without speaker labels has long been a challenging task. Previous studies observed a large performance gap between self-supervised and fully supervised methods. In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to embeddings in DINO. One reg…

    Submitted 2 August, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

  43. arXiv:2210.09531   

    cs.RO cs.HC eess.SY

    The Brain-Inspired Cooperative Shared Control for Brain-Machine Interface

    Authors: Shengjie Zheng, Ling Liu, Junjie Yang, Lang Qian, Gang Gao, Xin Chen, Wenqi **, Chunshan Deng, Xiaojian Li

    Abstract: In the practical application of brain-machine interface technology, the problem often faced is the low information content and high noise of the neural signals collected by the electrode and the difficulty of decoding by the decoder, which makes it difficult for the robotic to obtain stable instructions to complete the task. The idea based on the principle of cooperative shared control can be achi…

    Submitted 25 June, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: This article needs to be updated with the corrected figure and data

  44. arXiv:2210.04188  [pdf, other]

    eess.IV cs.CV cs.LG

    Invertible Rescaling Network and Its Extensions

    Authors: Mingqing Xiao, Shuxin Zheng, Chang Liu, Zhouchen Lin, Tie-Yan Liu

    Abstract: Image rescaling is a commonly used bidirectional operation, which first downscales high-resolution images to fit various display screens or to be storage- and bandwidth-friendly, and afterward upscales the corresponding low-resolution images to recover the original resolution or the details in the zoom-in images. However, the non-injective downscaling mapping discards high-frequency contents, lead…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: Accepted by IJCV

  45. arXiv:2209.13318  [pdf, other]

    eess.SY

    Modeling and Control of Discrete Event Systems under Joint Sensor-Actuator Cyber Attacks

    Authors: Shengbao Zheng, Shaolong Shu, Feng Lin

    Abstract: In this paper, we investigate joint sensor-actuator cyber attacks in discrete event systems. We assume that attackers can attack some sensors and actuators at the same time by altering observations and control commands. Because of the nondeterminism in observation and control caused by cyber attacks, the behavior of the supervised system becomes nondeterministic and may deviate from the safety spe…

    Submitted 11 January, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  46. arXiv:2209.11451  [pdf, other]

    cs.IT eess.SY

    FIAT: Fine-grained Information Audit for Trustless Transborder Data Flow

    Authors: Shuhao Zheng, Yanxi Lin, Yang Yu, Ye Yuan, Yongzheng Jia, Xue Liu

    Abstract: Auditing the information leakage of latent sensitive features during the transborder data flow has attracted sufficient attention from global digital regulators. However, there is missing a technical approach for the audit practice due to two technical challenges. Firstly, there is a lack of theory and tools for measuring the information of sensitive latent features in a dataset. Secondly, the tra…

    Submitted 10 February, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: 10 pages, 6 figures, 1 table

  47. arXiv:2209.04854  [pdf, other]

    eess.SY cs.LG

    Performance-Driven Controller Tuning via Derivative-Free Reinforcement Learning

    Authors: Yuheng Lei, Jianyu Chen, Shengbo Eben Li, Sifa Zheng

    Abstract: Choosing an appropriate parameter set for the designed controller is critical for the final performance but usually requires a tedious and careful tuning process, which implies a strong need for automatic tuning methods. However, among existing methods, derivative-free ones suffer from poor scalability or low efficiency, while gradient-based ones are often unavailable due to possibly non-different…

    Submitted 11 September, 2022; originally announced September 2022.

    Comments: Accepted by the 61st IEEE Conference on Decision and Control (CDC), 2022. Copyright @IEEE

  48. arXiv:2207.00769  [pdf, other]

    eess.IV cs.CV

    Test-time Adaptation with Calibration of Medical Image Classification Nets for Label Distribution Shift

    Authors: Wenao Ma, Cheng Chen, Shuang Zheng, Jing Qin, Huimao Zhang, Qi Dou

    Abstract: Class distribution plays an important role in learning deep classifiers. When the proportion of each class in the test set differs from the training set, the performance of classification nets usually degrades. Such a label distribution shift problem is common in medical diagnosis since the prevalence of disease vary over location and time. In this paper, we propose the first method to tackle labe…

    Submitted 9 July, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

    Comments: This paper has been accepted by MICCAI 2022

  49. arXiv:2206.13903  [pdf, other]

    eess.IV cs.CV

    AS-IntroVAE: Adversarial Similarity Distance Makes Robust IntroVAE

    Authors: Changjie Lu, Shen Zheng, Zirui Wang, Omar Dib, Gaurav Gupta

    Abstract: Recently, introspective models like IntroVAE and S-IntroVAE have excelled in image generation and reconstruction tasks. The principal characteristic of introspective models is the adversarial learning of VAE, where the encoder attempts to distinguish between the real and the fake (i.e., synthesized) images. However, due to the unavailability of an effective metric to evaluate the difference betwee…

    Submitted 31 October, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: ACML conference paper

  50. arXiv:2205.14294  [pdf, other]

    eess.AS

    Deep Representation Decomposition for Rate-Invariant Speaker Verification

    Authors: Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li

    Abstract: While promising performance for speaker verification has been achieved by deep speaker embeddings, the advantage would reduce in the case of speaking-style variability. Speaking rate mismatch is often observed in practical speaker verification systems, which may actually degrade the system performance. To reduce intra-class discrepancy caused by speaking rate, we propose a deep representation deco…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: Accepted by Odyssey 2022