
Showing 1–50 of 103 results for author: Deng, Y

Searching in archive eess.
  1. arXiv:2406.11364

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, **yi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  2. arXiv:2406.03714

    cs.SD eess.AS

    Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

    Authors: **long Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.03706

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: **long Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2405.00603

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, **g Xiao

    Abstract: Voice conversion is the task to transform voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized in content extraction. However, in these representations, a lot of hidden speaker information leads to timbre leakage while the prosodic information of hidden units lacks use. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  5. arXiv:2404.01654

    cs.CV cs.AI eess.IV eess.SP

    AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

    Authors: Xiang Xiang, Zihan Zhang, **g Ma, Yao Deng

    Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The existing assessment method for PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and high cost and low ef…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical report for AI WALKUP, an APP winning 3rd Prize of 2022 HUST GS AI Innovation and Design Competition

  6. arXiv:2403.17392

    cs.RO eess.SY nlin.AO

    Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain

    Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura

    Abstract: Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable roboti…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  7. arXiv:2402.16027

    cs.IT eess.SP

    Enhancing xURLLC with RSMA-Assisted Massive-MIMO Networks: Performance Analysis and Optimization

    Authors: Yuang Chen, Hancheng Lu, Chenwu Zhang, Yansha Deng, Arumugam Nallanathan

    Abstract: Massive interconnection has sparked people's envisioning for next-generation ultra-reliable and low-latency communications (xURLLC), prompting the design of customized next-generation advanced transceivers (NGAT). Rate-splitting multiple access (RSMA) has emerged as a pivotal technology for NGAT design, given its robustness to imperfect channel state information (CSI) and resilience to quality of…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 14 pages, 11 figures, Submitted to IEEE for potential publication

  8. arXiv:2402.11478

    eess.SY

    Federated Reinforcement Learning for Uplink Centric Broadband Communication Optimization over Unlicensed Spectrum

    Authors: Hui Zhou, Yansha Deng

    Abstract: To provide Uplink Centric Broadband Communication (UCBC), New Radio Unlicensed (NR-U) network has been standardized to exploit the unlicensed spectrum using Listen Before Talk (LBT) scheme to fairly coexist with the incumbent Wireless Fidelity (WiFi) network. Existing access schemes over unlicensed spectrum are required to perform Clear Channel Assessment (CCA) before transmissions, where fixed En…

    Submitted 18 February, 2024; originally announced February 2024.

  9. arXiv:2401.08096

    cs.SD eess.AS

    Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Ning Cheng, **g Xiao, Jianzong Wang

    Abstract: Voice conversion refers to transferring speaker identity with well-preserved content. Better disentanglement of speech representations leads to better voice conversion. Recent studies have found that phonetic information from input audio has the potential ability to well represent content. Besides, the speaker-style modeling with pre-trained models making the process more complex. To tackle these…

    Submitted 17 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  10. arXiv:2401.01544

    cs.CV eess.SP

    Collaborative Perception for Connected and Autonomous Driving: Challenges, Possible Solutions and Opportunities

    Authors: Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Autonomous driving has attracted significant attention from both academia and industries, which is expected to offer a safer and more efficient driving system. However, current autonomous driving systems are mostly based on a single vehicle, which has significant limitations which still poses threats to driving safety. Collaborative perception with connected and autonomous vehicles (CAVs) shows a…

    Submitted 3 January, 2024; originally announced January 2024.

  11. arXiv:2401.01044

    cs.SD cs.AI cs.CL eess.AS

    Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

    Authors: **long Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent advancements in diffusion models and large language models (LLMs) have significantly propelled the field of AIGC. Text-to-Audio (TTA), a burgeoning AIGC application designed to generate audio from natural language prompts, is attracting increasing attention. However, existing TTA studies often struggle with generation quality and text-audio alignment, especially for complex textual inputs.…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Demo and implementation at https://auffusion.github.io

  12. arXiv:2312.16383

    cs.SD cs.AI eess.AS

    Frame-level emotional state alignment method for speech emotion recognition

    Authors: Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, **long Xue, Yichen Han, Ya Li

    Abstract: Speech emotion recognition (SER) systems aim to recognize human emotional state during human-computer interaction. Most existing SER systems are trained based on utterance-level labels. However, not all frames in an audio have affective states consistent with utterance-level label, which makes it difficult for the model to distinguish the true emotion of the audio and perform poorly. To address th…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  13. arXiv:2312.13182

    cs.RO eess.SY

    Task-oriented Semantics-aware Communications for Robotic Waypoint Transmission: the Value and Age of Information Approach

    Authors: Wenchao Wu, Yuanqing Yang, Yansha Deng, A. Hamid Aghvami

    Abstract: The ultra-reliable and low-latency communication (URLLC) service of the fifth-generation (5G) mobile communication network struggles to support safe robot operation. Nowadays, the sixth-generation (6G) mobile communication network is proposed to provide hyper-reliable and low-latency communication to enable safer control for robots. However, current 5G/ 6G research mainly focused on improving comm…

    Submitted 20 December, 2023; originally announced December 2023.

  14. arXiv:2312.12358

    cs.IT eess.SP

    Localization and Discrete Beamforming with a Large Reconfigurable Intelligent Surface

    Authors: Baojia Luo, Yili Deng, Miaomiao Dong, Zhongyi Huang, Xiang Chen, Wei Han, Bo Bai

    Abstract: In millimeter-wave (mmWave) cellular systems, reconfigurable intelligent surfaces (RISs) are foreseeably deployed with a large number of reflecting elements to achieve high beamforming gains. The large-sized RIS will make radio links fall in the near-field localization regime with spatial non-stationarity issues. Moreover, the discrete phase restriction on the RIS reflection coefficient incurs exp…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 13 pages

  15. arXiv:2311.08670

    cs.SD eess.AS

    CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation

    Authors: Yimin Deng, Xulong Zhang, Jianzong Wang, Ning Cheng, **g Xiao

    Abstract: Better disentanglement of speech representation is essential to improve the quality of voice conversion. Recently contrastive learning is applied to voice conversion successfully based on speaker labels. However, the performance of model will reduce in conversion between similar speakers. Hence, we propose an augmented negative sample selection to address the issue. Specifically, we create hard ne…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by the 21st IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2023)

  16. arXiv:2310.07062

    cs.SD cs.LG eess.AS

    Acoustic Model Fusion for End-to-end Speech Recognition

    Authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu

    Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, tr…

    Submitted 10 October, 2023; originally announced October 2023.

  17. PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, **g Xiao

    Abstract: Voice conversion as the style transfer task applied to speech, refers to converting one person's speech into a new speech that sounds like another person's. Up to now, there has been a lot of research devoted to better implementation of VC tasks. However, a good voice conversion model should not only match the timbre information of the target speaker, but also expressive information such as prosod…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by the 31st ACM International Conference on Multimedia (MM2023)

  18. arXiv:2306.14228

    eess.SY eess.SP

    Task-Oriented Semantics-Aware Communication for Wireless UAV Control and Command Transmission

    Authors: Yujie Xu, Zhou Hui, Yansha Deng

    Abstract: To guarantee the safety and smooth control of Unmanned Aerial Vehicle (UAV) operation, the new control and command (C&C) data type imposes stringent quality of service (QoS) requirements on the cellular network. However, the existing bit-oriented communication framework is already approaching the Shannon capacity limit, which can hardly guarantee the ultra-reliable low latency communications (URLL…

    Submitted 25 June, 2023; originally announced June 2023.

  19. arXiv:2306.12153

    eess.IV cs.CV

    DIAS: A Dataset and Benchmark for Intracranial Artery Segmentation in DSA sequences

    Authors: Wentao Liu, Tong Tian, Lemeng Wang, Wei** Xu, Lei Li, Haoyuan Li, Wenyi Zhao, Siyu Tian, Xipeng Pan, Huihua Yang, Feng Gao, Yiming Deng, Xin Yang, Ruisheng Su

    Abstract: The automated segmentation of Intracranial Arteries (IA) in Digital Subtraction Angiography (DSA) plays a crucial role in the quantification of vascular morphology, significantly contributing to computer-assisted stroke research and clinical practice. Current research primarily focuses on the segmentation of single-frame DSA using proprietary datasets. However, these methods face challenges due to…

    Submitted 13 June, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  20. arXiv:2306.04980

    cs.CL cs.SD eess.AS

    Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models

    Authors: Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien

    Abstract: This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs). There are two tasks: overall assessment of phrase break for a speech clip and fine-grained assessment of every possible phrase break position. To leverage NLP models, speech input is first force-aligned with texts, and then pre-processed into…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted by InterSpeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.16029

  21. arXiv:2305.08000

    cs.CV eess.IV

    DNN-Compressed Domain Visual Recognition with Feature Adaptation

    Authors: Yingpeng Deng, Lina J. Karam

    Abstract: Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned w…

    Submitted 26 July, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

  22. arXiv:2305.02269

    cs.SD cs.CL eess.AS

    M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

    Authors: **long Xue, Yayue Deng, Feng** Wang, Ya Li, Yingming Gao, Jianhua Tao, Jianqing Sun, Jiaen Liang

    Abstract: Conversational text-to-speech (TTS) aims to synthesize speech with proper prosody of reply based on the historical conversation. However, it is still a challenge to comprehensively model the conversation, and a majority of conversational TTS systems only focus on extracting global information and omit local prosody features, which contain important fine-grained information like keywords and emphas…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 5 pages, 1 figures, 2 tables. Accepted by ICASSP 2023

  23. arXiv:2303.17949

    cs.SD cs.LG eess.AS

    Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

    Authors: Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, **yi Fan, Jia Liu

    Abstract: Automatic detection of machine anomaly remains challenging for machine learning. We believe the capability of generative adversarial network (GAN) suits the need of machine audio anomaly detection, yet rarely has this been investigated by previous work. In this paper, we propose AEGAN-AD, a totally unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input s…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  24. arXiv:2303.10398

    cs.NI cs.LG cs.MA eess.SP

    Energy-Efficient Cellular-Connected UAV Swarm Control Optimization

    Authors: Yang Su, Hui Zhou, Yansha Deng, Mischa Dohler

    Abstract: Cellular-connected unmanned aerial vehicle (UAV) swarm is a promising solution for diverse applications, including cargo delivery and traffic control. However, it is still challenging to communicate with and control the UAV swarm with high reliability, low latency, and high energy efficiency. In this paper, we propose a two-phase command and control (C&C) transmission scheme in a cellular-connecte…

    Submitted 18 March, 2023; originally announced March 2023.

  25. arXiv:2302.09332

    eess.SP

    Incipient Fault Detection in Power Distribution System: A Time-Frequency Embedded Deep Learning Based Approach

    Authors: Qiyue Li, Huan Luo, Hong Cheng, Yuxing Deng, Wei Sun, Weitao Li, Zhi Liu

    Abstract: Incipient fault detection in power distribution systems is crucial to improve the reliability of the grid. However, the non-stationary nature and the inadequacy of the training dataset due to the self-recovery of the incipient fault signal, make the incipient fault detection in power distribution systems a great challenge. In this paper, we focus on incipient fault detection in power distribution…

    Submitted 18 February, 2023; originally announced February 2023.

    Comments: 15 pages

  26. arXiv:2211.05295

    cs.CV cs.LG eess.IV

    Harmonizing output imbalance for defect segmentation on extremely-imbalanced photovoltaic module cells images

    Authors: Jianye Yi, Xiaopin Zhong, Weixiang Liu, Zongze Wu, Yuanlong Deng, Zhengguang Wu

    Abstract: The continuous development of the photovoltaic (PV) industry has raised high requirements for the quality of monocrystalline of PV module cells. When learning to segment defect regions in PV module cell images, Tiny Hidden Cracks (THC) lead to extremely-imbalanced samples. The ratio of defect pixels to normal pixels can be as low as 1:2000. This extreme imbalance makes it difficult to segment the…

    Submitted 24 October, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 19 pages, 16 figures, 3 appendixes

  27. arXiv:2211.01676

    cs.AI eess.SY

    Repeatable Random Permutation Set

    Authors: Wenran Yang, Yong Deng

    Abstract: Random permutation set (RPS), as a recently proposed theory, enables powerful information representation by traversing all possible permutations. However, the repetition of items is not allowed in RPS while it is quite common in real life. To address this issue, we propose repeatable random permutation set ($\rm R^2PS$) which takes the repetition of items into consideration. The right and left jun…

    Submitted 4 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

  28. arXiv:2210.17016

    cs.SD eess.AS

    Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

    Authors: Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian

    Abstract: Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speak…

    Submitted 1 November, 2022; v1 submitted 30 October, 2022; originally announced October 2022.

  29. arXiv:2210.09372

    eess.SY eess.SP

    Goal-Oriented Semantic Communications for 6G Networks

    Authors: Hui Zhou, Yansha Deng, Xiaonan Liu, Nikolaos Pappas, Arumugam Nallanathan

    Abstract: Upon the arrival of emerging devices, including Extended Reality (XR) and Unmanned Aerial Vehicles (UAVs), the traditional communication framework is approaching Shannon's physical capacity limit and fails to guarantee the massive amount of transmission within latency requirements. By jointly exploiting the context of data and its importance to the task, an emerging communication paradigm shift to…

    Submitted 6 April, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

  30. arXiv:2209.09411

    eess.SY

    Shepherding Control for Separating a Single Agent from a Swarm

    Authors: Yaosheng Deng, Masaki Ogura, Aiyi Li, Naoki Wakamiya

    Abstract: In this paper, we consider the swarm-control problem of spatially separating a specified target agent within the swarm from all the other agents, while maintaining the connectivity among the other agents. We specifically aim to achieve the separation by designing the movement algorithm of an external agent, called a shepherd, which exerts repulsive forces on the agents in the swarm. This problem h…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: 6 pages, 6 figures

  31. arXiv:2207.00908

    cs.NI eess.SY

    Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

    Authors: Yuntian Deng, Xingyu Zhou, Arnob Ghosh, Abhishek Gupta, Ness B. Shroff

    Abstract: To fully utilize the abundant spectrum resources in millimeter wave (mmWave), Beam Alignment (BA) is necessary for large antenna arrays to achieve large array gains. In practical dynamic wireless environments, channel modeling is challenging due to time-varying and multipath effects. In this paper, we formulate the beam alignment problem as a non-stationary online learning problem with the objecti…

    Submitted 2 July, 2022; originally announced July 2022.

  32. arXiv:2206.14150

    cs.DC eess.SY

    Autonomous Smart Grid Fault Detection

    Authors: Qiyue Li, Yuxing Deng, Xin Liu, Wei Sun, Weitao Li, Jie Li, Zhi Liu

    Abstract: Smart grid plays a crucial role for the smart society and the upcoming carbon neutral society. Achieving autonomous smart grid fault detection is critical for smart grid system state awareness, maintenance and operation. This paper focuses on fault monitoring in smart grid and discusses the inherent technical challenges and solutions. In particular, we first present the basic principles of smart g…

    Submitted 27 May, 2022; originally announced June 2022.

  33. arXiv:2204.12426

    cs.LG eess.SY

    Time-triggered Federated Learning over Wireless Networks

    Authors: Xiaokang Zhou, Yansha Deng, Huiyun Xia, Shaochuan Wu, Mehdi Bennis

    Abstract: The newly emerging federated learning (FL) framework offers a new way to train machine learning models in a privacy-preserving manner. However, traditional FL algorithms are based on an event-triggered aggregation, which suffers from stragglers and communication overhead issues. To address these issues, in this paper, we present a time-triggered FL algorithm (TT-Fed) over wireless networks, which…

    Submitted 2 May, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

  34. arXiv:2204.10461

    cs.CL cs.SD eess.AS

    WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

    Authors: Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, Yafeng Deng

    Abstract: Historically lower-level tasks such as automatic speech recognition (ASR) and speaker identification are the main focus in the speech field. Interest has been growing in higher-level spoken language understanding (SLU) tasks recently, like sentiment analysis (SA). However, improving performances on SLU tasks remains a big challenge. Basically, there are two main methods for SLU tasks: (1) Two-stag…

    Submitted 21 April, 2022; originally announced April 2022.

  35. arXiv:2204.08169

    cs.NI eess.SP

    Actions at the Edge: Jointly Optimizing the Resources in Multi-access Edge Computing

    Authors: Yiqin Deng, Xianhao Chen, Guangyu Zhu, Yuguang Fang, Zhigang Chen, Xiaoheng Deng

    Abstract: Multi-access edge computing (MEC) is an emerging paradigm that pushes resources for sensing, communications, computing, storage and intelligence (SCCSI) to the premises closer to the end users, i.e., the edge, so that they could leverage the nearby rich resources to improve their quality of experience (QoE). Due to the growing emerging applications targeting at intelligentizing life-sustaining cyb…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: 7 pages, 2 figures, accepted by IEEE Wireless Communications

  36. arXiv:2203.10473

    cs.SD cs.LG eess.AS

    ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

    Authors: **long Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun, Jiaen Liang

    Abstract: In recent years, neural network based methods for multi-speaker text-to-speech synthesis (TTS) have made significant progress. However, the current speaker encoder models used in these methods still cannot capture enough speaker information. In this paper, we focus on accurate speaker encoder modeling and propose an end-to-end method that can generate high-quality speech and better similarity for…

    Submitted 26 March, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, submitted to interspeech2022

  37. arXiv:2203.03004

    cs.IT eess.SP

    Low-Complexity Beamforming Design for IRS-Aided NOMA Communication System with Imperfect CSI

    Authors: Yasaman Omid, S. M. Mahdi Shahabi, Cunhua Pan, Yansha Deng, Arumugam Nallanathan

    Abstract: Intelligent reflecting surface (IRS) as a promising technology rendering high throughput in future communication systems is compatible with various communication techniques such as non-orthogonal multiple-access (NOMA). In this paper, the downlink transmission of IRS-assisted NOMA communication is considered while undergoing imperfect channel state information (CSI). Consequently, a robust IRS-aid…

    Submitted 16 March, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

  38. arXiv:2203.02098

    eess.IV cs.CV

    Universal Segmentation of 33 Anatomies

    Authors: Pengbo Liu, Yang Deng, Ce Wang, Yuan Hui, Qian Li, Jun Li, Shiwei Luo, Mengke Sun, Quan Quan, Shuxin Yang, You Hao, Honghu Xiao, Chunpeng Zhao, Xinbao Wu, S. Kevin Zhou

    Abstract: In the paper, we present an approach for learning a single model that universally segments 33 anatomical structures, including vertebrae, pelvic bones, and abdominal organs. Our model building has to address the following challenges. Firstly, while it is ideal to learn such a model from a large-scale, fully-annotated dataset, it is practically hard to curate such a dataset. Thus, we resort to lear…

    Submitted 3 March, 2022; originally announced March 2022.

  39. arXiv:2112.09312

    cs.SD cs.LG eess.AS

    MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

    Authors: Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

    Abstract: Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP a hierarchical model of musical instruments…

    Submitted 17 March, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by International Conference on Learning Representations (ICLR) 2022

  40. arXiv:2111.09284

    eess.SY

    Optimization of Grant-Free NOMA with Multiple Configured-Grants for mURLLC

    Authors: Yan Liu, Yansha Deng, Maged Elkashlan, Arumugam Nallanathan, George K. Karagiannidis

    Abstract: Massive Ultra-Reliable and Low-Latency Communications (mURLLC), which integrates URLLC with massive access, is emerging as a new and important service class in the next generation (6G) for time-sensitive traffics and has recently received tremendous research attention. However, realizing efficient, delay-bounded, and reliable communications for a massive number of user equipments (UEs) in mURLLC,…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: 15 pages, 15 figures, submitted to IEEE JSAC SI on Next Generation Multiple Access. arXiv admin note: text overlap with arXiv:2101.00515

  41. arXiv:2111.00418

    cs.HC cs.LG eess.SP

    Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances

    Authors: Shibo Zhang, Yaxuan Li, Shen Zhang, Farzad Shahabi, Stephen Xia, Yu Deng, Nabil Alshurafa

    Abstract: Mobile and wearable devices have enabled numerous applications, including activity tracking, wellness monitoring, and human--computer interaction, that measure and improve our daily lives. Many of these applications are made possible by leveraging the rich collection of low-power sensors found in many mobile and wearable devices to perform human activity recognition (HAR). Recently, deep learning…

    Submitted 3 March, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

  42. arXiv:2108.00506

    eess.SP

    Scalable Multi-agent Reinforcement Learning Algorithm for Wireless Networks

    Authors: Fenghe Hu, Yansha Deng, A. Hamid Aghvami

    Abstract: Scalability is the key roadstone towards the application of cooperative intelligent algorithms in large-scale networks. Reinforcement learning (RL) is known as model-free and high efficient intelligent algorithm for communication problems and proved useful in the communication network. However, when coming to large-scale networks with limited centralization, it is not possible to employ a centrali…

    Submitted 4 November, 2021; v1 submitted 1 August, 2021; originally announced August 2021.

    Comments: 18 pages, 9 figures

  43. arXiv:2107.12943  [pdf, other]

    eess.SP

    Learning-based Prediction, Rendering and Transmission for Interactive Virtual Reality in RIS-Assisted Terahertz Networks

    Authors: Xiaonan Liu, Yansha Deng, Chong Han, Marco Di Renzo

    Abstract: The quality of experience (QoE) requirements of wireless Virtual Reality (VR) can only be satisfied with high data rate, high reliability, and low VR interaction latency. This high data rate over short transmission distances may be achieved via abundant bandwidth in the terahertz (THz) band. However, THz waves suffer from severe signal attenuation, which may be compensated by the reconfigurable in…

    Submitted 27 July, 2021; originally announced July 2021.

  44. arXiv:2106.04312  [pdf, other]

    eess.AS cs.SD

    Speech BERT Embedding For Improving Prosody in Neural TTS

    Authors: Li** Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He

    Abstract: This paper presents a speech BERT model to extract embedded prosody information in speech segments for improving the prosody of synthesized speech in neural text-to-speech (TTS). As a pre-trained model, it can learn prosody attributes from a large amount of speech data, which can utilize more data than the original training data used by the target TTS. The embedding is extracted from the previous…

    Submitted 14 September, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Journal ref: ICASSP 2021

  45. arXiv:2106.02800  [pdf, other]

    eess.IV cs.CV cs.LG

    AOSLO-net: A deep learning-based method for automatic segmentation of retinal microaneurysms from adaptive optics scanning laser ophthalmoscope images

    Authors: Qian Zhang, Konstantina Sampani, Mengjia Xu, Shengze Cai, Yixiang Deng, He Li, Jennifer K. Sun, George Em Karniadakis

    Abstract: Microaneurysms (MAs) are one of the earliest signs of diabetic retinopathy (DR), a frequent complication of diabetes that can lead to visual impairment and blindness. Adaptive optics scanning laser ophthalmoscopy (AOSLO) provides real-time retinal images with resolution down to 2 μm and thus allows detection of the morphologies of individual MAs, a potential marker that might dictate MA patholog…

    Submitted 25 June, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

  46. arXiv:2105.14711  [pdf, other]

    eess.IV cs.CV

    CTSpine1K: A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

    Authors: Yang Deng, Ce Wang, Yuan Hui, Qian Li, Jun Li, Shiwei Luo, Mengke Sun, Quan Quan, Shuxin Yang, You Hao, Pengbo Liu, Honghu Xiao, Chunpeng Zhao, Xinbao Wu, S. Kevin Zhou

    Abstract: Spine-related diseases have high morbidity and cause a huge burden of social cost. Spine imaging is an essential tool for noninvasively visualizing and assessing spinal pathology. Segmenting vertebrae in computed tomography (CT) images is the basis of quantitative medical image analysis for clinical diagnosis and surgery planning of spine diseases. Current publicly available annotated datasets on…

    Submitted 5 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

  47. arXiv:2105.14576  [pdf, other]

    cs.CV eess.IV

    StyTr²: Image Style Transfer with Transformers

    Authors: Yingying Deng, Fan Tang, Weiming Dong, Chongyang Ma, Xingjia Pan, Lei Wang, Changsheng Xu

    Abstract: The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods face biased content representation. To address this critic…

    Submitted 1 April, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

    Comments: Accepted by CVPR 2022

  48. arXiv:2104.03815  [pdf, other]

    cs.CL cs.SD eess.AS

    Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation

    Authors: Fengpeng Yue, Yan Deng, Lei He, Tom Ko

    Abstract: Machine Speech Chain, which integrates both end-to-end (E2E) automatic speech recognition (ASR) and text-to-speech (TTS) into one circle for joint training, has been proven to be effective in data augmentation by leveraging large amounts of unpaired data. In this paper, we explore the TTS->ASR pipeline in speech chain to do domain adaptation for both neural TTS and E2E ASR models, with only text d…

    Submitted 8 April, 2021; originally announced April 2021.

  49. arXiv:2103.10241  [pdf, ps, other]

    cs.IT eess.SP

    Analyzing Uplink Grant-free Sparse Code Multiple Access System in Massive IoT Networks

    Authors: Ke Lai, **g Lei, Yansha Deng, Lei Wen, Gaojie Chen

    Abstract: Grant-free sparse code multiple access (GF-SCMA) is considered to be a promising multiple access candidate for future wireless networks. In this paper, we focus on characterizing the performance of uplink GF-SCMA schemes in a network with ubiquitous connections, such as the Internet of Things (IoT) networks. To provide a tractable approach to evaluate the performance of GF-SCMA, we first develop a…

    Submitted 18 March, 2021; originally announced March 2021.

  50. arXiv:2102.10637  [pdf, other]

    eess.SP

    QoE Optimization for Live Video Streaming in UAV-to-UAV Communications via Deep Reinforcement Learning

    Authors: Liyana Adilla binti Burhanuddin, Xiaonan Liu, Yansha Deng, Ursula Challita, Andras Zahemszky

    Abstract: A challenge for rescue teams when fighting against wildfire in remote areas is the lack of information, such as the size and images of fire areas. As such, live streaming from Unmanned Aerial Vehicles (UAVs), capturing videos of dynamic fire areas, is crucial for firefighter commanders in any location to monitor the fire situation with quick response. The 5G network is a promising wireless technol…

    Submitted 21 February, 2021; originally announced February 2021.