
Showing 1–50 of 189 results for author: Jiang, Y

Searching in archive eess.
  1. arXiv:2407.00717  [pdf, other]

    cs.LG cs.AI eess.SY

    Learning System Dynamics without Forgetting

    Authors: Xikun Zhang, Dongjin Song, Yushan Jiang, Yixin Chen, Dacheng Tao

    Abstract: Predicting the trajectories of systems with unknown dynamics (i.e. the governing rules) is crucial in various research fields, including physics and biology. This challenge has gathered significant attention from diverse communities. Most existing works focus on learning fixed system dynamics within one single system. However, real-world applications often involve multiple systems with di…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.18079  [pdf, other]

    cs.CV eess.IV

    MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal

    Authors: Yiguo Jiang, Xuhang Chen, Chi-Man Pun, Shuqiang Wang, Wei Feng

    Abstract: When light is scattered or reflected accidentally in the lens, flare artifacts may appear in the captured photos, affecting the photos' visual quality. The main challenge in flare removal is to eliminate various flare artifacts while preserving the original content of the image. To address this challenge, we propose a lightweight Multi-Frequency Deflare Network (MFDNet) based on the Laplacian Pyra…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by The Visual Computer journal

  3. arXiv:2406.15160  [pdf, other]

    eess.AS eess.SP

    Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

    Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

    Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  4. arXiv:2406.09873  [pdf, other]

    eess.AS cs.AI cs.SD

    Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

    Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

    Abstract: Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2406.07198  [pdf, other]

    eess.AS cs.MM

    Target Speech Diarization with Multimodal Prompts

    Authors: Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li

    Abstract: Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending to target speech diarization, we detect "when target event occurs" according to the semantic characteristics of speech. We propose a novel Multimodal Target Speech Diarization (MM-TSD) framework, which accommodates diverse and multi-modal prompts to specify target events in a flexib…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  6. arXiv:2406.05763  [pdf, other]

    eess.AS

    WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

    Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

    Abstract: With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. Tailored for the text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio…

    Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  7. arXiv:2406.05681  [pdf, other]

    cs.SD eess.AS

    Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

    Authors: Yuepeng Jiang, Tao Li, Fengyu Yang, Lei Xie, Meng Meng, Yujun Wang

    Abstract: Recent research in zero-shot speech synthesis has made significant progress in speaker similarity. However, current efforts focus on timbre generalization rather than prosody modeling, which results in limited naturalness and expressiveness. To address this, we introduce a novel speech synthesis model trained on large-scale datasets, including both timbre and hierarchical prosody modeling. As timb…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, accepted by Interspeech2024

  8. arXiv:2406.05647  [pdf, other]

    eess.SP cs.ET

    Sustainable Wireless Networks via Reconfigurable Intelligent Surfaces (RISs): Overview of the ETSI ISG RIS

    Authors: Ruiqi Liu, Shuang Zheng, Qingqing Wu, Yifan Jiang, Nan Zhang, Yuanwei Liu, Marco Di Renzo, and George C. Alexandropoulos

    Abstract: Reconfigurable Intelligent Surfaces (RISs) are a novel form of ultra-low power devices that are capable of increasing the communication data rates as well as the cell coverage in a cost- and energy-efficient way. This is attributed to their programmable operation that enables them to dynamically manipulate the wireless propagation environment, a feature that has lately inspired numerous research inv…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, submitted to an IEEE Magazine

  9. arXiv:2406.03899  [pdf, other]

    eess.AS eess.SP

    PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement

    Authors: Nan Zhou, Youhai Jiang, Jialin Tan, Chongmin Qi

    Abstract: Low-complexity speech enhancement on mobile phones is crucial in the era of 5G. Thus, focusing on handheld mobile phone communication scenario, based on power level difference (PLD) algorithm and lightweight U-Net, we propose PLD-guided lightweight deep network (PLDNet), an extremely lightweight dual-microphone speech enhancement method that integrates the guidance of signal processing algorithm a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  10. arXiv:2405.19925  [pdf, other]

    eess.SP

    Integrated Sensing and Communications Framework for 6G Networks

    Authors: Hongliang Luo, Tengyu Zhang, Chuanbin Zhao, Yucong Wang, Bo Lin, Yuhua Jiang, Dongqi Luo, Feifei Gao

    Abstract: In this paper, we propose a novel integrated sensing and communications (ISAC) framework for the sixth generation (6G) mobile networks, in which we decompose the real physical world into static environment, dynamic targets, and various object materials. The ubiquitous static environment occupies the vast majority of the physical world, for which we design static environment reconstruction (SER) sc…

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.15863  [pdf, other]

    cs.SD cs.AI eess.AS

    Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

    Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

    Abstract: In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mi…

    Submitted 24 May, 2024; originally announced May 2024.

  12. arXiv:2405.09446  [pdf, other]

    eess.IV

    M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts

    Authors: Yufeng Jiang, Yiqing Shen

    Abstract: Medical imaging data is inherently heterogeneous across different modalities and clinical centers, posing unique challenges for developing generalizable foundation models. Conventional practice entails training distinct models per dataset or using a shared encoder with modality-specific decoders. However, these approaches incur heavy computational overheads and suffer from poor scalability. To address thes…

    Submitted 15 May, 2024; originally announced May 2024.

  13. arXiv:2405.08512  [pdf]

    eess.SP

    CFM6, a closed-form NLI EGN model supporting multiband transmission with arbitrary Raman amplification

    Authors: Yanchao Jiang, Pierluigi Poggiolini

    Abstract: We formulated a closed-form EGN model for nonlinear interference in ultra-wideband optical systems with arbitrary Raman amplification. This model enhanced the CISCO-POLITO-CFM5 performance by introducing a novel contribution attributed to the backward Raman amplification. It can handle the frequency-dependent fiber parameters and inter-channel stimulated Raman scattering.

    Submitted 14 May, 2024; originally announced May 2024.

  14. arXiv:2405.06364  [pdf, other]

    eess.SP

    Electromagnetic Property Sensing in ISAC with Multiple Base Stations: Algorithm, Pilot Design, and Performance Analysis

    Authors: Yuhua Jiang, Feifei Gao, Shi Jin, Tiejun Cui

    Abstract: Integrated sensing and communication (ISAC) has opened up numerous game-changing opportunities for future wireless systems. In this paper, we develop a novel scheme that utilizes orthogonal frequency division multiplexing (OFDM) pilot signals to sense the electromagnetic (EM) property of the target and thus identify the materials of the target. Specifically, we first establish an EM wave propagati…

    Submitted 10 May, 2024; originally announced May 2024.

  15. arXiv:2405.03665  [pdf, other]

    eess.SP

    Distributed Estimation in Blockchain-aided Internet of Things in the Presence of Attacks

    Authors: Hamid Varmazyari, Yiming Jiang, Jiangfan Zhang

    Abstract: Distributed estimation in a blockchain-aided Internet of Things (BIoT) is considered, where the integrated blockchain secures data exchanges across the BIoT and the storage of data at BIoT agents. This paper focuses on developing a performance guarantee for the distributed estimation in a BIoT in the presence of malicious attacks which jointly exploit vulnerabilities present in both IoT devices a…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures

  16. arXiv:2404.18501  [pdf, other]

    eess.AS cs.SD

    Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

    Authors: Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li

    Abstract: Audio-visual target speaker extraction (AV-TSE) aims to extract the specific person's speech from the audio mixture given auxiliary visual cues. Previous methods usually search for the target voice through speech-lip synchronization. However, this strategy mainly focuses on the existence of target speech, while ignoring the variations of the noise characteristics. That may result in extracting noi…

    Submitted 8 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  17. arXiv:2404.16327  [pdf, other]

    cs.IT eess.SP

    Generalized Step-Chirp Sequences With Flexible Bandwidth

    Authors: Cheng Du, Yi Jiang

    Abstract: Sequences with low aperiodic autocorrelation sidelobes have been extensively researched in the literature. With sufficiently low integrated sidelobe level (ISL), their power spectra are asymptotically flat over the whole frequency domain. However, for beam sweeping in massive multi-input multi-output (MIMO) broadcast channels, the flat spectrum should be constrained in a passband with tunab…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by 2024 IEEE International Symposium on Information Theory

  18. arXiv:2404.09571  [pdf, other]

    eess.IV cs.CV

    MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

    Authors: Yuxuan Jiang, Chen Feng, Fan Zhang, David Bull

    Abstract: Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from its high-performance but more complex teacher variant. When applied in the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, which are based on training stra…

    Submitted 15 April, 2024; originally announced April 2024.

  19. arXiv:2404.00863  [pdf, other]

    eess.AS

    Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

    Authors: Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li

    Abstract: Modern speaker recognition systems rely on abundant and balanced datasets for classification training. However, diverse defective datasets, such as partially-labelled, small-scale, and imbalanced datasets, are common in real-world applications. Previous works usually studied specific solutions for each scenario from the algorithm perspective. However, the root cause of these problems lies in data…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 5 pages

  20. arXiv:2403.16402  [pdf, other]

    eess.SY

    A Distributionally Robust Model Predictive Control for Static and Dynamic Uncertainties in Smart Grids

    Authors: Qi Li, Ye Shi, Yuning Jiang, Yuanming Shi, Haoyu Wang, H. Vincent Poor

    Abstract: The integration of various power sources, including renewables and electric vehicles, into smart grids is expanding, introducing uncertainties that can result in issues like voltage imbalances, load fluctuations, and power losses. These challenges negatively impact the reliability and stability of online scheduling in smart grids. Existing research often addresses uncertainties affecting current s…

    Submitted 24 March, 2024; originally announced March 2024.

  21. arXiv:2402.16765  [pdf, other]

    eess.SY

    Oscillations-Aware Frequency Security Assessment via Efficient Worst-Case Frequency Nadir Computation

    Authors: Yan Jiang, Hancheng Min, Baosen Zhang

    Abstract: Frequency security assessment following major disturbances has long been one of the central tasks in power system operations. The standard approach is to study the center of inertia frequency, an aggregate signal for an entire system, to avoid analyzing the frequency signal at individual buses. However, as the amount of low-inertia renewable resources in a grid increases, the center of inertia fre…

    Submitted 26 February, 2024; originally announced February 2024.

  22. arXiv:2402.11664  [pdf, other]

    cs.LG eess.SP

    Interpretable Short-Term Load Forecasting via Multi-Scale Temporal Decomposition

    Authors: Yuqi Jiang, Yan Li, Yize Chen

    Abstract: Rapid progress in machine learning and deep learning has enabled a wide range of applications in the electricity load forecasting of power systems, for instance, univariate and multivariate short-term load forecasting. Though the strong capabilities of learning the non-linearity of the load patterns and the high prediction accuracy have been achieved, the interpretability of typical deep learning…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted to 23rd Power Systems Computation Conference (PSCC); cross referenced in Electric Power Systems Research

  23. arXiv:2402.09170  [pdf, other]

    eess.SP

    Permittivity Estimation in Ray-tracing Using Path Loss Data based on GAMP

    Authors: Yuanhao Jiang, Shidong Zhou, Xiaofeng Zhong

    Abstract: In this paper, we propose a modified Generalized Approximate Message Passing (GAMP) algorithm to estimate permittivity parameters using path loss data in a ray-tracing model.

    Submitted 14 February, 2024; originally announced February 2024.

  24. arXiv:2401.03726  [pdf, other]

    eess.SP cs.IT eess.SY

    UAV-enabled Integrated Sensing and Communication: Tracking Design and Optimization

    Authors: Yifan Jiang, Qingqing Wu, Wen Chen, Kaitao Meng

    Abstract: Integrated sensing and communications (ISAC) enabled by unmanned aerial vehicles (UAVs) is a promising technology to facilitate target tracking applications. In contrast to conventional UAV-based ISAC system designs that mainly focus on estimating the target position, the target velocity estimation also needs to be considered due to its crucial impacts on link maintenance and real-time response, w…

    Submitted 16 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: 3 figures, 5 pages, Accepted by IEEE Communications Letters

  25. arXiv:2401.02961  [pdf, other]

    cs.LG cs.CV eess.IV physics.optics

    A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design

    Authors: Manna Dai, Yang Jiang, Feng Yang, Joyjit Chattoraj, Yingzhi Xia, Xinxing Xu, Weijiang Zhao, My Ha Dao, Yong Liu

    Abstract: Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that…

    Submitted 18 October, 2023; originally announced January 2024.

  26. arXiv:2401.00523  [pdf, other]

    eess.IV cs.CV

    Compressing Deep Image Super-resolution Models

    Authors: Yuxuan Jiang, Jakub Nawala, Fan Zhang, David Bull

    Abstract: Deep learning techniques have been applied in the context of image super-resolution (SR), achieving remarkable advances in terms of reconstruction performance. Existing techniques typically employ highly complex model structures which result in large model sizes and slow inference speeds. This often leads to high energy consumption and restricts their adoption for practical applications. To addres…

    Submitted 21 February, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  27. arXiv:2312.16428  [pdf, other]

    eess.SP

    Electromagnetic Property Sensing: A New Paradigm of Integrated Sensing and Communication

    Authors: Yuhua Jiang, Feifei Gao, Shi Jin

    Abstract: Integrated sensing and communication (ISAC) has opened up numerous game-changing opportunities for future wireless systems. In this paper, we develop a novel scheme that utilizes orthogonal frequency division multiplexing (OFDM) pilot signals in ISAC systems to sense the electromagnetic (EM) property of the target and thus also identify the material of the target. Specifically, we first establish…

    Submitted 23 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

  28. arXiv:2312.16002  [pdf, other]

    eess.AS cs.AI

    The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

    Authors: Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

    Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for ICMC-ASR Challenge include the multi-channel front-end enhancement and diarization, training data augmentation, speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 set, our best system achieves…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Technical Report. 2 pages. For ICMC-ASR-2023 Challenge

  29. arXiv:2312.04418  [pdf, other]

    cs.NI eess.SY

    MIST: An Efficient Approach for Software-Defined Multicast in Wireless Mesh Networks

    Authors: Rupei Xu, Yuming Jiang, Jason P. Jue

    Abstract: Multicasting is a vital information dissemination technique in Software-Defined Networking (SDN). With SDN, a multicast service can incorporate network functions implemented at different nodes, which is referred to as software-defined multicast. Emerging ubiquitous wireless networks for 5G and Beyond (B5G) inherently support multicast. However, the broadcast nature of wireless channels, especially…

    Submitted 7 December, 2023; originally announced December 2023.

  30. arXiv:2312.02498   

    eess.SY

    Provable Reinforcement Learning for Networked Control Systems with Stochastic Packet Disordering

    Authors: Wenqian Xue, Yi Jiang, Frank L. Lewis, Bosen Lian

    Abstract: This paper formulates a stochastic optimal control problem for linear networked control systems featuring stochastic packet disordering with a unique stabilizing solution certified. The problem is solved by proposing reinforcement learning algorithms. A measurement method is first presented to deal with PD and calculate the newest control input. The NCSs with stochastic PD are modeled as stochasti…

    Submitted 11 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: This is a wrong version with problem setting and description errors in main sections

  31. arXiv:2310.14823  [pdf, other]

    eess.AS eess.SP

    Prompt-driven Target Speech Diarization

    Authors: Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li

    Abstract: We introduce a novel task named 'target speech diarization', which seeks to determine 'when target event occurred' within an audio signal. We devise a neural architecture called Prompt-driven Target Speech Diarization (PTSD), that works with diverse prompts that specify the target speech events of interest. We train and evaluate PTSD using sim2spk, sim3spk and sim4spk datasets, which are derived f…

    Submitted 8 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024

  32. arXiv:2310.11998  [pdf, other]

    eess.SP

    One-Bit Byzantine-Tolerant Distributed Learning via Over-the-Air Computation

    Authors: Yuhan Yang, Youlong Wu, Yuning Jiang, Yuanming Shi

    Abstract: Distributed learning has become a promising computational parallelism paradigm that enables a wide scope of intelligent applications from the Internet of Things (IoT) to autonomous driving and the healthcare industry. This paper studies distributed learning in wireless data center networks, which contain a central edge server and multiple edge workers to collaboratively train a shared global model…

    Submitted 18 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  33. arXiv:2310.02802  [pdf, other]

    eess.AS

    VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

    Authors: Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

    Abstract: This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023. Following the recognition-synthesis framework, our singing conversion model is based on VITS, incorporating four key modules: a prior encoder, a posterior encoder, a decoder, and a parallel bank of transposed convolutions (PBTC) module. We particularly leverage Whisper, a powerful pre-trained ASR…

    Submitted 4 October, 2023; originally announced October 2023.

  34. arXiv:2309.15496  [pdf, other]

    eess.AS cs.SD

    DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC…

    Submitted 18 January, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP2024

  35. arXiv:2309.13907  [pdf, other]

    cs.SD eess.AS

    HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

    Abstract: Recent advances in text-to-speech, particularly those based on Graph Neural Networks (GNNs), have significantly improved the expressiveness of short-form synthetic speech. However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. To address this problem, we expand the capabilities of GNNs with a hierarchical prosody modeling approach, named HiGNN…

    Submitted 6 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ASRU2023

  36. arXiv:2309.11161  [pdf, other]

    cs.IT eess.SP

    Beamforming Design for RIS-Aided THz Wideband Communication Systems

    Authors: Yihang Jiang, Ziqin Zhou, Xiaoyang Li, Yi Gong

    Abstract: Benefiting from tens of GHz of bandwidth, terahertz (THz) communications has become a promising technology for future 6G networks. However, the conventional hybrid beamforming architecture based on frequency-independent phase-shifters is not able to cope with the beam split effect (BSE) in THz massive multiple-input multiple-output (MIMO) systems. Despite some work introducing the frequency-depend…

    Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  37. arXiv:2309.07925  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

    Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

    Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for e…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

    Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

  38. arXiv:2309.04444  [pdf, other]

    eess.SY math.OC

    A Generalized Stopping Criterion for Real-Time MPC with Guaranteed Stability

    Authors: Kristína Fedorová, Yuning Jiang, Juraj Oravec, Colin N. Jones, Michal Kvasnica

    Abstract: Most of the real-time implementations of the stabilizing optimal control actions suffer from the necessity to provide high computational effort. This paper presents a cutting-edge approach for real-time evaluation of linear-quadratic model predictive control (MPC) that employs a novel generalized stopping criterion, achieving asymptotic stability in the presence of input constraints. The proposed…

    Submitted 8 September, 2023; originally announced September 2023.

  39. arXiv:2309.01797  [pdf, other]

    cs.CV eess.IV

    Accuracy and Consistency of Space-based Vegetation Height Maps for Forest Dynamics in Alpine Terrain

    Authors: Yuchang Jiang, Marius Rüetschi, Vivien Sainte Fare Garnot, Mauro Marty, Konrad Schindler, Christian Ginzler, Jan D. Wegner

    Abstract: Monitoring and understanding forest dynamics is essential for environmental conservation and management. This is why the Swiss National Forest Inventory (NFI) provides countrywide vegetation height maps at a spatial resolution of 0.5 m. Its long update time of 6 years, however, limits the temporal analysis of forest dynamics. This can be improved by using spaceborne remote sensing and deep learnin…

    Submitted 4 September, 2023; originally announced September 2023.

  40. arXiv:2308.14774  [pdf, other]

    eess.AS cs.SD eess.SP q-bio.QM

    EEG-Derived Voice Signature for Attended Speaker Detection

    Authors: Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li

    Abstract: Objective: Conventional EEG-based auditory attention detection (AAD) is achieved by comparing the time-varying speech stimuli and the elicited EEG signals. However, in order to obtain reliable correlation values, these methods necessitate a long decision window, resulting in a long detection latency. Humans have a remarkable ability to recognize and follow a known speaker, regardless of t…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 8 pages, 2 figures

  41. arXiv:2308.14178  [pdf, other]

    math.OC eess.SY

    Data-Driven Robust Control Using Prediction Error Bounds Based on Perturbation Analysis

    Authors: Baiwei Guo, Yuning Jiang, Colin N. Jones, Giancarlo Ferrari-Trecate

    Abstract: For linear systems, many data-driven control methods rely on the behavioral framework, using historical data of the system to predict the future trajectories. However, measurement noise introduces errors in predictions. When the noise is bounded, we propose a method for designing historical experiments that enable the computation of an upper bound on the prediction error. This approach allows us t…

    Submitted 27 August, 2023; originally announced August 2023.

  42. arXiv:2308.13298  [pdf, other]

    cs.LG eess.SP stat.ML

    Federated Linear Bandit Learning via Over-the-Air Computation

    Authors: Jiali Wang, Yuning Jiang, Xin Liu, Ting Wang, Yuanming Shi

    Abstract: In this paper, we investigate federated contextual linear bandit learning within a wireless system that comprises a server and multiple devices. Each device interacts with the environment, selects an action based on the received reward, and sends model updates to the server. The primary objective is to minimize cumulative regret across all devices within a finite time horizon. To reduce the commun…

    Submitted 28 August, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

  43. arXiv:2308.13282  [pdf, other]

    eess.SY

    Advancing Distributed AC Optimal Power Flow for Integrated Transmission-Distribution Systems

    Authors: Xinliang Dai, Junyi Zhai, Yuning Jiang, Yi Guo, Colin N. Jones, Veit Hagenmeyer

    Abstract: This paper introduces a distributed operational solution for coordinating integrated transmission-distribution (ITD) systems regarding data privacy. To tackle the nonconvex challenges of AC optimal power flow (OPF) problems, our research proposes an enhanced version of the Augmented Lagrangian based Alternating Direction Inexact Newton method (ALADIN). This proposed framework incorporates a second…

    Submitted 30 January, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

  44. arXiv:2308.08792  [pdf, other]

    eess.SY cs.LG cs.MA

    Federated Reinforcement Learning for Electric Vehicles Charging Control on Distribution Networks

    Authors: Junkai Qian, Yuning Jiang, Xin Liu, Qing Wang, Ting Wang, Yuanming Shi, Wei Chen

    Abstract: With the growing popularity of electric vehicles (EVs), maintaining power grid stability has become a significant challenge. To address this issue, EV charging control strategies have been developed to manage the switch between vehicle-to-grid (V2G) and grid-to-vehicle (G2V) modes for EVs. In this context, multi-agent deep reinforcement learning (MADRL) has proven its effectiveness in EV charging…

    Submitted 17 August, 2023; originally announced August 2023.

  45. arXiv:2307.06728  [pdf, other]

    math.OC eess.SY

    Hypergraph-Based Fast Distributed AC Power Flow Optimization

    Authors: Xinliang Dai, Yingzhao Lian, Yuning Jiang, Colin N. Jones, Veit Hagenmeyer

    Abstract: This paper presents a novel distributed approach for solving AC power flow (PF) problems. The optimization problem is reformulated into a distributed form using a communication structure corresponding to a hypergraph, by which complex relationships between subgrids can be expressed as hyperedges. Then, a hypergraph-based distributed sequential quadratic programming (HDQ) approach is proposed to ha…

    Submitted 14 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

  46. arXiv:2305.17777  [pdf, other]

    eess.SY

    Structured Neural-PI Control with End-to-End Stability and Output Tracking Guarantees

    Authors: Wenqi Cui, Yan Jiang, Baosen Zhang, Yuanyuan Shi

    Abstract: We study the optimal control of multiple-input and multiple-output dynamical systems via the design of neural network-based controllers with stability and output tracking guarantees. While neural network-based nonlinear controllers have shown superior performance in various applications, their lack of provable guarantees has restricted their adoption in high-stakes real-world applications. This pap…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.00261

  47. arXiv:2305.12831  [pdf, other]

    eess.AS cs.SD

    Target Active Speaker Detection with Audio-visual Cues

    Authors: Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li

    Abstract: In active speaker detection (ASD), we would like to detect whether an on-screen person is speaking based on audio-visual cues. Previous studies have primarily focused on modeling audio-visual synchronization cue, which depends on the video quality of the lip region of a speaker. In real-world applications, it is possible that we can also have the reference speech of the on-screen speaker. To benef…

    Submitted 12 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023

  48. arXiv:2305.12425  [pdf, other]

    eess.AS cs.SD

    DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities. Unlike typical (non-streaming) voice conversion, which can leverage the entire utterance as full context, streaming voice conversion faces significant challenges due to the missing future information, resulting in degraded intelligibility,…

    Submitted 30 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

  49. arXiv:2305.10146  [pdf, other]

    cs.CV eess.IV

    CS-PCN: Context-Space Progressive Collaborative Network for Image Denoising

    Authors: Yuqi Jiang, Chune Zhang, Jiao Liu

    Abstract: Currently, image-denoising methods based on deep learning cannot adequately reconcile contextual semantic information and spatial details. To take these information optimizations into consideration, in this paper, we propose a Context-Space Progressive Collaborative Network (CS-PCN) for image denoising. CS-PCN is a multi-stage hierarchical architecture composed of a context mining siamese sub-netw…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: International Conference on Multimedia and Expo

  50. arXiv:2305.09002  [pdf, other]

    eess.SY

    Equilibria of Fully Decentralized Learning in Networked Systems

    Authors: Yan Jiang, Wenqi Cui, Baosen Zhang, Jorge Cortés

    Abstract: Existing settings of decentralized learning either require players to have full information or require the system to have certain special structure that may be hard to check, which hinders their applicability to practical systems. To overcome this, we identify a structure that is simple to check for linear dynamical systems, where each player learns in a fully decentralized fashion to minimize its cost. We fir…

    Submitted 15 May, 2023; originally announced May 2023.