
Showing 1–50 of 88 results for author: Cheng, X

  1. arXiv:2406.19072  [pdf, other

    eess.SP

    Scatterer Recognition from LiDAR Point Clouds for Environment-Embedded Vehicular Channel Modeling via Synesthesia of Machines

    Authors: Ziwei Huang, Lu Bai, Zengrui Han, Xiang Cheng

    Abstract: In this paper, a novel environment-embedded vehicular channel model is proposed by scatterer recognition from light detection and ranging (LiDAR) point clouds via Synesthesia of Machines (SoM). To provide a robust data foundation, a new intelligent sensing-communication integration dataset in vehicular urban scenarios is constructed. Based on the constructed dataset, the complex SoM mechanism, i.e… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.14440  [pdf, other

    eess.SP

    LLM4CP: Adapting Large Language Models for Channel Prediction

    Authors: Boxun Liu, Xuanyu Liu, Shijian Gao, Xiang Cheng, Liuqing Yang

    Abstract: Channel prediction is an effective approach for reducing the feedback or estimation overhead in massive multi-input multi-output (m-MIMO) systems. However, existing channel prediction methods lack precision due to model mismatch errors or network generalization issues. Large language models (LLMs) have demonstrated powerful modeling and generalization abilities, and have been successfully applied… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.01605  [pdf, other

    eess.IV cs.CV

    An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

    Authors: Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

    Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the… ▽ More

    Submitted 26 May, 2024; originally announced June 2024.

  4. arXiv:2406.01205  [pdf, other

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2406.00356  [pdf, other

    eess.AS cs.SD

    AudioLCM: Text-to-Audio Generation with Latent Consistency Models

    Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

    Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficie… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  6. arXiv:2405.19015  [pdf, other

    eess.SY cs.LG cs.MA

    Distributed Management of Fluctuating Energy Resources in Dynamic Networked Systems

    Authors: Xiaotong Cheng, Ioannis Tsetis, Setareh Maghsudi

    Abstract: Modern power systems integrate renewable distributed energy resources (DERs) as an environment-friendly enhancement to meet the ever-increasing demands. However, the inherent unreliability of renewable energy renders developing DER management algorithms imperative. We study the energy-sharing problem in a system consisting of several DERs. Each agent harvests and distributes renewable energy in it… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.14347  [pdf, other

    eess.SP cs.AI

    Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) Approach

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng

    Abstract: Integrated sensing and communication (ISAC) technology is essential for enabling the vehicular networks. However, the communication channel in this scenario exhibits time-varying characteristics, and the potential targets may move rapidly, creating a doubly-dynamic phenomenon. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, the… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.09778  [pdf, other

    eess.SP

    Beam Pattern Modulation Embedded Hybrid Transceiver Optimization for Integrated Sensing and Communication

    Authors: Boxun Liu, Shijian Gao, Zonghui Yang, Xiang Cheng, Liuqing Yang

    Abstract: Integrated sensing and communication (ISAC) emerges as a promising technology for B5G/6G, particularly in the millimeter-wave (mmWave) band. However, the widely utilized hybrid architecture in mmWave systems compromises multiplexing gain due to the constraints of limited radio frequency chains. Moreover, additional sensing functionalities exacerbate the impairment of spectrum efficiency (SE). In t… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  9. arXiv:2405.08306  [pdf, other

    math.OC eess.SY

    Flight Path Optimization with Optimal Control Method

    Authors: Gaofeng Su, Xi Cheng, Siyuan Feng, Ke Liu, Jilin Song, Jianan Chen, Chen Zhu, Hui Lin

    Abstract: This paper is based on a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control (CFTOC). The first step is to d… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  10. arXiv:2404.16825  [pdf, other

    cs.CV eess.IV

    ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

    Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

    Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.09313  [pdf, other

    eess.AS cs.AI

    Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

    Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

    Abstract: A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention has been paid to exploring song synthesis. In this work, we propose a novel task called text-to-song synthesis, which incorporates both vocal and accompaniment generation. We develop Melodist, a two-stage text-to-song method that consi… ▽ More

    Submitted 20 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Main

  12. arXiv:2403.14185  [pdf, other

    eess.SP

    A LiDAR-Aided Channel Model for Vehicular Intelligent Sensing-Communication Integration

    Authors: Ziwei Huang, Lu Bai, Mingran Sun, Xiang Cheng

    Abstract: In this paper, a novel channel modeling approach, named light detection and ranging (LiDAR)-aided geometry-based stochastic modeling (LA-GBSM), is developed. Based on the developed LA-GBSM approach, a new millimeter wave (mmWave) channel model for sixth-generation (6G) vehicular intelligent sensing-communication integration is proposed, which can support the design of intelligent transportation sy… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  13. arXiv:2403.13314  [pdf, other

    eess.SP

    Superposed IM-OFDM (S-IM-OFDM): An Enhanced OFDM for Integrated Sensing and Communications

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng, Liuqing Yang

    Abstract: Integrated sensing and communications (ISAC) is a critical enabler for emerging 6G applications, and at its core lies in the dual-functional waveform design. While orthogonal frequency division multiplexing (OFDM) has been a popular basic waveform, its primitive version falls short in sensing due to the inherent unregulated auto-correlation properties. Furthermore, the sensitivity to Doppler shift… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  14. arXiv:2403.10629  [pdf, other

    cs.RO eess.SY

    Virtual Elastic Tether: a New Approach for Multi-agent Navigation in Confined Aquatic Environments

    Authors: Kanzhong Yao, Xueliang Cheng, Keir Groves, Barry Lennox, Ognjen Marjanovic, Simon Watson

    Abstract: Underwater navigation is a challenging area in the field of mobile robotics due to inherent constraints in self-localisation and communication in underwater environments. Some of these challenges can be mitigated by using collaborative multi-agent teams. However, when applied underwater, the robustness of traditional multi-agent collaborative control approaches is highly limited due to the unavail… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  15. arXiv:2403.10417  [pdf, other

    eess.SP

    Beam Pattern Modulation Embedded mmWave Hybrid Transceiver Design Towards ISAC

    Authors: Boxun Liu, Shijian Gao, Zonghui Yang, Xiang Cheng

    Abstract: Integrated Sensing and Communication (ISAC) emerges as a promising technology for B5G/6G, particularly in the millimeter-wave (mmWave) band. However, the widespread adoption of hybrid architecture in mmWave systems compromises multiplexing gain due to limited radio-frequency chains, resulting in mediocre performance when embedding sensing functionality. To avoid sacrificing the spectrum efficiency… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  16. arXiv:2403.07444  [pdf, other

    cs.NI eess.SP

    A Survey on Federated Learning in Intelligent Transportation Systems

    Authors: Rongqing Zhang, Hanqiu Wang, Bing Li, Xiang Cheng, Liuqing Yang

    Abstract: The development of Intelligent Transportation System (ITS) has brought about comprehensive urban traffic information that not only provides convenience to urban residents in their daily lives but also enhances the efficiency of urban road usage, leading to a more harmonious and sustainable urban life. Typical scenarios in ITS mainly include traffic flow prediction, traffic target recognition, and… ▽ More

    Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  17. arXiv:2402.09434  [pdf, other

    eess.SP cs.LG

    Disentangling Imperfect: A Wavelet-Infused Multilevel Heterogeneous Network for Human Activity Recognition in Flawed Wearable Sensor Data

    Authors: Mengna Liu, Dong Xiang, Xu Cheng, Xiufeng Liu, Dalin Zhang, Shengyong Chen, Christian S. Jensen

    Abstract: The popularity and diffusion of wearable devices provides new opportunities for sensor-based human activity recognition that leverages deep learning-based algorithms. Although impressive advances have been made, two major challenges remain. First, sensor data is often incomplete or noisy due to sensor placement and other issues as well as data transmission failure, calling for imputation of missin… ▽ More

    Submitted 26 January, 2024; originally announced February 2024.

    Comments: 14 pages, 7 figures

  18. arXiv:2402.03585  [pdf, other

    cs.CV eess.IV

    Decoder-Only Image Registration

    Authors: Xi Jia, Wenqi Lu, Xinxing Cheng, Jinming Duan

    Abstract: In unsupervised medical image registration, the predominant approaches involve the utilization of an encoder-decoder network architecture, allowing for precise prediction of dense, full-resolution displacement fields from given paired images. Despite its widespread use in the literature, we argue for the necessity of making both the encoder and decoder learnable in such an architecture. For this, w… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

  19. arXiv:2401.16712  [pdf, other

    cs.CV cs.RO eess.IV

    LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

    Authors: Fei Teng, Jiaming Zhang, Jiawei Liu, Kunyu Peng, Xina Cheng, Zhiyong Li, Kailun Yang

    Abstract: Leveraging the rich information extracted from light field (LF) cameras is instrumental for dense prediction tasks. However, adapting light field data to enhance Salient Object Detection (SOD) still follows the traditional RGB methods and remains under-explored in the community. Previous approaches predominantly employ a custom two-stream design to discover the implicit angular feature within ligh… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: The source code will be made publicly available at https://github.com/FeiBryantkit/LF-Tracy

  20. arXiv:2401.16700  [pdf, other

    cs.CV cs.RO eess.IV

    Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

    Authors: Jianbin Jiao, Xina Cheng, Weijie Chen, Xiaoting Yin, Hao Shi, Kailun Yang

    Abstract: 3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are pr… ▽ More

    Submitted 25 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to IJCNN 2024. The source code will be available at https://github.com/WUJINHUAN/3D-human-pose

  21. arXiv:2312.15197  [pdf, other

    cs.SD cs.CL cs.CV eess.AS

    TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

    Authors: Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao

    Abstract: Direct speech-to-speech translation achieves high-quality results through the introduction of discrete units obtained from self-supervised learning. This approach circumvents delays and cascading errors associated with model cascading. However, talking head translation, converting audio-visual speech (i.e., talking head video) from one language into another, still confronts several challenges comp… ▽ More

    Submitted 23 December, 2023; originally announced December 2023.

  22. arXiv:2312.01544  [pdf, other

    cs.LG cs.AI eess.SY

    KEEC: Embed to Control on An Equivariant Geometry

    Authors: Xiaoyuan Cheng, Yiming Yang, Wei Jiang, Yukun Hu

    Abstract: This paper investigates how representation learning can enable optimal control in unknown and complex dynamics, such as chaotic and non-linear systems, without relying on prior domain knowledge of the dynamics. The core idea is to establish an equivariant geometry that is diffeomorphic to the manifold defined by a dynamical system and to perform optimal control within this corresponding geometry,… ▽ More

    Submitted 10 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  23. arXiv:2312.00727  [pdf, other

    cs.LG cs.AI eess.SY

    Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space

    Authors: Xiaoyuan Cheng, Boli Chen, Liz Varga, Yukun Hu

    Abstract: This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDP), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in POMDP to infer latent states… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

  24. arXiv:2312.00550  [pdf, ps, other

    eess.SP

    Novel 3D Geometry-Based Stochastic Models for Non-Isotropic MIMO Vehicle-to-Vehicle Channels

    Authors: Yi Yuan, Cheng-Xiang Wang, Xiang Cheng, Bo Ai, David I. Laurenson

    Abstract: This paper proposes a novel three-dimensional (3D) theoretical regular-shaped geometry-based stochastic model (RS-GBSM) and the corresponding sum-of-sinusoids (SoS) simulation model for non-isotropic multiple-input multiple-output (MIMO) vehicle-to-vehicle (V2V) Ricean fading channels. The proposed RS-GBSM, combining line-of-sight (LoS) components, a two-sphere model, and an elliptic-cylinder mode… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

  25. arXiv:2311.14264  [pdf, ps, other

    eess.SP

    An ADMM-Based Geometric Configuration Optimization in RSSD-Based Source Localization By UAVs with Spread Angle Constraint

    Authors: Xin Cheng, Weiqiang Zhu, Feng Shu, Jiangzhou Wang

    Abstract: Deploying multiple unmanned aerial vehicles (UAVs) to locate a signal-emitting source covers a wide range of military and civilian applications like rescue and target tracking. It is well known that the UAVs-source (sensors-target) geometry, namely geometric configuration, significantly affects the final localization accuracy. This paper focuses on the geometric configuration optimization for rece… ▽ More

    Submitted 23 November, 2023; originally announced November 2023.

  26. arXiv:2310.02561  [pdf, other

    eess.SP

    Integrated Sensing and Communications Towards Proactive Beamforming in mmWave V2I via Multi-Modal Feature Fusion (MMFF)

    Authors: Haotian Zhang, Shijian Gao, Xiang Cheng, Liuqing Yang

    Abstract: The future of vehicular communication networks relies on mmWave massive multi-input-multi-output antenna arrays for intensive data transfer and massive vehicle access. However, reliable vehicle-to-infrastructure links require exact alignment between the narrow beams, which traditionally involves excessive signaling overhead. To address this issue, we propose a novel proactive beamforming scheme th… ▽ More

    Submitted 26 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: 14 pages, 12 figures, 5 tables

  27. arXiv:2309.14341  [pdf, other

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Extreme Parkour with Legged Robots

    Authors: Xuxin Cheng, Kexin Shi, Ananye Agarwal, Deepak Pathak

    Abstract: Humans can perform parkour by traversing obstacles in a highly dynamic fashion requiring precise eye-muscle coordination and movement. Getting robots to do the same task requires overcoming similar challenges. Classically, this is done by independently engineering perception, actuation, and control systems to very low tolerances. This restricts them to tightly controlled settings such as a predete… ▽ More

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Website and videos at https://extreme-parkour.github.io/

  28. arXiv:2308.13381  [pdf, ps, other

    cs.IT eess.SP

    Deep Unfolding-Based Channel Estimation for Wideband TeraHertz Near-Field Massive MIMO Systems

    Authors: Jiabao Gao, Xiaoming Cheng, Geoffrey Ye Li

    Abstract: The combination of Terahertz (THz) and massive multiple-input multiple-output (MIMO) is promising to meet the increasing data rate demand of future wireless communication systems thanks to the huge bandwidth and spatial degrees of freedom. However, unique channel features such as the near-field beam split effect make channel estimation particularly challenging in THz massive MIMO systems. On one h… ▽ More

    Submitted 25 August, 2023; originally announced August 2023.

  29. arXiv:2307.15374  [pdf

    eess.SY

    Leveraging Optical Communication Fiber and AI for Distributed Water Pipe Leak Detection

    Authors: Huan Wu, Huan-Feng Duan, Wallace W. L. Lai, Kun Zhu, Xin Cheng, Hao Yin, Bin Zhou, Chun-Cheung Lai, Chao Lu, Xiaoli Ding

    Abstract: Detecting leaks in water networks is a costly challenge. This article introduces a practical solution: the integration of optical networks with water networks for efficient leak detection. Our approach uses a fiber-optic cable to measure vibrations, enabling accurate leak identification and localization by an intelligent algorithm. We also propose a method to assess leak severity for prioritized re… ▽ More

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted

    Journal ref: IEEE Communications Magazine, 2023

  30. arXiv:2307.05362  [pdf, other

    eess.SP cs.LG

    SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages

    Authors: Xuewei Cheng, Ke Huang, Yi Zou, Shujie Ma

    Abstract: Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity which typically exist in raw EEG signals of sleep data can significantly affect the classification performance of any machine learning algorithms. To solve these two pro… ▽ More

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 20 pages, 6 figures

  31. arXiv:2307.00583  [pdf, other

    eess.IV cs.CV

    A region and category confidence-based multi-task network for carotid ultrasound image segmentation and classification

    Authors: Haitao Gan, Ran Zhou, Yanghan Ou, Furong Wang, Xinyao Cheng, Aaron Fenster

    Abstract: The segmentation and classification of carotid plaques in ultrasound images play important roles in the treatment of atherosclerosis and assessment for the risk of stroke. Although deep learning methods have been used for carotid plaque segmentation and classification, two-stage methods will increase the complexity of the overall analysis and the existing multi-task methods ignored the relationshi… ▽ More

    Submitted 18 November, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

  32. arXiv:2306.14143  [pdf, other

    eess.SP

    Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines

    Authors: Xiang Cheng, Haotian Zhang, Jianan Zhang, Shijian Gao, Sijiang Li, Ziwei Huang, Lu Bai, Zonghui Yang, Xinhu Zheng, Liuqing Yang

    Abstract: In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrade the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency (RF) sensors being the primary participants, thus lacking a comprehensive environment feat… ▽ More

    Submitted 20 November, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted by IEEE Communications Surveys & Tutorials

  33. arXiv:2306.14125  [pdf, other

    eess.SP

    M$^3$SC: A Generic Dataset for Mixed Multi-Modal (MMM) Sensing and Communication Integration

    Authors: Xiang Cheng, Ziwei Huang, Lu Bai, Haotian Zhang, Mingran Sun, Boxun Liu, Sijiang Li, Jianan Zhang, Minson Lee

    Abstract: The sixth generation (6G) of mobile communication system is witnessing a new paradigm shift, i.e., integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and the generation framework of the M3SC data… ▽ More

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: 12 pages, 12 figures

  34. arXiv:2306.12042  [pdf, ps, other

    cs.DS eess.SP

    Block-Wise Index Modulation and Receiver Design for High-Mobility OTFS Communications

    Authors: Mi Qian, Fei Ji, Yao Ge, Miaowen Wen, Xiang Cheng, H. Vincent Poor

    Abstract: As a promising technique for high-mobility wireless communications, orthogonal time frequency space (OTFS) has been proved to enjoy excellent advantages with respect to traditional orthogonal frequency division multiplexing (OFDM). Although multiple studies have considered index modulation (IM) based OTFS (IM-OTFS) schemes to further improve system performance, a challenging and open problem is th… ▽ More

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:2210.13454

  35. arXiv:2306.02982  [pdf, other

    cs.CL eess.AS

    PolyVoice: Language Models for Speech to Speech Translation

    Authors: Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

    Abstract: We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST) system. Our framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, and thus our framework can be used for unwritten languages. For the speech synthesis part, we adopt… ▽ More

    Submitted 13 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  36. arXiv:2305.15403  [pdf, other

    cs.CL cs.SD eess.AS

    AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

    Authors: Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao

    Abstract: Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date. Despite the recent success, current S2ST models still suffer from distinct degradation in noisy environments and fail to translate visual speech (i.e., the movement of lips and teeth). In this work, we present AV-TranSpeech, the first audio-visual spe… ▽ More

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  37. arXiv:2305.14381  [pdf, other

    cs.LG cs.AI cs.CV cs.MM cs.SD eess.AS

    Connecting Multi-modal Contrastive Representations

    Authors: Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao

    Abstract: Multi-modal Contrastive Representation learning aims to encode different modalities into a semantically aligned shared space. This paradigm shows remarkable generalization ability on numerous downstream tasks across various modalities. However, the reliance on massive high-quality data pairs limits its further development on more modalities. This paper proposes a novel training-efficient method fo… ▽ More

    Submitted 18 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  38. arXiv:2305.12552  [pdf, other

    cs.CL cs.SD eess.AS

    Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

    Authors: Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao

    Abstract: Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given relational databases, which has been traditionally implemented in a cascaded manner while facing the following challenges: 1) model training is faced with the major issue of data scarcity, where limited parallel data is available; and 2) the systems should be robust enough to handle diverse out-of-domain speech samples t… ▽ More

    Submitted 21 May, 2023; originally announced May 2023.

  39. Vehicle Sequencing at Signal-Free Intersections: Analytical Performance Guarantees Based on PDMP Formulation

    Authors: Xiangchen Cheng, Wei Tang, Ming Yang, Li Jin

    Abstract: Signal-free intersections are a representative application of smart and connected vehicle technologies. Although extensive results have been developed for trajectory planning and autonomous driving, the formulation and evaluation of vehicle sequencing have not been well understood. In this paper, we consider theoretical guarantees of macroscopic performance (i.e., capacity and delay) of typical seq… ▽ More

    Submitted 21 March, 2023; originally announced March 2023.

  40. arXiv:2303.11330  [pdf, other

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion

    Authors: Xuxin Cheng, Ashish Kumar, Deepak Pathak

    Abstract: Locomotion has seen dramatic progress for walking or running across challenging terrains. However, robotic quadrupeds are still far behind their biological counterparts, such as dogs, which display a variety of agile skills and can use the legs beyond locomotion to perform several basic manipulation tasks like interacting with objects and climbing. In this paper, we take a step towards bridging th… ▽ More

    Submitted 22 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted at ICRA 2023. Videos at https://robot-skills.github.io

  41. arXiv:2302.09328  [pdf, other

    cs.MM cs.SD eess.AS

    SSVMR: Saliency-based Self-training for Video-Music Retrieval

    Authors: Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yaowei Li, Yuexian Zou

    Abstract: With the rise of short videos, the demand for selecting appropriate background music (BGM) for a video has increased significantly, and the video-music retrieval (VMR) task has gradually drawn much attention from the research community. Like other cross-modal learning tasks, existing VMR approaches usually attempt to measure the similarity between the video and music in the feature space. However, they (1) neglect th… ▽ More

    Submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  42. arXiv:2302.09316  [pdf, other

    eess.SY

    Multi-timescale Trading Strategy for Renewable Power to Ammonia Virtual Power Plant in the Electricity, Hydrogen, and Ammonia Markets

    Authors: Sirui Wu, Jin Lin, Jiarong Li, Feng Liu, Yonghua Song, Yanhui Xu, Xiang Cheng, Zhipeng Yu

    Abstract: Renewable power to ammonia (RePtA) is a prominent zero-carbon pathway for decarbonization. Due to the imbalance between renewables and production energy demand, the RePtA system relies on the electricity exchange with the power grid. Participating in the electricity market as a virtual power plant (VPP) may help to reduce energy costs. However, the power profile of local photovoltaics and wind tur… ▽ More

    Submitted 18 February, 2023; originally announced February 2023.

  43. arXiv:2302.08053  [pdf]

    eess.SY eess.SP

    Selective Noise Suppression Methods Using Random SVPWM to Shape the Noise Spectrum of PMSMs

    Authors: Jian Wen, Xiaobin Cheng, Peifeng Ji, Jun Yang, Feng Zhao

    Abstract: Random pulse width modulation techniques are used in AC motors powered by two-level three-phase inverters, which cause a broadband spectrum of voltage, current, and electromagnetic force. The voltage distribution across a wide range of frequencies may increase the vibration and acoustic noise of motors. This study proposes two selective noise suppression (SNS) methods to eliminate voltage harmonic…

    Submitted 6 June, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: 8 pages, 15 figures

  44. arXiv:2212.03657  [pdf, other]

    cs.CL cs.SD eess.AS

    M3ST: Mix at Three Levels for Speech Translation

    Authors: Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou

    Abstract: How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's well known that data augmentation is an efficient method to improve performance for many tasks by enlarging the dataset. In this paper, we propose Mix at three levels for Speech Translation (M^3ST) method to increase the diversity of the augmented training corpus. Specifically, we conduct two phases of fine…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  45. AccEar: Accelerometer Acoustic Eavesdropping with Unconstrained Vocabulary

    Authors: Pengfei Hu, Hui Zhuang, Panneer Selvam Santhalingamy, Riccardo Spolaor, Parth Pathaky, Guoming Zhang, Xiuzhen Cheng

    Abstract: With the increasing popularity of voice-based applications, acoustic eavesdropping has become a serious threat to users' privacy. While on smartphones the access to microphones needs an explicit user permission, acoustic eavesdropping attacks can rely on motion sensors (such as accelerometer and gyroscope), which access is unrestricted. However, previous instances of such attacks can only recogniz…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: 2022 IEEE Symposium on Security and Privacy (SP)

    Journal ref: 2022 IEEE Symposium on Security and Privacy (SP)

  46. arXiv:2210.10044  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion

    Authors: Zipeng Fu, Xuxin Cheng, Deepak Pathak

    Abstract: An attached arm can significantly increase the applicability of legged robots to several mobile manipulation tasks that are not possible for the wheeled or tracked counterparts. The standard hierarchical control pipeline for such legged manipulators is to decouple the controller into that of manipulation and locomotion. However, this is ineffective. It requires immense engineering to support coord…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: CoRL 2022 (Oral). Project website at https://maniploco.github.io

  47. arXiv:2210.07721  [pdf, other]

    cs.RO eess.SY

    Mechanical features based object recognition

    Authors: Pakorn Uttayopas, Xiaoxiao Cheng, Jonathan Eden, Etienne Burdet

    Abstract: Current robotic haptic object recognition relies on statistical measures derived from movement dependent interaction signals such as force, vibration or position. Mechanical properties that can be identified from these signals are intrinsic object properties that may yield a more robust object representation. Therefore, this paper proposes an object recognition framework using multiple representat…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: 9 pages, journal paper

  48. arXiv:2209.09018  [pdf, other]

    eess.SP cs.LG stat.AP

    A Causal Intervention Scheme for Semantic Segmentation of Quasi-periodic Cardiovascular Signals

    Authors: Xingyao Wang, Yuwen Li, Hongxiang Gao, Xianghong Cheng, Jianqing Li, Chengyu Liu

    Abstract: Precise segmentation is a vital first step to analyze semantic information of cardiac cycle and capture anomaly with cardiovascular signals. However, in the field of deep semantic segmentation, inference is often unilaterally confounded by the individual attribute of data. Towards cardiovascular signals, quasi-periodicity is the essential characteristic to be learned, regarded as the synthesize of…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: submitted to IEEE Journal of Biomedical and Health Informatics (J-BHI)

  49. Toward 6G with Terahertz Communications: Understanding the Propagation Channels

    Authors: Xuesong Cai, Xiang Cheng, Fredrik Tufvesson

    Abstract: This article aims at providing insights for a comprehensive understanding of terahertz (THz) propagation channels. Specifically, we discuss essential THz channel characteristics to be well understood for the success of THz communications. The methodology of establishing realistic and 6G-compliant THz channel models based on measurements is then elaborated on, followed by a discussion on existing T…

    Submitted 26 February, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

    Comments: The final version can be found in IEEE Communications Magazine

  50. arXiv:2208.01227  [pdf, ps, other]

    eess.SP

    Optimal Measurement of Drone Swarm in RSS-based Passive Localization with Region Constraints

    Authors: Xin Cheng, Feng Shu, Yifan Li, Zhihong Zhuang, Di Wu, Jiangzhou Wang

    Abstract: Passive geolocation by multiple unmanned aerial vehicles (UAVs) covers a wide range of military and civilian applications including rescue, wild life tracking and electronic warfare. The sensor-target geometry is known to significantly affect the localization precision. The existing sensor placement strategies mainly work on the cases without any constraints on the sensors locations. However, UAVs…

    Submitted 7 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.