Skip to main content

Showing 1–50 of 141 results for author: Chen, P

Searching in archive eess. Search in all archives.
.
  1. arXiv:2407.00579  [pdf, ps, other

    cs.IT eess.SP

    Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems

    Authors: Miaomiao Zhu, Pengxu Chen, Liang Yang, Alexandros-Apostolos A. Boulogeorgos, Theodoros A. Tsiftsis, Hongwu Liu

    Abstract: Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, whereas facing privacy and security challenges due to open wireless propagation. In this paper, active reconfigurable intelligent surface (RIS) is employed to aid covert communications in NOMA-inspired ISAC wireless system with the aim o… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2406.18862  [pdf, other

    cs.SD eess.AS

    Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

    Authors: Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie

    Abstract: Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic Speech Recognition (ASR). These models typically adopt a unified method to model discrete speech and text tokens, followed by training a decoder-only transformer. However, they are all designed for non-streaming ASR tasks, where the entire s… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  3. arXiv:2406.06375  [pdf, other

    cs.SD cs.AI eess.AS

    MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

    Authors: Yu-Fen Huang, Nikki Moran, Simon Coleman, Jon Kelly, Shun-Hwa Wei, Po-Yin Chen, Yun-Hsin Huang, Tsung-** Chen, Yu-Chia Kuo, Yu-Chi Wei, Chih-Hsuan Li, Da-Yu Huang, Hsuan-Kai Kao, Ting-Wei Lin, Li Su

    Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music m… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset/tree/main and https://zenodo.org/records/11393449

  4. arXiv:2406.02733  [pdf, other

    cs.CL cs.SD eess.AS

    Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

    Authors: Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee

    Abstract: In this paper, we propose a textless acoustic model with a self-supervised distillation strategy for noise-robust expressive speech-to-speech translation (S2ST). Recently proposed expressive S2ST systems have achieved impressive expressivity preservation performances by cascading unit-to-speech (U2S) generator to the speech-to-unit translation model. However, these systems are vulnerable to the pr… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (findings)

  5. arXiv:2405.18669  [pdf, other

    cs.LG cs.AI cs.CL eess.AS

    Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

    Authors: Vicky Zayats, Peter Chen, Melissa Ferrari, Dirk Padfield

    Abstract: Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but is expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain gener… ▽ More

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Under review at NeurIPS

  6. arXiv:2405.16677  [pdf, other

    eess.AS cs.CL cs.SD

    Crossmodal ASR Error Correction with Discrete Speech Units

    Authors: Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

    Abstract: ASR remains unsatisfactory in scenarios where the speaking style diverges from that used to train ASR systems, resulting in erroneous transcripts. To address this, ASR Error Correction (AEC), a post-ASR processing approach, is required. In this work, we tackle an understudied issue: the Low-Resource Out-of-Domain (LROOD) problem, by investigating crossmodal AEC on very limited downstream data with… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  7. arXiv:2405.14161  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

    Abstract: We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifica… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 23 pages, Preprint

  8. arXiv:2405.07281  [pdf, ps, other

    eess.SP

    Movable Antennas Aided Multicast MISO Communication Systems

    Authors: Zhenqiao Cheng, Nanxi Li, Ruizhe Long, Jianchi Zhu, Chongjun Ouyang, Peng Chen

    Abstract: A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-input system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA element… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages

  9. arXiv:2404.18406  [pdf, ps, other

    cs.IT eess.SP

    Movable Antenna-Enhanced Wireless Powered Mobile Edge Computing Systems

    Authors: Pengcheng Chen, Yuxuan Yang, Bin Lyu, Zhen Yang, Abbas Jamalipour

    Abstract: In this paper, we propose a movable antenna (MA) enhanced scheme for wireless powered mobile edge computing (WP-MEC) system, where the hybrid access point (HAP) equipped with multiple MAs first emits wireless energy to charge wireless devices (WDs), and then receives the offloaded tasks from the WDs for edge computing. The MAs deployed at the HAP enhance the spatial degrees of freedom (DoFs) by fl… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures. Submitted for possible publication

  10. arXiv:2404.17400  [pdf, other

    cs.CV cs.AI eess.IV

    Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

    Authors: Zishu Yao, Guodong Fan, **fu Fan, Min Gan, C. L. Philip Chen

    Abstract: Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range corre… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 14 page

  11. arXiv:2404.16302  [pdf, other

    cs.CV cs.MM cs.RO eess.IV

    CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions

    Authors: Haoyuan Li, Qi Hu, You Yao, Kailun Yang, Peng Chen

    Abstract: Cross-modality images that integrate visible-infrared spectra cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods severely degrade in severe weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW

  12. Low-Complexity Estimation Algorithm and Decoupling Scheme for FRaC System

    Authors: Mengjiang Sun, Peng Chen, Zhenxin Cao, Fei Shen

    Abstract: With the lea** advances in autonomous vehicles and transportation infrastructure, dual function radar-communication (DFRC) systems have become attractive due to the size, cost and resource efficiency. A frequency modulated continuous waveform (FMCW)-based radar-communication system (FRaC) utilizing both sparse multiple-input and multiple-output (MIMO) arrays and index modulation (IM) has been pr… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Journal ref: {IEEE Transactions on Intelligent Vehicles, 2024

  13. arXiv:2403.14978  [pdf, other

    cs.IT eess.SP

    Range-Angle Estimation for FDA-MIMO System With Frequency Offset

    Authors: Mengjiang Sun, Peng Chen, Zhenxin Cao

    Abstract: Frequency diverse array multiple-input multiple-output (FDA-MIMO) radar differs from the traditional phased array (PA) radar, and can form range-angle-dependent beampattern and differentiate between closely spaced targets sharing the same angle but occupying distinct range cells. In the FDA-MIMO radar, target range estimation is achieved by employing a subtle frequency variation between adjacent a… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Journal ref: IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024

  14. arXiv:2403.11737  [pdf, other

    cs.RO eess.SY

    SMT-Based Dynamic Multi-Robot Task Allocation

    Authors: Victoria Marie Tuck, Pei-Wei Chen, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, S. Shankar Sastry, Sanjit A. Seshia

    Abstract: Multi-Robot Task Allocation (MRTA) is a problem that arises in many application domains including package delivery, warehouse robotics, and healthcare. In this work, we consider the problem of MRTA for a dynamic stream of tasks with task deadlines and capacitated agents (capacity for more than one simultaneous task). Previous work commonly focuses on the static case, uses specialized algorithms fo… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 26 pages, 6 figures, to be published in NASA Formal Methods Symposium 2024

  15. arXiv:2403.02854  [pdf, ps, other

    eess.SP

    STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

    Authors: Bin Lyu, Yining Zhang, Pengcheng Chen, Ziwei Liu, Feng Tian

    Abstract: Wireless powered and backscattering mobile edge computing (WPB-MEC) network is a novel network paradigm to supply energy supplies and computing resource to wireless sensors (WSs). However, its performance is seriously affected by severe attenuations and inappropriate assumptions of infinite computing capability at the hybrid access point (HAP). To address the above issues, in this paper, we propos… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted by China Communications. 13 pages, 8 figures

  16. arXiv:2402.05457  [pdf, other

    cs.CL cs.AI cs.MM cs.SD eess.AS

    It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

    Authors: Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Ensiong Chng, Chao-Han Huck Yang

    Abstract: Recent studies have successfully shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct map** from the N-best hypotheses list generated by an ASR system to the predicted output transcription. However, despite its effectiveness, GER introd… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT license

  17. arXiv:2402.02140  [pdf, other

    cs.CV eess.IV

    Generative Visual Compression: A Review

    Authors: Bolin Chen, Shanzhi Yin, Peilin Chen, Shiqi Wang, Yan Ye

    Abstract: Artificial Intelligence Generated Content (AIGC) is leading a new technical revolution for the acquisition of digital content and impelling the progress of visual compression towards competitive performance gains and diverse functionalities over traditional codecs. This paper provides a thorough review on the recent advances of generative visual compression, illustrating great potentials and promi… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  18. arXiv:2401.10446  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

    Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with HyPoradise dataset to learn the map** from ASR N-best hypotheses to ground-truth transcription by e… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024, Spotlight top 5%, 24 pages. This work will be open sourced at: https://github.com/YUCHEN005/RobustGER under MIT license

  19. arXiv:2312.14018  [pdf, ps, other

    eess.SP

    Enabling Secure Wireless Communications via Movable Antennas

    Authors: Zhenqiao Cheng, Nanxi Li, Jianchi Zhu, Xiaoming She, Chongjun Ouyang, Peng Chen

    Abstract: A pioneering secure transmission scheme is proposed, which harnesses movable antennas (MAs) to optimize antenna positions for augmenting the physical layer security. Particularly, an MA-enabled secure wireless system is considered, where a multi-antenna transmitter communicates with a single-antenna receiver in the presence of an eavesdropper. The beamformer and antenna positions at the transmitte… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by IEEE ICASSP 2024

  20. arXiv:2312.05187  [pdf, other

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek , et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

  21. arXiv:2312.01644  [pdf

    eess.IV cs.CV

    TMSR: Tiny Multi-path CNNs for Super Resolution

    Authors: Chia-Hung Liu, Tzu-Hsin Hsieh, Kuan-Yu Huang, Pei-Yin Chen

    Abstract: In this paper, we proposed a tiny multi-path CNN-based Super-Resolution (SR) method, called TMSR. We mainly refer to some tiny CNN-based SR methods, under 5k parameters. The main contribution of the proposed method is the improved multi-path learning and self-defined activated function. The experimental results show that TMSR obtains competitive image quality (i.e. PSNR and SSIM) compared to the r… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 5 pages, 7 figures, published in the IEEE Eurasia Conference on IoT, Communication and Engineering proceedings 2023

  22. arXiv:2311.16565  [pdf, other

    cs.CV cs.SD eess.AS

    DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser

    Authors: Peng Chen, Xiaobao Wei, Ming Lu, Yitong Zhu, Naiming Yao, Xingyu Xiao, Hui Chen

    Abstract: Speech-driven 3D facial animation has been an attractive task in both academia and industry. Traditional methods mostly focus on learning a deterministic map** from speech to animation. Recent approaches start to consider the non-deterministic fact of speech-driven 3D face animation and employ the diffusion model for the task. However, personalizing facial animation and accelerating animation ge… ▽ More

    Submitted 2 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  23. arXiv:2310.13259  [pdf

    eess.IV cs.CV

    Domain-specific optimization and diverse evaluation of self-supervised models for histopathology

    Authors: Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Fayaz Jamil, Yossi Matias, Greg S. Corrado, Dale R. Webster, Jonathan Krause, Yun Liu, Po-Hsuan Cameron Chen, Ellery Wulczyn, David F. Steiner

    Abstract: Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential… ▽ More

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 4 main tables, 3 main figures, additional supplemental tables and figures

  24. arXiv:2310.05051  [pdf, other

    cs.SD eess.AS

    SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

    Authors: Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying the speaker representation. However, the anonymized speech is subject to reduction in pseudo speaker distinctiveness, speech quality and… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 8 pages, 3 figures; Accepted by ASRU2023

  25. arXiv:2310.04645  [pdf, other

    q-bio.NC cs.AI cs.CL eess.AS

    Do self-supervised speech and language models extract similar representations as human brain?

    Authors: Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

    Abstract: Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same neural aspects. We directly address this question by evaluating the brain prediction performance of two representative SSL models,… ▽ More

    Submitted 31 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: To appear in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

  26. arXiv:2310.02629  [pdf, other

    cs.SD eess.AS

    BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

    Authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these dr… ▽ More

    Submitted 7 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  27. arXiv:2309.16937  [pdf, other

    cs.CL cs.SD eess.AS

    SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition

    Authors: Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie

    Abstract: Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in multilingual ASR, it is worth noting that various layers' representations potentially contain distinct information that has not been fully leveraged. In this study, w… ▽ More

    Submitted 27 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures. Accepted by ICME 2024

  28. arXiv:2309.15701  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

    Authors: Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Macro Siniscalchi, Pin-Yu Chen, Eng Siong Chng

    Abstract: Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation when confronted with adverse conditions, as a well-trained acoustic model is sensitive to variations in the speech domain, e.g., background noise. Intuit… ▽ More

    Submitted 16 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted to NeurIPS 2023, 24 pages. Datasets and Benchmarks Track. Added the first Mandarin and code-switching (zh-cn and en-us) results from the LLM-based generative ASR error correction to Table 8 on Page 21

  29. NoncovANM: Gridless DOA Estimation for LPDF System

    Authors: Yangying Zhao, Peng Chen, Zhenxin Cao, Xianbin Wang

    Abstract: Direction of arrival (DOA) estimation is an important research in the area of array signal processing, and has been studied for decades. High resolution DOA estimation requires large array aperture, which leads to the increase of hardware cost. Besides, high accuracy DOA estimation methods usually have high computational complexity. In this paper, the problem of decreasing the hardware cost and al… ▽ More

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 11 pages, 8 figures

    Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023

  30. DNN-DANM: A High-Accuracy Two-Dimensional DOA Estimation Method Using Practical RIS

    Authors: Zhimin Chen, Peng Chen, Le Zheng, Yudong Zhang

    Abstract: Reconfigurable intelligent surface (RIS) or intelligent reflecting surface (IRS) has been an attractive technology for future wireless communication and sensing systems. However, in the practical RIS, the mutual coupling effect among RIS elements, the reflection phase shift, and amplitude errors will degrade the RIS performance significantly. This paper investigates the two-dimensional direction-o… ▽ More

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 11 pages, 12 figures

    Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023

  31. arXiv:2309.12596  [pdf, ps, other

    eess.SP

    Movable Antenna-Empowered AirComp

    Authors: Zhenqiao Cheng, Nanxi Li, Jianchi Zhu, Xiaoming She, Chongjun Ouyang, Peng Chen

    Abstract: A novel over-the-air computation (AirComp) framework, empowered by the incorporation of movable antennas (MAs), is proposed to significantly enhance computation accuracy. Within this framework, the joint optimization of transmit power control, antenna positioning, and receive combining is investigated. An efficient method is proposed to tackle the problem of computation mean-squared error (MSE) mi… ▽ More

    Submitted 21 September, 2023; originally announced September 2023.

  32. arXiv:2309.02259  [pdf, ps, other

    cs.IT eess.SP

    Design of a New CIM-DCSK-Based Ambient Backscatter Communication System

    Authors: Ruipeng Yang, Yi Fang, **** Chen, Huan Ma

    Abstract: To improve the data rate in differential chaos shift keying (DCSK) based ambient backscatter communication (AmBC) system, we propose a new AmBC system based on code index modulation (CIM), referred to as CIM-DCSK-AmBC system. In the proposed system, the CIM-DCSK signal transmitted in the direct link is used as the radio frequency source of the backscatter link. The signal format in the backscatter… ▽ More

    Submitted 5 September, 2023; originally announced September 2023.

  33. arXiv:2309.01207  [pdf, other

    eess.IV cs.CV cs.LG

    Spectral Adversarial MixUp for Few-Shot Unsupervised Domain Adaptation

    Authors: Jia** Zhang, Hanqing Chao, Amit Dhurandhar, Pin-Yu Chen, Ali Tajer, Yangyang Xu, **kun Yan

    Abstract: Domain shift is a common problem in clinical applications, where the training images (source domain) and the test images (target domain) are under different distributions. Unsupervised Domain Adaptation (UDA) techniques have been proposed to adapt models trained in the source domain to the target domain. However, those methods require a large number of images from the target domain for model train… ▽ More

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted by MICCAI 2023

  34. arXiv:2307.14491  [pdf, other

    cs.MM cs.SD eess.AS

    A Unified Framework for Modality-Agnostic Deepfakes Detection

    Authors: Cai Yu, Peng Chen, Jiahe Tian, ** Liu, Jiao Dai, Xi Wang, Yesheng Chai, Shan Jia, Siwei Lyu, Jizhong Han

    Abstract: As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence betw… ▽ More

    Submitted 24 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  35. arXiv:2307.10757  [pdf, other

    cs.SD cs.CL eess.AS

    Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

    Authors: Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu

    Abstract: This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their consid… ▽ More

    Submitted 18 April, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: This paper was accepted by IEEE Transactions on Affective Computing 2024

  36. arXiv:2307.04630  [pdf, other

    cs.SD eess.AS

    The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

    Authors: Kun Song, Yi lei, Peikun Chen, Yiqing Cao, Kun Wei, Yongmao Zhang, Lei Xie, Ning Jiang, Guoqing Zhao

    Abstract: This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. The system is built in a cascaded manner consisting of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). We make tremendous efforts to handle the challenging multi-source input. Spec… ▽ More

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: IWSLT@ACL 2023 system paper. Our submitted system ranks 1st in the S2ST task of the IWSLT 2023 evaluation campaign

  37. arXiv:2306.12925  [pdf, other

    cs.CL cs.AI cs.SD eess.AS stat.ML

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats , et al. (5 additional authors not shown)

    Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the… ▽ More

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Technical report

  38. TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition

    Authors: Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu

    Abstract: UniSpeech has achieved superior performance in cross-lingual automatic speech recognition (ASR) by explicitly aligning latent representations to phoneme units using multi-task self-supervised learning. While the learned representations transfer well from high-resource to low-resource languages, predicting words directly from these phonetic representations in downstream ASR is challenging. In this… ▽ More

    Submitted 8 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures. Accepted by INTERSPEECH 2023

  39. arXiv:2305.05203  [pdf, other

    cs.SD eess.AS

    Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

    Authors: **gbei Li, Sipan Li, ** Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yu** Wang, Yuxuan Wang

    Abstract: Automatic dubbing, which generates a corresponding version of the input speech in another language, could be widely utilized in many real-world scenarios such as video and game localization. In addition to synthesizing the translated scripts, automatic dubbing needs to further transfer the speaking style in the original language to the dubbed speeches to give audiences the impression that the char… ▽ More

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Submitted to TASLP

  40. arXiv:2305.03982  [pdf

    cs.SD cs.MM eess.AS

    Pitch Estimation by Denoising Preprocessor and Hybrid Estimation Model

    Authors: Yu Cheng Hung, ** Hung Chen, Jian Jiun Ding

    Abstract: Pitch estimation is to estimate the fundamental frequency and the midi number and plays a critical role in music signal analysis and vocal signal processing. In this work, we proposed a new architecture based on a learning-based enhancement preprocessor and a combination of several traditional and deep learning pitch estimation methods to achieve better pitch estimation performance in both noisy a… ▽ More

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: From ICCE-Taiwan

  41. Geometric Prior Based Deep Human Point Cloud Geometry Compression

    Authors: Xinju Wu, **** Zhang, Meng Wang, Peilin Chen, Shiqi Wang, Sam Kwong

    Abstract: The emergence of digital avatars has raised an exponential increase in the demand for human point clouds with realistic and intricate details. The compression of such data becomes challenging with overwhelming data amounts comprising millions of points. Herein, we leverage the human geometric prior in geometry redundancy removal of point clouds, greatly promoting the compression performance. More… ▽ More

    Submitted 25 March, 2024; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted by TCSVT 2024

  42. arXiv:2304.13928  [pdf, ps, other

    eess.SP

    Cramer-Rao Lower Bound Analysis for OTFS and OFDM Modulation Systems

    Authors: Bowen Wang, Jianchi Zhu, Xiaoming She, Peng Chen

    Abstract: The orthogonal time frequency space (OTFS) modulation as a promising signal representation attracts growingcinterest for integrated sensing and communication (ISAC), yet its merits over orthogonal frequency division multiplexing (OFDM) remain controversial. This paper devotes to a comprehensive comparison of OTFS and OFDM for sensing from the perspective of Cramer-Rao lower bounds (CRLB) analysis.… ▽ More

    Submitted 26 April, 2023; originally announced April 2023.

  43. arXiv:2304.03104  [pdf, other

    cs.LG eess.SY

    Constrained Exploration in Reinforcement Learning with Optimality Preservation

    Authors: Peter C. Y. Chen

    Abstract: We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such restriction may prevent the agent from visiting some state-action pairs, possibly leading to the agent finding only a sub-optimal policy. To address this problem we introduce th… ▽ More

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: 33 pages, and 6 figures

  44. arXiv:2303.06341  [pdf, other

    eess.AS

    The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge

    Authors: Pengcheng Guo, He Wang, Bingshen Mu, Ao Zhang, Peikun Chen

    Abstract: This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. Specifically, the weighted prediction error (WPE) and guided source separation (GSS) techniques are used to reduce reverberation and generate clean signals for each single speaker first. Then, we explore the effectivenes… ▽ More

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 2 pages, accepted by ICASSP 2023

  45. arXiv:2302.12662  [pdf, other

    eess.IV cs.CV

    FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

    Authors: Tianpeng Deng, Yanqi Huang, Guoqiang Han, Zhenwei Shi, Jiatai Lin, Qi Dou, Zaiyi Liu, Xiao-**g Guo, C. L. Philip Chen, Chu Han

    Abstract: Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by kee** training samples locally, but existing FL-based frameworks require a large number of well-annotated… ▽ More

    Submitted 17 December, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

  46. arXiv:2302.02922  [pdf, other

    cs.LG cs.AI eess.SP math.OC

    Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

    Authors: Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

    Abstract: Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs. Examples include \textit{graph sparsification} that samples a subgraph to reduce the amount of data aggregation and \textit{model sparsification} that prunes the neural network to reduce the number of trainab… ▽ More

    Submitted 6 February, 2023; originally announced February 2023.

    Journal ref: The Eleventh International Conference on Learning Representations, 2023

  47. arXiv:2301.10606  [pdf, other

    cs.CL cs.SD eess.AS

    A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

    Authors: Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen

    Abstract: Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy. Existing research in expressive S2ST is limited, typically focusing on a single expressivity aspect at a time. Likewise, this research area lacks standard evaluation protocols and well-curated benchmark datasets. In this work, we propose a ho… ▽ More

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: This is the full version of our submission to ICASSP 2023

  48. arXiv:2301.01915  [pdf, ps, other

    cs.IT eess.SP

    Sum-Rate Maximization in Active RIS-Assisted Multi-Antenna WPCN

    Authors: Jie Jiang, Bin Lyu, Pengcheng Chen, Zhen Yang

    Abstract: In this paper, we propose an active reconfigurable intelligent surface (RIS) enabled hybrid relaying scheme for a multi-antenna wireless powered communication network (WPCN), where the active RIS is employed to assist both wireless energy transfer (WET) from the power station (PS) to energy-constrained users and wireless information transmission (WIT) from users to the receiving station (RS). For… ▽ More

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: Accepted by China Communications

  49. arXiv:2212.08055  [pdf, other

    cs.CL cs.SD eess.AS

    UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

    Authors: Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

    Abstract: Direct speech-to-speech translation (S2ST), in which all components can be optimized jointly, is advantageous over cascaded approaches to achieve fast inference with a simplified pipeline. We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and predicts discrete acoustic units subsequently. We enhance the model performance by subword predictio… ▽ More

    Submitted 26 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: ACL 2023 (main conference)

  50. arXiv:2212.04069  [pdf, other

    cs.LG eess.SY

    Reinforcement Learning for Resilient Power Grids

    Authors: Zhenting Zhao, Po-Yen Chen, Yucheng **

    Abstract: Traditional power grid systems have become obsolete under more frequent and extreme natural disasters. Reinforcement learning (RL) has been a promising solution for resilience given its successful history of power grid control. However, most power grid simulators and RL interfaces do not support simulation of power grid under large-scale blackouts or when the network is divided into sub-networks.… ▽ More

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 7 pages, 3 figures, 6 tables