Showing 1–31 of 31 results for author: Deng, C

Searching in archive eess.

  1. arXiv:2406.09444  [pdf, other]

    eess.AS cs.CL cs.SD

    GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model

    Authors: Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Pre-trained speech language models such as HuBERT and WavLM leverage unlabeled speech data for self-supervised learning and offer powerful representations for numerous downstream tasks. Despite the success of these models, their high requirements for memory and computing resource hinder their application on resource restricted devices. Therefore, this paper introduces GenDistiller, a novel knowled…

    Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.13418

  2. arXiv:2406.07801  [pdf, other]

    cs.CL cs.SD eess.AS

    PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

    Authors: Runyan Yang, Huibao Yang, Xiqing Zhang, Tiantian Ye, Ying Liu, Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works directly demonstrated that joint optimization of diverse tasks in multitask speech models has positive influence on the performance of individual tasks. In this paper we present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures

  3. arXiv:2405.10463  [pdf, other]

    physics.optics eess.IV physics.bio-ph

    Single-shot volumetric fluorescence imaging with neural fields

    Authors: Oumeng Zhang, Haowen Zhou, Brandon Y. Feng, Elin M. Larsson, Reinaldo E. Alcalde, Siyuan Yin, Catherine Deng, Changhuei Yang

    Abstract: Single-shot volumetric fluorescence (SVF) imaging offers a significant advantage over traditional imaging methods that require scanning across multiple axial planes as it can capture biological processes with high temporal resolution across a large field of view. The key challenges in SVF imaging include requiring sparsity constraints to meet the multiplexing requirements of compressed sensing, el…

    Submitted 4 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  4. arXiv:2402.12746  [pdf, ps, other]

    eess.AS cs.SD

    Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network

    Authors: Yanan Chen, Zihao Cui, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

    Abstract: The expectation to deploy a universal neural network for speech enhancement, with the aim of improving noise robustness across diverse speech processing tasks, faces challenges due to the existing lack of awareness within static speech enhancement frameworks regarding the expected speech in downstream modules. These limitations impede the effectiveness of static speech enhancement approaches in ac…

    Submitted 20 February, 2024; originally announced February 2024.

  5. arXiv:2401.14421  [pdf, other]

    cs.LG cs.MA eess.SY stat.ML

    Multi-Agent Based Transfer Learning for Data-Driven Air Traffic Applications

    Authors: Chuhao Deng, Hong-Cheol Choi, Hyunsang Park, Inseok Hwang

    Abstract: Research in developing data-driven models for Air Traffic Management (ATM) has gained a tremendous interest in recent years. However, data-driven models are known to have long training time and require large datasets to achieve good performance. To address the two issues, this paper proposes a Multi-Agent Bidirectional Encoder Representations from Transformers (MA-BERT) model that fully considers…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures, submitted for IEEE Transactions on Intelligent Transportation System

  6. arXiv:2311.04534  [pdf, other]

    cs.CL cs.SD eess.AS

    Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

    Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

    Abstract: Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Mask…

    Submitted 4 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 5 pages, accepted by ICASSP 2024

  7. arXiv:2310.17664  [pdf, other]

    cs.LG eess.AS eess.SP

    Cascaded Multi-task Adaptive Learning Based on Neural Architecture Search

    Authors: Yingying Gao, Shilei Zhang, Zihao Cui, Chao Deng, Junlan Feng

    Abstract: Cascading multiple pre-trained models is an effective way to compose an end-to-end system. However, fine-tuning the full cascaded model is parameter and memory inefficient and our observations reveal that only applying adapter modules on cascaded model can not achieve considerable performance as fine-tuning. We propose an automatic and effective adaptive learning method to optimize end-to-end casc…

    Submitted 23 October, 2023; originally announced October 2023.

  8. arXiv:2310.13418  [pdf, other]

    eess.AS eess.SP

    GenDistiller: Distilling Pre-trained Language Models based on Generative Models

    Authors: Yingying Gao, Shilei Zhang, Zihao Cui, Yanhan Xu, Chao Deng, Junlan Feng

    Abstract: Self-supervised pre-trained models such as HuBERT and WavLM leverage unlabeled speech data for representation learning and offer significantly improve for numerous downstream tasks. Despite the success of these methods, their large memory and strong computational requirements hinder their application on resource restricted devices. Therefore, this paper introduces GenDistiller, a novel knowledge d…

    Submitted 20 October, 2023; originally announced October 2023.

  9. arXiv:2305.10821  [pdf, other]

    eess.AS

    Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

    Authors: Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

    Abstract: Recently, stunning improvements on multi-channel speech separation have been achieved by neural beamformers when direction information is available. However, most of them neglect to utilize speaker's 2-dimensional (2D) location cues contained in mixture signal, which limits the performance when two sources come from close directions. In this paper, we propose an end-to-end beamforming network for…

    Submitted 2 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2212.03401

  10. arXiv:2303.13932  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

    Authors: Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

    Abstract: ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) focuses on prompting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical to improve users' efficiency in grasping important information in meetings. MUG includes five tracks, including topic segmentation, topic-level and session-level extractive summarization, topi…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece

  11. arXiv:2303.00952  [pdf, other]

    cs.CV cs.RO eess.IV

    Towards Activated Muscle Group Estimation in the Wild

    Authors: Kunyu Peng, David Schneider, Alina Roitberg, Kailun Yang, Jiaming Zhang, Chen Deng, Kaiyu Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen

    Abstract: In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE) aiming at identifying active muscle regions during physical activity in the wild. To this intent, we provide the MuscleMap dataset featuring >15K video clips with 135 different activities and 20 labeled muscle groups. This dataset opens the vistas to multiple video-based applications in sports and rehabil…

    Submitted 27 April, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: The contributed dataset and code will be publicly available at https://github.com/KPeng9510/MuscleMap

  12. arXiv:2212.03401  [pdf, other]

    eess.AS cs.LG cs.SD

    MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

    Authors: Yanjie Fu, Haoran Yin, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

    Abstract: Recently, many deep learning based beamformers have been proposed for multi-channel speech separation. Nevertheless, most of them rely on extra cues known in advance, such as speaker feature, face image or directional information. In this paper, we propose an end-to-end beamforming network for direction guided speech separation given merely the mixture signal, namely MIMO-DBnet. Specifically, we d…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  13. arXiv:2211.00206  [pdf]

    eess.SY

    A Primary Frequency Control Strategy for Variable-Speed Pumped-Storage Plant in Power Generation Based on Adaptive Model Predictive Control

    Authors: Zhenghua Xu, Changhong Deng, Qiuling Yang

    Abstract: Variable-speed pumped-storage (VSPS) has great potential in helping solve the frequency control problem caused by low inertia, owing to its remarkable flexibility beyond conventional fixed-speed one, to make better use of which, a primary frequency control strategy based on adaptive model predictive control (AMPC) is proposed in this paper for VSPS plant in power generation.

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: 8 pages, 9 figures

  14. arXiv:2210.09531   

    cs.RO cs.HC eess.SY

    The Brain-Inspired Cooperative Shared Control for Brain-Machine Interface

    Authors: Shengjie Zheng, Ling Liu, Junjie Yang, Lang Qian, Gang Gao, Xin Chen, Wenqi **, Chunshan Deng, Xiaojian Li

    Abstract: In the practical application of brain-machine interface technology, the problem often faced is the low information content and high noise of the neural signals collected by the electrode and the difficulty of decoding by the decoder, which makes it difficult for the robotic to obtain stable instructions to complete the task. The idea based on the principle of cooperative shared control can be achi…

    Submitted 25 June, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: This article need to update the corrected figure and data

  15. arXiv:2210.01434  [pdf, ps, other]

    eess.SP

    Beamforming Design and Trajectory Optimization for UAV-Empowered Adaptable Integrated Sensing and Communication

    Authors: Cailian Deng, Xuming Fang, Xianbin Wang

    Abstract: Unmanned aerial vehicle (UAV) has high flexibility and controllable mobility, therefore it is considered as a promising enabler for future integrated sensing and communication (ISAC). In this paper, we propose a novel adaptable ISAC (AISAC) mechanism in the UAV-enabled system, where the UAV performs sensing on demand during communication and the sensing duration is configured flexibly according to…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  16. arXiv:2209.13915  [pdf, ps, other]

    eess.SP

    Joint Optimization of Resource Allocation and Trajectory Control for Mobile Group Users in Fixed-Wing UAV-Enabled Wireless Network

    Authors: Xuezhen Yan, Xuming Fang, Cailian Deng, Xianbin Wang

    Abstract: Owing to the controlling flexibility and cost-effectiveness, fixed-wing unmanned aerial vehicles (UAVs) are expected to serve as flying base stations (BSs) in the air-ground integrated network. By exploiting the mobility of UAVs, controllable coverage can be provided for mobile group users (MGUs) under challenging scenarios or even somewhere without communication infrastructure. However, in such d…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: 30 pages, 9 figures

  17. arXiv:2208.13952  [pdf, other]

    eess.SP physics.optics

    Micro-Vibration Modes Reconstruction Based on Micro-Doppler Coincidence Imaging

    Authors: Shuang Liu, Chenjin Deng, Chaoran Wang, Zunwang Bo, Shensheng Han, Zihuai Lin

    Abstract: Micro-vibration, a ubiquitous nature phenomenon, can be seen as a characteristic feature on the objects, these vibrations always have tiny amplitudes which are much less than the wavelengths of the sensing systems, thus these motions information can only be reflected in the phase item of echo. Normally the conventional radar system can detect these micro vibrations through the time frequency analy…

    Submitted 29 August, 2022; originally announced August 2022.

  18. arXiv:2206.12774  [pdf, other]

    eess.AS cs.CL cs.SD

    Meta Auxiliary Learning for Low-resource Spoken Language Understanding

    Authors: Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

    Abstract: Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task and usually suffers from data scarcity. We exploit an ASR and NLU joint training method based on meta auxiliary learning to improve the performance of low-resource SLU task by only taking advantage of abundant manual transcriptions of speech data. One obvious adv…

    Submitted 25 June, 2022; originally announced June 2022.

  19. arXiv:2206.08031  [pdf, other]

    eess.AS

    A CTC Triggered Siamese Network with Spatial-Temporal Dropout for Speech Recognition

    Authors: Yingying Gao, Junlan Feng, Tianrui Wang, Chao Deng, Shilei Zhang

    Abstract: Siamese networks have shown effective results in unsupervised visual representation learning. These models are designed to learn an invariant representation of two augmentations for one input by maximizing their similarity. In this paper, we propose an effective Siamese network to improve the robustness of End-to-End automatic speech recognition (ASR). We introduce spatial-temporal dropout to supp…

    Submitted 22 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

  20. arXiv:2202.04250  [pdf, other]

    cs.NI eess.SP

    GenAD: General Representations of Multivariate Time Series for Anomaly Detection

    Authors: Xiaolei Hua, Lin Zhu, Shenglin Zhang, Zeyan Li, Su Wang, Dong Zhou, Shuo Wang, Chao Deng

    Abstract: The reliability of wireless base stations in China Mobile is of vital importance, because the cell phone users are connected to the stations and the behaviors of the stations are directly related to user experience. Although the monitoring of the station behaviors can be realized by anomaly detection on multivariate time series, due to complex correlations and various temporal patterns of multivar…

    Submitted 8 February, 2022; originally announced February 2022.

  21. arXiv:2111.14220  [pdf, other]

    cs.LG eess.SP

    On the Robustness and Generalization of Deep Learning Driven Full Waveform Inversion

    Authors: Chengyuan Deng, Youzuo Lin

    Abstract: The data-driven approach has been demonstrated as a promising technique to solve complicated scientific problems. Full Waveform Inversion (FWI) is commonly epitomized as an image-to-image translation task, which motivates the use of deep neural networks as an end-to-end solution. Despite being trained with synthetic data, the deep learning-driven FWI is expected to perform well when evaluated with…

    Submitted 28 November, 2021; originally announced November 2021.

  22. arXiv:2111.02926  [pdf, other]

    cs.LG eess.SP

    OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion

    Authors: Chengyuan Deng, Shihang Feng, Hanchen Wang, Xitong Zhang, Peng Jin, Yinan Feng, Qili Zeng, Yinpeng Chen, Youzuo Lin

    Abstract: Full waveform inversion (FWI) is widely used in geophysics to reconstruct high-resolution velocity maps from seismic data. The recent success of data-driven FWI methods results in a rapidly increasing demand for open datasets to serve the geophysics community. We present OpenFWI, a collection of large-scale multi-structural benchmark datasets, to facilitate diversified, rigorous, and reproducible…

    Submitted 23 June, 2023; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: This manuscript has been accepted by NeurIPS 2022 dataset and benchmark track

  23. arXiv:2106.15765  [pdf, other]

    eess.IV cs.CV physics.optics

    10-mega pixel snapshot compressive imaging with a hybrid coded aperture

    Authors: Zhihong Zhang, Chao Deng, Yang Liu, Xin Yuan, Jinli Suo, Qionghai Dai

    Abstract: High resolution images are widely used in our daily life, whereas high-speed video capture is challenging due to the low frame rate of cameras working at the high resolution mode. Digging deeper, the main bottleneck lies in the low throughput of existing imaging systems. Towards this end, snapshot compressive imaging (SCI) was proposed as a promising solution to improve the throughput of imaging s…

    Submitted 15 August, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: 11 pages, 8 figures, accepted by Photonics Research

  24. arXiv:2011.02109  [pdf]

    eess.AS

    Deep Multi-task Network for Delay Estimation and Echo Cancellation

    Authors: Yi Zhang, Chengyun Deng, Shiqian Ma, Yongtao Sha, Hui Song

    Abstract: Echo path delay (or ref-delay) estimation is a big challenge in acoustic echo cancellation. Different devices may introduce various ref-delay in practice. Ref-delay inconsistency slows down the convergence of adaptive filters, and also degrades the performance of deep learning models due to 'unseen' ref-delays in the training set. In this paper, a multi-task network is proposed to address both ref…

    Submitted 11 August, 2022; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted by Interspeech 2020

  25. arXiv:2011.02102  [pdf, other]

    eess.AS

    Robust Speaker Extraction Network Based on Iterative Refined Adaptation

    Authors: Chengyun Deng, Shiqian Ma, Yi Zhang, Yongtao Sha, Hui Zhang, Hui Song, Xiangang Li

    Abstract: Speaker extraction aims to extract target speech signal from a multi-talker environment with interference speakers and surrounding noise, given the target speaker's reference information. Most speaker extraction systems achieve satisfactory performance on the premise that the test speakers have been encountered during training time. Such systems suffer from performance degradation given unseen tar…

    Submitted 11 August, 2022; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted by Interspeech 2021

  26. arXiv:2007.14974  [pdf, other]

    eess.AS cs.SD

    On Loss Functions and Recurrency Training for GAN-based Speech Enhancement Systems

    Authors: Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li

    Abstract: Recent work has shown that it is feasible to use generative adversarial networks (GANs) for speech enhancement, however, these approaches have not been compared to state-of-the-art (SOTA) non GAN-based approaches. Additionally, many loss functions have been proposed for GAN-based approaches, but they have not been adequately compared. In this study, we propose novel convolutional recurrent GAN (CR…

    Submitted 26 December, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: accepted by Interspeech2020, 5 pages, 2 figures

  27. arXiv:2007.13401  [pdf, ps, other]

    eess.SP

    IEEE 802.11be-Wi-Fi 7: New Challenges and Opportunities

    Authors: Cailian Deng, Xuming Fang, Xiao Han, Xianbin Wang, Li Yan, Rong He, Yan Long, Yuchen Guo

    Abstract: With the emergence of 4k/8k video, the throughput requirement of video delivery will keep grow to tens of Gbps. Other new high-throughput and low-latency video applications including augmented reality (AR), virtual reality (VR), and online gaming, are also proliferating. Due to the related stringent requirements, supporting these applications over wireless local area network (WLAN) is far beyond t…

    Submitted 3 August, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: Accepted for publication in IEEE Communications Surveys and Tutorials

  28. arXiv:1912.01852  [pdf, other]

    cs.SD cs.CL eess.AS

    PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

    Authors: Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

    Abstract: Singing voice conversion is to convert a singer's voice to another one's voice without changing singing content. Recent work shows that unsupervised singing voice conversion can be achieved with an autoencoder-based approach [1]. However, the converted singing voice can be easily out of key, showing that the existing approach cannot model the pitch information precisely. In this paper, we propose…

    Submitted 18 February, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted by ICASSP 2020

  29. arXiv:1901.07042  [pdf, other]

    cs.CV cs.LG eess.IV

    MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs

    Authors: Alistair E. W. Johnson, Tom J. Pollard, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Yifan Peng, Zhiyong Lu, Roger G. Mark, Seth J. Berkowitz, Steven Horng

    Abstract: Chest radiography is an extremely powerful imaging modality, allowing for a detailed inspection of a patient's thorax, but requiring specialized training for proper interpretation. With the advent of high performance general purpose computer vision algorithms, the accurate automated analysis of chest radiographs is becoming increasingly of interest to researchers. However, a key challenge in the d…

    Submitted 14 November, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

  30. arXiv:1811.03455  [pdf, other]

    eess.IV

    High fidelity single-pixel imaging

    Authors: Chao Deng, Xuemei Hu, Xiaoxu Li, Jinli Suo, Zhili Zhang, Qionghai Dai

    Abstract: Single-pixel imaging (SPI) is an emerging technique which has attracts wide attention in various research fields. However, restricted by the low reconstruction quality and large amount of measurements, the practical application is still in its infancy. Inspired by the fact that natural scenes exhibit unique degenerate structures in the low dimensional subspace, we propose to take advantage of the…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: 5 pages, 6 figures

  31. On Non-Consensus Motions of Dynamical Linear Multi-Agent Systems

    Authors: Ning Cai, Chun-Lin Deng, Qiu-Xuan Wu

    Abstract: The non-consensus problems of high order linear time-invariant dynamical homogeneous multi-agent systems are concerned. Based on the conditions of consensus achievement, the mechanisms that lead to non-consensus motions are analyzed. Besides, a comprehensive classification for diverse types of non-consensus phases in accordance to the different conditions is conducted, which is jointly depending o…

    Submitted 23 August, 2017; originally announced August 2017.