Showing 1–50 of 162 results for author: Chen, Q

Searching in archive eess.
  1. arXiv:2406.18950  [pdf, other]

    eess.IV cs.CV

    MMR-Mamba: Multi-Contrast MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Xiaohan Xing, Jing Qin

    Abstract: Multi-contrast MRI acceleration has become prevalent in MR imaging, enabling the reconstruction of high-quality MR images from under-sampled k-space data of the target modality, using guidance from a fully-sampled auxiliary modality. The main crux lies in efficiently and comprehensively integrating complementary information from the auxiliary modality. Existing methods either suffer from quadratic…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  2. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an…

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  3. arXiv:2406.02167  [pdf, other]

    eess.AS eess.SP

    ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li

    Abstract: Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteristics from short utterances. Constrained by the model's size, a robust backbone Enhanced Res2Net (ERes2Net) combining global and local feature fusion…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2406.01205  [pdf, other]

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2405.16516  [pdf, other]

    eess.IV cs.CV

    Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models

    Authors: Kun Huang, Xiao Ma, Yuhan Zhang, Na Su, Songtao Yuan, Yong Liu, Qiang Chen, Huazhu Fu

    Abstract: Optical coherence tomography (OCT) image analysis plays an important role in the field of ophthalmology. Current successful analysis models rely on available large datasets, which can be challenging to be obtained for certain tasks. The use of deep generative models to create realistic data emerges as a promising approach. However, due to limitations in hardware resources, it is still difficulty t…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Provisionally accepted for medical image computing and computer-assisted intervention (MICCAI) 2024

  6. arXiv:2405.11006  [pdf, other]

    eess.SY nlin.AO

    Self-Triggered Distributed Model Predictive Control with Synchronization Parameters Interaction

    Authors: Qianqian Chen, Shaoyuan Li

    Abstract: This paper investigates an aperiodic distributed model predictive control approach for multi-agent systems (MASs) in which parameterized synchronization constraints is considered and an innovative self-triggered criterion is constructed. Different from existing coordination methodology, the proposed strategy achieves the cooperation of agents through the synchronization of one-dimensional paramete…

    Submitted 17 May, 2024; originally announced May 2024.

  7. arXiv:2405.11005  [pdf, other]

    eess.SY nlin.AO

    Distributed Model Predictive Control for Asynchronous Multi-agent Systems with Self-Triggered Coordinator

    Authors: Qianqian Chen, Shaoyuan Li

    Abstract: This paper investigates the distributed model predictive control for an asynchronous nonlinear multi-agent system with external interference via a self-triggered generator and a prediction horizon regulator. First, a shrinking constraint related to the error between the actual state and the predicted state is introduced into the optimal control problem to enable the robustness of the system. Then,…

    Submitted 17 May, 2024; originally announced May 2024.

  8. arXiv:2405.08745  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous researches that leverage pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ…

    Submitted 14 May, 2024; originally announced May 2024.

  9. arXiv:2405.03905  [pdf, other]

    cs.AR cs.CV cs.SD eess.AS

    A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

    Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network (ΔRNN) cla…

    Submitted 6 May, 2024; originally announced May 2024.

  10. arXiv:2404.15364  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    MP-DPD: Low-Complexity Mixed-Precision Neural Networks for Energy-Efficient Digital Predistortion of Wideband Power Amplifiers

    Authors: Yizhuo Wu, Ang Li, Mohammadreza Beikmirza, Gagan Deep Singh, Qinyu Chen, Leo C. N. de Vreede, Morteza Alavi, Chang Gao

    Abstract: Digital Pre-Distortion (DPD) enhances signal quality in wideband RF power amplifiers (PAs). As signal bandwidths expand in modern radio systems, DPD's energy consumption increasingly impacts overall system efficiency. Deep Neural Networks (DNNs) offer promising advancements in DPD, yet their high complexity hinders their practical deployment. This paper introduces open-source mixed-precision (MP)…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)

  11. arXiv:2404.15278  [pdf, other]

    eess.SP cs.CR cs.NI

    Security-Sensitive Task Offloading in Integrated Satellite-Terrestrial Networks

    Authors: Wenjun Lan, Kongyang Chen, Jiannong Cao, Yikai Li, Ning Li, Qi Chen, Yuvraj Sahni

    Abstract: With the rapid development of sixth-generation (6G) communication technology, global communication networks are moving towards the goal of comprehensive and seamless coverage. In particular, low earth orbit (LEO) satellites have become a critical component of satellite communication networks. The emergence of LEO satellites has brought about new computational resources known as the \textit{LEO sat…

    Submitted 20 January, 2024; originally announced April 2024.

  12. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  13. arXiv:2404.11278  [pdf, other]

    physics.ins-det eess.IV

    Study on the static detection of ICF target based on muonic X-ray sphere encoded imaging

    Authors: Dikai Li, Jian Yu, Qian Chen, Chunhui Zhang, Xiangyu Wan, Leifeng Cao

    Abstract: Muon Induced X-ray Emission (MIXE) was discovered by Chinese physicist Zhang Wenyu as early as 1947, and it can conduct non-destructive elemental analysis inside samples. Research has shown that MIXE can retain the high efficiency of direct imaging while benefiting from the low noise of pinhole imaging through encoding holes. The related technology significantly improves the counting rate while ma…

    Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  14. arXiv:2404.06007  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    Collaborative Edge AI Inference over Cloud-RAN

    Authors: Pengfei Zhang, Dingzhu Wen, Guangxu Zhu, Qimei Chen, Kaifeng Han, Yuanming Shi

    Abstract: In this paper, a cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed. Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors, which are then aggregated at each remote radio head (RRH) to suppress sensing noise. To realize efficient uplink feature aggregatio…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by IEEE Transactions on Communications on 08-Apr-2024

  15. arXiv:2404.03253  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in…

    Submitted 4 April, 2024; originally announced April 2024.

  16. arXiv:2403.19971  [pdf, other]

    eess.AS eess.SP

    3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

    Abstract: This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization. It is designed for the needs of academic researchers and industrial practitioners. The 3D-Speaker-Toolkit adeptly leverages the combined strengths of acoustic, semantic, and visual data, seamlessly fusing these modalities to offer robust speaker recognition capabilities. The acous…

    Submitted 29 March, 2024; originally announced March 2024.

  17. arXiv:2403.11556  [pdf, other]

    eess.IV cs.CV

    Hierarchical Frequency-based Upsampling and Refining for Compressed Video Quality Enhancement

    Authors: Qianyu Zhang, Bolun Zheng, Xinying Chen, Quan Chen, Zhunjie Zhu, Canjin Wang, Zongpeng Li, Chengang Yan

    Abstract: Video compression artifacts arise due to the quantization operation in the frequency domain. The goal of video quality enhancement is to reduce compression artifacts and reconstruct a visually-pleasant result. In this work, we propose a hierarchical frequency-based upsampling and refining neural network (HFUR) for compressed video quality enhancement. HFUR consists of two modules: implicit frequen…

    Submitted 18 March, 2024; originally announced March 2024.

  18. arXiv:2402.19470  [pdf, other]

    eess.IV cs.CV

    Towards Generalizable Tumor Synthesis

    Authors: Qi Chen, Xiaoxi Chen, Haorui Song, Zhiwei Xiong, Alan Yuille, Chen Wei, Zongwei Zhou

    Abstract: Tumor synthesis enables the creation of artificial tumors in medical images, facilitating the training of AI models for tumor detection and segmentation. However, success in tumor synthesis hinges on creating visually realistic tumors that are generalizable across multiple organs and, furthermore, the resulting AI models being capable of detecting real tumors in images sourced from different domai…

    Submitted 28 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR 2024)

  19. arXiv:2402.17723  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

    Authors: Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

    Abstract: Video and audio content creation serves as the core technique for the movie industry and professional users. Recently, existing diffusion-based methods tackle video and audio generation separately, which hinders the technique transfer from academia to industry. In this work, we aim at filling the gap, with a carefully designed optimization-based framework for cross-visual-audio and joint-visual-au…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Project website: https://yzxing87.github.io/Seeing-and-Hearing/

  20. arXiv:2402.14099  [pdf, other]

    eess.IV cs.CV physics.med-ph

    EXACT-Net: EHR-guided lung tumor auto-segmentation for non-small cell lung cancer radiotherapy

    Authors: Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Rui Zhang, Quan Chen, Kai Ding

    Abstract: Lung cancer is a devastating disease with the highest mortality rate among cancer types. Over 60% of non-small cell lung cancer (NSCLC) patients, which accounts for 87% of diagnoses, require radiation therapy. Rapid treatment initiation significantly increases the patient's survival rate and reduces the mortality rate. Accurate tumor segmentation is a critical step in the diagnosis and treatment o…

    Submitted 21 February, 2024; originally announced February 2024.

  21. arXiv:2402.12208  [pdf, other]

    eess.AS cs.SD

    Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

    Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

    Abstract: In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serves as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs a…

    Submitted 27 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: We release a more powerful checkpoint in Language-Codec v3

  22. arXiv:2402.11769  [pdf, other]

    eess.SY cs.GT math.OC

    Connection-Aware P2P Trading: Simultaneous Trading and Peer Selection

    Authors: Cheng Feng, Kedi Zheng, Lanqing Shan, Hani Alers, Lampros Stergioulas, Hongye Guo, Qixin Chen

    Abstract: Peer-to-peer (P2P) trading is seen as a viable solution to handle the growing number of distributed energy resources in distribution networks. However, when dealing with large-scale consumers, there are several challenges that must be addressed. One of these challenges is limited communication capabilities. Additionally, prosumers may have specific preferences when it comes to trading. Both can re…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE PES Transactions

  23. arXiv:2402.09424  [pdf, other]

    eess.SP cs.CV cs.LG cs.NE

    Epilepsy Seizure Detection and Prediction using an Approximate Spiking Convolutional Transformer

    Authors: Qinyu Chen, Congyi Sun, Chang Gao, Shih-Chii Liu

    Abstract: Epilepsy is a common disease of the nervous system. Timely prediction of seizures and intervention treatment can significantly reduce the accidental injury of patients and protect the life and health of patients. This paper presents a neuromorphic Spiking Convolutional Transformer, named Spiking Conformer, to detect and predict epileptic seizure segments from scalped long-term electroencephalogram…

    Submitted 21 January, 2024; originally announced February 2024.

    Comments: To be published at the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore

  24. arXiv:2402.08846  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

    Authors: Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

    Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning f…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Working in progress and will open-source soon

  25. A Unified MPC Strategy for a Tilt-rotor VTOL UAV Towards Seamless Mode Transitioning

    Authors: Qizhao Chen, Ziqi Hu, Junyi Geng, Dongwei Bai, Mohammad Mousaei, Sebastian Scherer

    Abstract: Capabilities of long-range flight and vertical take-off and landing (VTOL) are essential for Urban Air Mobility (UAM). Tiltrotor VTOLs have the advantage of balancing control simplicity and system complexity due to their redundant control authority. Prior work on controlling these aircraft either requires separate controllers and switching modes for different vehicle configurations or performs the…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: In proceedings of the 2024 AIAA SciTech Forum, Session: Guidance, Navigation, and Control GNC-49

    Journal ref: AIAA SCITECH 2024 Forum, p. 2878. January 2024

  26. arXiv:2401.04747  [pdf, other]

    cs.SD cs.AI cs.CV cs.GR eess.AS

    DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

    Authors: Junming Chen, Yunfei Liu, Jianan Wang, Ailing Zeng, Yu Li, Qifeng Chen

    Abstract: We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length. While previous works focused on co-speech gesture or expression generation individually, the joint generation of synchronized expressions and gestures remains barely explored. To address this, our diffusion-based co-speech motion generation transformer enables uni-…

    Submitted 6 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted by CVPR 2024. Project page: https://jeremycjm.github.io/proj/DiffSHEG

  27. arXiv:2401.00475  [pdf, other]

    cs.SD eess.AS

    E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

    Authors: Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie

    Abstract: This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emo…

    Submitted 6 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: 6 pages, 3 figures

  28. arXiv:2312.17508  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion

    Authors: Yun Chen, Lingxiao Yang, Qi Chen, Jian-Huang Lai, Xiaohua Xie

    Abstract: Emotional Voice Conversion aims to manipulate a speech according to a given emotion while preserving non-emotion components. Existing approaches cannot well express fine-grained emotional attributes. In this paper, we propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion. We introduce a two-stage pipeline to effect…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted by INTERSPEECH 2023

  29. arXiv:2312.13523  [pdf]

    physics.med-ph eess.IV

    High-resolution myelin-water fraction and quantitative relaxation mapping using 3D ViSTa-MR fingerprinting

    Authors: Congyu Liao, Xiaozhi Cao, Siddharth Srinivasan Iyer, Sophie Schauman, Zihan Zhou, Xiaoqian Yan, Quan Chen, Zhitao Li, Nan Wang, Ting Gong, Zhe Wu, Hongjian He, Jianhui Zhong, Yang Yang, Adam Kerr, Kalanit Grill-Spector, Kawin Setsompop

    Abstract: Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time. Methods: We developed 3D ViSTa-MRF, which combined Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MR…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 38 pages, 12 figures and 1 table

    Journal ref: Magnetic Resonance in Medicine 2023

  30. arXiv:2312.10880  [pdf, other]

    cs.RO eess.SY

    Sharable Clothoid-based Continuous Motion Planning for Connected Automated Vehicles

    Authors: Sanghoon Oh, Qi Chen, H. Eric Tseng, Gaurav Pandey, Gabor Orosz

    Abstract: A continuous motion planning method for connected automated vehicles is considered for generating feasible trajectories in real-time using three consecutive clothoids. The proposed method reduces path planning to a small set of nonlinear algebraic equations such that the generated path can be efficiently checked for feasibility and collision. After path planning, velocity planning is executed whil…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 14 pages, 14 figures

  31. DTP-Net: Learning to Reconstruct EEG signals in Time-Frequency Domain by Multi-scale Feature Reuse

    Authors: Yan Pei, Jiahui Xu, Qianhao Chen, Chenhao Wang, Feng Yu, Lisan Zhang, Wei Luo

    Abstract: Electroencephalography (EEG) signals are easily corrupted by various artifacts, making artifact removal crucial for improving signal quality in scenarios such as disease diagnosis and brain-computer interface (BCI). In this paper, we present a fully convolutional neural architecture, called DTP-Net, which consists of a Densely Connected Temporal Pyramid (DTP) sandwiched between a pair of learnable…

    Submitted 6 March, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

    Comments: 18 pages, 10 figures

    Journal ref: IEEE Journal of Biomedical and Health Informatics. 2024: 1-12

  32. arXiv:2311.16200  [pdf, other]

    eess.IV cs.AR cs.LG

    Streaming Lossless Volumetric Compression of Medical Images Using Gated Recurrent Convolutional Neural Network

    Authors: Qianhao Chen, Jietao Chen

    Abstract: Deep learning-based lossless compression methods offer substantial advantages in compressing medical volumetric images. Nevertheless, many learning-based algorithms encounter a trade-off between practicality and compression performance. This paper introduces a hardware-friendly streaming lossless volumetric compression framework, utilizing merely one-thousandth of the model weights compared to oth…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 18 pages, 8 figures

  33. arXiv:2311.04534  [pdf, other]

    cs.CL cs.SD eess.AS

    Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

    Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

    Abstract: Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Mask…

    Submitted 4 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 5 pages, accepted by ICASSP 2024

  34. arXiv:2311.03062  [pdf]

    physics.optics cs.LG eess.SP

    Imaging through multimode fibres with physical prior

    Authors: Chuncheng Zhang, Yingjie Shi, Zheyi Yao, Xiubao Sui, Qian Chen

    Abstract: Imaging through perturbed multimode fibres based on deep learning has been widely researched. However, existing methods mainly use target-speckle pairs in different configurations. It is challenging to reconstruct targets without trained networks. In this paper, we propose a physics-assisted, unsupervised, learning-based fibre imaging scheme. The role of the physical prior is to simplify the mappi…

    Submitted 13 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  35. arXiv:2311.02911  [pdf, other]

    eess.SP eess.SY

    Goal-Oriented Wireless Communication Resource Allocation for Cyber-Physical Systems

    Authors: Cheng Feng, Kedi Zheng, Yi Wang, Kaibin Huang, Qixin Chen

    Abstract: The proliferation of novel industrial applications at the wireless edge, such as smart grids and vehicle networks, demands the advancement of cyber-physical systems. The performance of CPSs is closely linked to the last-mile wireless communication networks, which often become bottlenecks due to their inherent limited resources. Current CPS operations often treat wireless communication networks as…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Submitted to IEEE ComSoc journal for possible publications. Copyright may be transferred without notice, after which this version may no longer be accessible

  36. arXiv:2311.02551  [pdf]

    eess.SY cs.GT cs.LG

    High-dimensional Bid Learning for Energy Storage Bidding in Energy Markets

    Authors: Jinyu Liu, Hongye Guo, Qinghu Tang, En Lu, Qiuna Cai, Qixin Chen

    Abstract: With the growing penetration of renewable energy resource, electricity market prices have exhibited greater volatility. Therefore, it is important for Energy Storage Systems(ESSs) to leverage the multidimensional nature of energy market bids to maximize profitability. However, current learning methods cannot fully utilize the high-dimensional price-quantity bids in the energy markets. To address t…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, Accepted by the 15th International Conference on Applied Energy (ICAE2023)

  37. arXiv:2310.17997  [pdf]

    physics.optics cs.AI eess.IV

    Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy

    Authors: Hui Sun, Hao Luo, Feifei Wang, Qingjiu Chen, Meng Chen, Xiaoduo Wang, Haibo Yu, Guanglie Zhang, Lianqing Liu, Jianping Wang, Dapeng Wu, Wen Jung Li

    Abstract: Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the mapping relationship between op…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 13 pages, 7 figures

  38. arXiv:2310.14636  [pdf, other]

    eess.IV cs.CV

    Multilevel Perception Boundary-guided Network for Breast Lesion Segmentation in Ultrasound Images

    Authors: Xing Yang, Jian Zhang, Qijian Chen, Li Wang, Lihui Wang

    Abstract: Automatic segmentation of breast tumors from the ultrasound images is essential for the subsequent clinical diagnosis and treatment plan. Although the existing deep learning-based methods have achieved significant progress in automatic segmentation of breast tumor, their performance on tumors with similar intensity to the normal tissues is still not pleasant, especially for the tumor boundaries. T…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 12 pages, 5 figures

  39. Low-Complex Channel Estimation in Extra-Large Scale MIMO with the Spherical Wave Properties

    Authors: Xumin Pu, Zhinan Sun, Qianbin Chen, Shi Jin

    Abstract: This paper investigates the low-complex linear minimum mean squared error (LMMSE) channel estimation in an extra-large scale MIMO system with the spherical wave model (SWM). We model the extra-large scale MIMO channels using the SWM in the terahertz (THz) line-of-sight propagation, in which the transceiver is a uniform circular antenna array. On this basis, for the known channel covariance matrix…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 9 pages with 3 figures, accepted by Physical Communication

  40. arXiv:2310.06678  [pdf, other]

    cs.IT eess.SP eess.SY

    Modelling and Performance Analysis of the Over-the-Air Computing in Cellular IoT Networks

    Authors: Ying Dong, Haonan Hu, Qiaoshou Liu, Tingwei Lv, Qianbin Chen, Jie Zhang

    Abstract: Ultra-fast wireless data aggregation (WDA) of distributed data has emerged as a critical design challenge in the ultra-densely deployed cellular internet of things network (CITN) due to limited spectral resources. Over-the-air computing (AirComp) has been proposed as an effective solution for ultra-fast WDA by exploiting the superposition property of wireless channels. However, the effect of acces…

    Submitted 11 August, 2023; originally announced October 2023.

  41. arXiv:2310.04673  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

    Authors: Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

    Abstract: Generative Pre-trained Transformer (GPT) models have achieved remarkable performance on various natural language processing tasks. However, there has been limited research on applying similar frameworks to audio tasks. Previously proposed large language models for audio tasks either lack sufficient quantitative evaluations, or are limited to tasks for recognizing and understanding audio content, o…

    Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, under review

  42. arXiv:2310.02641  [pdf, other]

    cs.CV cs.AI eess.IV

    Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis

    Authors: Han Zhang, Qiguang Chen, Lok Ming Lui

    Abstract: Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition. Deep learning-based imaging models usually fail to give accurate performance for geometrically distorted images. In this paper, we propose the deformation-invariant neural network (DINN), a framework to address the problem of imaging tasks for geometrically distort…

    Submitted 7 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

  43. arXiv:2309.13573  [pdf, other]

    cs.SD eess.AS

    The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what at when" at typical meeting scenario. We particularly established two sub-tr…

    Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, Accepted by ASRU2023

  44. arXiv:2309.11714  [pdf, other]

    eess.SP cs.AI cs.LG

    A Dynamic Domain Adaptation Deep Learning Network for EEG-based Motor Imagery Classification

    Authors: Jie Jiao, Meiyan Xu, Qingqing Chen, Hefan Zhou, Wangliang Zhou

    Abstract: There is a correlation between adjacent channels of electroencephalogram (EEG), and how to represent this correlation is an issue that is currently being explored. In addition, due to inter-individual differences in EEG signals, this discrepancy results in new subjects need spend a amount of calibration time for EEG-based motor imagery brain-computer interface. In order to solve the above problems…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 10 pages, 4 figures, journal

    MSC Class: 68T07 (Primary); ACM Class: I.2.4

  45. arXiv:2309.10456  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

    Authors: Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

    Abstract: Speaker diarization has gained considerable attention within speech processing research community. Mainstream speaker diarization rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Considering the fact that speech signals can efficiently convey the content of a speech, it is of our interest to fully exploit th…

    Submitted 4 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  46. arXiv:2309.10294  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

    Authors: Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

    Abstract: In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and speech synthesis technique, Azure TTS. First, we investigated the representation ability of different speech self-supervised pre-trained models, and we found that data2vec has a good representation ability on the SER task…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  47. arXiv:2309.04960  [pdf, other]

    eess.IV cs.CV

    SdCT-GAN: Reconstructing CT from Biplanar X-Rays with Self-driven Generative Adversarial Networks

    Authors: Shuangqin Cheng, Qingliang Chen, Qiyi Zhang, Ming Li, Yamuhanmode Alike, Kaile Su, Pengcheng Wen

    Abstract: Computed Tomography (CT) is a medical imaging modality that can generate more informative 3D images than 2D X-rays. However, this advantage comes at the expense of more radiation exposure, higher costs, and longer acquisition time. Hence, the reconstruction of 3D CT images using a limited number of 2D X-rays has gained significant importance as an economical alternative. Nevertheless, existing met…

    Submitted 10 September, 2023; originally announced September 2023.

  48. arXiv:2309.01321  [pdf, other]

    eess.SY math.OC

    Joint Oscillation Damping and Inertia Provision Service for Converter-Interfaced Generation

    Authors: Cheng Feng, Linbin Huang, Xiuqiang He, Yi Wang, Florian Dörfler, Qixin Chen

    Abstract: As renewable generation becomes more prevalent, traditional power systems dominated by synchronous generators are transitioning to systems dominated by converter-interfaced generation. These devices, with their weaker damping capabilities and lower inertia, compromise the system's ability to withstand disturbances, pose a threat to system stability, and lead to oscillations and poor frequency resp…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Submitted for IEEE PES journal for possible publications

  49. arXiv:2308.16676  [pdf, other]

    eess.IV

    Twofold Structured Features-Based Siamese Network for Infrared Target Tracking

    Authors: Wei-Jie Yan, Yun-Kai Xu, Qian Chen, Xiao-Fang Kong, Guo-Hua Gu, A-Jun Shao, Min-Jie Wan

    Abstract: Nowadays, infrared target tracking has been a critical technology in the field of computer vision and has many applications, such as motion analysis, pedestrian surveillance, intelligent detection, and so forth. Unfortunately, due to the lack of color, texture and other detailed information, tracking drift often occurs when the tracker encounters infrared targets that vary in size or shape. To add…

    Submitted 26 June, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: 13 pages, 9 figures, references added

  50. arXiv:2308.06432  [pdf, other]

    eess.IV cs.CV cs.LG

    Learn Single-horizon Disease Evolution for Predictive Generation of Post-therapeutic Neovascular Age-related Macular Degeneration

    Authors: Yuhan Zhang, Kun Huang, Mingchao Li, Songtao Yuan, Qiang Chen

    Abstract: Most of the existing disease prediction methods in the field of medical image processing fall into two classes, namely image-to-category predictions and image-to-parameter predictions. Few works have focused on image-to-image predictions. Different from multi-horizon predictions in other fields, ophthalmologists prefer to show more confidence in single-horizon predictions due to the low tolerance…

    Submitted 11 August, 2023; originally announced August 2023.