Showing 1–50 of 82 results for author: Cheng, L

Searching in archive eess.

  1. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an…

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  2. arXiv:2406.02167  [pdf, other]

    eess.AS eess.SP

    ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li

    Abstract: Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteristics from short utterances. Constrained by the model's size, a robust backbone Enhanced Res2Net (ERes2Net) combining global and local feature fusion…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  4. arXiv:2405.14210  [pdf, other]

    cs.CV eess.IV

    Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds

    Authors: Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang

    Abstract: Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect tha…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint

  5. arXiv:2403.19971  [pdf, other]

    eess.AS eess.SP

    3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

    Abstract: This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization. It is designed for the needs of academic researchers and industrial practitioners. The 3D-Speaker-Toolkit adeptly leverages the combined strengths of acoustic, semantic, and visual data, seamlessly fusing these modalities to offer robust speaker recognition capabilities. The acous…

    Submitted 29 March, 2024; originally announced March 2024.

  6. arXiv:2403.17701   

    eess.IV cs.CV cs.LG

    Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation

    Authors: Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu

    Abstract: Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advancements in this realm, but they still encounter challenges because of limited receptive field or high computing complexity. Recently, State Space Models (SSMs), particularly Mamba and its var…

    Submitted 3 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Experimental method encountered errors, undergoing experiment again

  7. Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach

    Authors: Hao Zhang, Qingfeng Lin, Yang Li, Lei Cheng, Yik-Chung Wu

    Abstract: Activity detection is an important task in the next generation grant-free multiple access. While there are a number of existing algorithms designed for this purpose, they mostly require precise information about the network, such as large-scale fading coefficients, small-scale fading channel statistics, noise variance at the access points, and user activity probability. Acquiring these information…

    Submitted 2 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 16 pages, 9 figures, accepted for publication in IEEE Transactions on Signal Processing

    MSC Class: 68T01

  8. Integrated Sensing and Communication with Massive MIMO: A Unified Tensor Approach for Channel and Target Parameter Estimation

    Authors: Ruoyu Zhang, Lei Cheng, Shuai Wang, Yi Lou, Yulong Gao, Wen Wu, Derrick Wing Kwan Ng

    Abstract: Benefitting from the vast spatial degrees of freedom, the amalgamation of integrated sensing and communication (ISAC) and massive multiple-input multiple-output (MIMO) is expected to simultaneously improve spectral and energy efficiencies as well as the sensing capability. However, a large number of antennas deployed in massive MIMO-ISAC raises critical challenges in acquiring both accurate channe…

    Submitted 3 January, 2024; originally announced January 2024.

    Journal ref: IEEE Transactions on Wireless Communications, 2024

  9. arXiv:2311.05929  [pdf, other]

    cs.CV eess.IV

    Efficient Segmentation with Texture in Ore Images Based on Box-supervised Approach

    Authors: Guodong Sun, Delong Huang, Yuting Peng, Le Cheng, Bo Wu, Yang Zhang

    Abstract: Image segmentation methods have been utilized to determine the particle size distribution of crushed ores. Due to the complex working environment, high-powered computing equipment is difficult to deploy. At the same time, the ore distribution is stacked, and it is difficult to identify the complete features. To address this issue, an effective box-supervised technique with texture features is prov…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 14 pages, 8 figures

  10. arXiv:2309.10456  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

    Authors: Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

    Abstract: Speaker diarization has gained considerable attention within speech processing research community. Mainstream speaker diarization rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Considering the fact that speech signals can efficiently convey the content of a speech, it is of our interest to fully exploit th…

    Submitted 4 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  11. arXiv:2309.00787  [pdf, other]

    cs.RO eess.IV eess.SP eess.SY

    Online Targetless Radar-Camera Extrinsic Calibration Based on the Common Features of Radar and Camera

    Authors: Lei Cheng, Siyang Cao

    Abstract: Sensor fusion is essential for autonomous driving and autonomous robots, and radar-camera fusion systems have gained popularity due to their complementary sensing capabilities. However, accurate calibration between these two sensors is crucial to ensure effective fusion and improve overall system performance. Calibration involves intrinsic and extrinsic calibration, with the latter being particula…

    Submitted 24 January, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  12. arXiv:2308.04930  [pdf, other]

    eess.SP

    Striking The Right Balance: Three-Dimensional Ocean Sound Speed Field Reconstruction Using Tensor Neural Networks

    Authors: Siyuan Li, Lei Cheng, Ting Zhang, Hangfang Zhao, Jianlong Li

    Abstract: Accurately reconstructing a three-dimensional ocean sound speed field (3D SSF) is essential for various ocean acoustic applications, but the sparsity and uncertainty of sound speed samples across a vast ocean region make it a challenging task. To tackle this challenge, a large body of reconstruction methods has been developed, including spline interpolation, matrix/tensor-based completion, and dee…

    Submitted 9 August, 2023; originally announced August 2023.

  13. arXiv:2308.02774  [pdf, other]

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an…

    Submitted 26 June, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2211.04168 I submitted the updated paper for arXiv:2308.02774 with the revised version. As for arXiv:2406.11169, I mistakenly submitted this last time, so I withdrew arXiv:2406.11169 and merged the latest content into arXiv: 2308.02774

  14. arXiv:2307.15264  [pdf, other]

    cs.RO eess.SP eess.SY

    3D Radar and Camera Co-Calibration: A Flexible and Accurate Method for Target-based Extrinsic Calibration

    Authors: Lei Cheng, Arindam Sengupta, Siyang Cao

    Abstract: Advances in autonomous driving are inseparable from sensor fusion. Heterogeneous sensors are widely used for sensor fusion due to their complementary properties, with radar and camera being the most equipped sensors. Intrinsic and extrinsic calibration are essential steps in sensor fusion. The extrinsic calibration, independent of the sensor's own parameters, and performed after the sensors are in…

    Submitted 27 July, 2023; originally announced July 2023.

  15. Multipath Time-delay Estimation with Impulsive Noise via Bayesian Compressive Sensing

    Authors: Xingyu Ji, Lei Cheng, Hangfang Zhao

    Abstract: Multipath time-delay estimation is commonly encountered in radar and sonar signal processing. In some real-life environments, impulse noise is ubiquitous and significantly degrades estimation performance. Here, we propose a Bayesian approach to tailor the Bayesian Compressive Sensing (BCS) to mitigate impulsive noises. In particular, a heavy-tail Laplacian distribution is used as a statistical mod…

    Submitted 5 July, 2023; originally announced July 2023.

  16. arXiv:2306.15354  [pdf, other]

    cs.CL cs.SD eess.AS

    3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

    Authors: Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen

    Abstract: Disentangling uncorrelated information in speech utterances is a crucial research topic within speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other uncorrelated information. We present a large-scale speech corpus to facilitate the research of speech representation disentanglement. 3D-Speaker contains over 10,000…

    Submitted 24 September, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  17. arXiv:2306.11149  [pdf, other]

    eess.SP

    Overcoming Beam Squint in Dual-Wideband mmWave MIMO Channel Estimation: A Bayesian Multi-Band Sparsity Approach

    Authors: Le Xu, Lei Cheng, Ngai Wong, Yik-Chung Wu, H. Vincent Poor

    Abstract: The beam squint effect, which manifests in different steering matrices in different sub-bands, has been widely considered a challenge in millimeter wave (mmWave) multi-input multi-output (MIMO) channel estimation. Existing methods either require specific forms of the precoding/combining matrix, which restrict their general practicality, or simply ignore the beam squint effect by only making use of…

    Submitted 19 June, 2023; originally announced June 2023.

  18. arXiv:2306.11123  [pdf, other]

    eess.SP cs.CV

    To Fold or Not to Fold: Graph Regularized Tensor Train for Visual Data Completion

    Authors: Le Xu, Lei Cheng, Ngai Wong, Yik-Chung Wu

    Abstract: Tensor train (TT) representation has achieved tremendous success in visual data completion tasks, especially when it is combined with tensor folding. However, folding an image or video tensor breaks the original data structure, leading to local information loss as nearby pixels may be assigned into different dimensions and become far away from each other. In this paper, to fully preserve the local…

    Submitted 19 June, 2023; originally announced June 2023.

  19. arXiv:2305.12927  [pdf, other]

    cs.CL cs.SD eess.AS

    Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

    Authors: Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

    Abstract: Speaker diarization (SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which result in performance degradation when encountering adverse acoustic conditions. In this paper, we propose methods to extract speaker-related information from semantic c…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  20. arXiv:2305.12838  [pdf, other]

    eess.AS cs.SD

    An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Jiajun Qi

    Abstract: Effective fusion of multi-scale features is crucial for improving speaker verification performance. While most existing methods aggregate multi-scale features in a layer-wise manner via simple operations, such as summation or concatenation. This paper proposes a novel architecture called Enhanced Res2Net (ERes2Net), which incorporates both local and global feature fusion techniques to improve the…

    Submitted 3 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  21. arXiv:2305.01183  [pdf, other]

    cs.CV eess.IV

    Faster OreFSDet : A Lightweight and Effective Few-shot Object Detector for Ore Images

    Authors: Yang Zhang, Le Cheng, Yuting Peng, Chengming Xu, Yanwei Fu, Bo Wu, Guodong Sun

    Abstract: For the ore particle size detection, obtaining a sizable amount of high-quality ore labeled data is time-consuming and expensive. General object detection methods often suffer from severe over-fitting with scarce labeled data. Despite their ability to eliminate over-fitting, existing few-shot object detectors encounter drawbacks such as slow detection speed and high memory requirements, making the…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 18 pages, 11 figures

  22. arXiv:2304.01064  [pdf, other]

    cs.CV eess.IV

    Real-time 6K Image Rescaling with Rate-distortion Optimization

    Authors: Chenyang Qi, Xin Yang, Ka Leong Cheng, Ying-Cong Chen, Qifeng Chen

    Abstract: Contemporary image rescaling aims at embedding a high-resolution (HR) image into a low-resolution (LR) thumbnail image that contains embedded information for HR image reconstruction. Unlike traditional image super-resolution, this enables high-fidelity HR image restoration faithful to the original one, given the embedded information in the LR thumbnail. However, state-of-the-art image rescaling me…

    Submitted 19 May, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023; Github Repository: https://github.com/AbnerVictor/HyperThumbnail

  23. arXiv:2303.09560  [pdf]

    eess.SY math.PR

    Methodology for Capacity Credit Evaluation of Physical and Virtual Energy Storage in Decarbonized Power System

    Authors: Ning Qi, Peng Li, Lin Cheng, Ziyi Zhang, Wenrui Huang, Weiwei Yang

    Abstract: Energy storage (ES) and virtual energy storage (VES) are key components to realizing power system decarbonization. Although ES and VES have been proven to deliver various types of grid services, little work has so far provided a systematical framework for quantifying their adequacy contribution and credible capacity value while incorporating human and market behavior. Therefore, this manuscript pr…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: capacity credit, decision-dependent uncertainty, decarbonized power system

  24. arXiv:2303.06340  [pdf, other]

    q-bio.QM cs.LG eess.IV

    Intelligent diagnostic scheme for lung cancer screening with Raman spectra data by tensor network machine learning

    Authors: Yu-Jia An, Sheng-Chen Bai, Lin Cheng, Xiao-Guang Li, Cheng-en Wang, Xiao-Dong Han, Gang Su, Shi-Ju Ran, Cong Wang

    Abstract: Artificial intelligence (AI) has brought tremendous impacts on biomedical sciences from academic researches to clinical applications, such as in biomarkers' detection and diagnosis, optimization of treatment, and identification of new therapeutic targets in drug discovery. However, the contemporary AI technologies, particularly deep machine learning (ML), severely suffer from non-interpretability,…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 10 pages, 7 figures

  25. arXiv:2303.00332  [pdf, other]

    cs.SD eess.AS

    CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

    Authors: Hui Wang, Siqi Zheng, Yafeng Chen, Luyao Cheng, Qian Chen

    Abstract: Time delay neural network (TDNN) has been proven to be efficient for speaker verification. One of its successful variants, ECAPA-TDNN, achieved state-of-the-art performance at the cost of much higher computational complexity and slower inference speed. This makes it inadequate for scenarios with demanding inference rate and limited computational resources. We are thus interested in finding an arch…

    Submitted 16 June, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  26. arXiv:2212.14739  [pdf]

    eess.SP cs.AI

    Semantic optical fiber communication system

    Authors: Zhenming Yu, Hongyu Huang, Liming Cheng, Wei Zhang, Yueqiu Mu, Kun Xu

    Abstract: The current optical communication systems minimize bit or symbol errors without considering the semantic meaning behind digital bits, thus transmitting a lot of unnecessary information. We propose and experimentally demonstrate a semantic optical fiber communication (SOFC) system. Instead of encoding information into bits for transmission, semantic information is extracted from the source using de…

    Submitted 27 December, 2022; originally announced December 2022.

  27. arXiv:2212.07608  [pdf, other]

    cs.LG eess.SP eess.SY

    Output-Dependent Gaussian Process State-Space Model

    Authors: Zhidi Lin, Lei Cheng, Feng Yin, Lexi Xu, Shuguang Cui

    Abstract: Gaussian process state-space model (GPSSM) is a fully probabilistic state-space model that has attracted much attention over the past decade. However, the outputs of the transition function in the existing GPSSMs are assumed to be independent, meaning that the GPSSMs cannot exploit the inductive biases between different outputs and lose certain model capacities. To address this issue, this paper p…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 5 pages, 4 figures

  28. arXiv:2211.12220  [pdf, other]

    cs.CL cs.SD eess.AS

    A Scope Sensitive and Result Attentive Model for Multi-Intent Spoken Language Understanding

    Authors: Lizhi Cheng, Wenmian Yang, Weijia Jia

    Abstract: Multi-Intent Spoken Language Understanding (SLU), a novel and more complex scenario of SLU, is attracting increasing attention. Unlike traditional SLU, each intent in this scenario has its specific scope. Semantic information outside the scope even hinders the prediction, which tremendously increases the difficulty of intent detection. More seriously, guiding slot filling with these inaccurate int…

    Submitted 22 November, 2022; originally announced November 2022.

  29. arXiv:2211.04168  [pdf, other]

    eess.AS cs.SD

    Pushing the limits of self-supervised speaker verification using regularized distillation framework

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen

    Abstract: Training robust speaker verification systems without speaker labels has long been a challenging task. Previous studies observed a large performance gap between self-supervised and fully supervised methods. In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to embeddings in DINO. One reg…

    Submitted 2 August, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

  30. arXiv:2207.10869  [pdf, other]

    eess.IV cs.CV

    Optimizing Image Compression via Joint Learning with Denoising

    Authors: Ka Leong Cheng, Yueqi Xie, Qifeng Chen

    Abstract: High levels of noise usually exist in today's captured images due to the relatively small sensors equipped in the smartphone cameras, where the noise brings extra challenges to lossy image compression algorithms. Without the capacity to tell the difference between image details and noise, general image compression methods allocate additional bits to explicitly store the undesired image noise durin…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  31. arXiv:2207.10204  [pdf, ps, other]

    cs.IT eess.SP

    Watermark-Based Code Construction for Finite-State Markov Channel with Synchronisation Errors

    Authors: Shamin Achari, Ling Cheng

    Abstract: With advancements in telecommunications, data transmission over increasingly harsher channels that produce synchronisation errors is inevitable. Coding schemes for such channels are available through techniques such as the Davey-MacKay watermark coding; however, this is limited to memoryless channel estimates. Memory must be accounted for to ensure a realistic channel approximation - similar to a…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Submitted to Elsevier Digital Signal Processing

  32. arXiv:2206.05427  [pdf, ps, other]

    eess.SP

    Reconfigurable Intelligent Surface-Aided 6G Massive Access: Coupled Tensor Modeling and Sparse Bayesian Learning

    Authors: Xiaodan Shao, Lei Cheng, Xiaoming Chen, Chongwen Huang, Derrick Wing Kwan Ng

    Abstract: This paper investigates a reconfigurable intelligent surface (RIS)-aided unsourced random access (URA) scheme for the sixth-generation (6G) wireless networks with massive sporadic traffic devices. First of all, this paper proposes a novel joint active device separation (the message recovery of active device) and channel estimation architecture for the RIS-aided URA. Specifically, the RIS passive r…

    Submitted 11 June, 2022; originally announced June 2022.

    Comments: Accepted by IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2108.11123

  33. arXiv:2205.14283  [pdf, other]

    stat.ML cs.LG eess.IV eess.SP

    Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling

    Authors: Lei Cheng, Feng Yin, Sergios Theodoridis, Sotirios Chatzis, Tsung-Hui Chang

    Abstract: Sparse modeling for signal processing and machine learning has been at the focus of scientific research for over two decades. Among others, supervised sparsity-aware learning comprises two major paths paved by: a) discriminative methods and b) generative methods. The latter, more widely known as Bayesian methods, enable uncertainty evaluation w.r.t. the performed predictions. Furthermore, they can…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 64 pages, 16 figures, 6 tables, 98 references, submitted to IEEE Signal Processing Magazine

  34. Downlink Channel Covariance Matrix Reconstruction for FDD Massive MIMO Systems with Limited Feedback

    Authors: Kai Li, Ying Li, Lei Cheng, Qingjiang Shi, Zhi-Quan Luo

    Abstract: The downlink channel covariance matrix (CCM) acquisition is the key step for the practical performance of massive multiple-input and multiple-output (MIMO) systems, including beamforming, channel tracking, and user scheduling. However, this task is challenging in the popular frequency division duplex massive MIMO systems with Type I codebook due to the limited channel information feedback. In this…

    Submitted 12 September, 2023; v1 submitted 2 April, 2022; originally announced April 2022.

  35. arXiv:2203.13991  [pdf]

    q-fin.RM eess.SY math.PR

    Risk Assessment with Generic Energy Storage under Exogenous and Endogenous Uncertainty

    Authors: Ning Qi, Lin Cheng, Yuxiang Wan, Yingrui Zhuang, Zeyu Liu

    Abstract: Current risk assessment ignores the stochastic nature of energy storage availability itself and thus lead to potential risk during operation. This paper proposes the redefinition of generic energy storage (GES) that is allowed to offer probabilistic reserve. A data-driven unified model with exogenous and endogenous uncertainty (EXU & EDU) description is presented for four typical types of GES. Mor…

    Submitted 26 March, 2022; originally announced March 2022.

    Comments: PES GM2022-Exogenous and Endogenous Uncertainty

  36. arXiv:2202.11490  [pdf, other]

    cs.LG cs.DC eess.SP

    Towards Tailored Models on Private AIoT Devices: Federated Direct Neural Architecture Search

    Authors: Chunhui Zhang, Xiaoming Yuan, Qianyun Zhang, Guangxu Zhu, Lei Cheng, Ning Zhang

    Abstract: Neural networks often encounter various stringent resource constraints while deploying on edge devices. To tackle these problems with less human efforts, automated machine learning becomes popular in finding various neural architectures that fit diverse Artificial Intelligence of Things (AIoT) scenarios. Recently, to prevent the leakage of private information while enable automated machine intelli…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:2011.03372

  37. arXiv:2202.01630  [pdf, other]

    eess.AS cs.SD

    A deep complex multi-frame filtering network for stereophonic acoustic echo cancellation

    Authors: Linjuan Cheng, Chengshi Zheng, Andong Li, Yuquan Wu, Renhua Peng, Xiaodong Li

    Abstract: In hands-free communication system, the coupling between loudspeaker and microphone generates echo signal, which can severely influence the quality of communication. Meanwhile, various types of noise in communication environments further reduce speech quality and intelligibility. It is difficult to extract the near-end signal from the microphone signal within one step, especially in low signal-to-…

    Submitted 5 May, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  38. arXiv:2201.13143  [pdf, other]

    cs.AI cs.LG eess.SY

    CoTV: Cooperative Control for Traffic Light Signals and Connected Autonomous Vehicles using Deep Reinforcement Learning

    Authors: Jiaying Guo, Long Cheng, Shen Wang

    Abstract: The target of reducing travel time only is insufficient to support the development of future smart transportation systems. To align with the United Nations Sustainable Development Goals (UN-SDG), a further reduction of fuel and emissions, improvements of traffic safety, and the ease of infrastructure deployment and maintenance should also be considered. Different from existing work focusing on the…

    Submitted 14 February, 2023; v1 submitted 31 January, 2022; originally announced January 2022.

  39. arXiv:2201.11999  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    Dual Learning Music Composition and Dance Choreography

    Authors: Shuang Wu, Zhenguang Li, Shijian Lu, Li Cheng

    Abstract: Music and dance have always co-existed as pillars of human activities, contributing immensely to the cultural, social, and entertainment functions in virtually all societies. Notwithstanding the gradual systematization of music and dance into two independent disciplines, their intimate connection is undeniable and one art-form often appears incomplete without the other. Recent research works have…

    Submitted 28 January, 2022; originally announced January 2022.

    Comments: ACMMM 2021 (Oral)

  40. Tensor-based Basis Function Learning for Three-dimensional Sound Speed Fields

    Authors: Lei Cheng, Xingyu Ji, Hangfang Zhao, Jianlong Li, Wen Xu

    Abstract: Basis function learning is the stepping stone towards effective three-dimensional (3D) sound speed field (SSF) inversion for various acoustic signal processing tasks, including ocean acoustic tomography, underwater target localization/tracking, and underwater communications. Classical basis functions include the empirical orthogonal functions (EOFs), Fourier basis functions, and their combinations…

    Submitted 21 January, 2022; originally announced January 2022.

  41. arXiv:2112.01806  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Music-to-Dance Generation with Optimal Transport

    Authors: Shuang Wu, Shijian Lu, Li Cheng

    Abstract: Dance choreography for a piece of music is a challenging task, having to be creative in presenting distinctive stylistic dance elements while taking into account the musical theme and rhythm. It has been tackled by different approaches such as similarity retrieval, sequence-to-sequence modeling and generative adversarial networks, but their generated dance sequences are often short of motion reali…

    Submitted 4 May, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: IJCAI 2022

  42. arXiv:2111.11658  [pdf, other]

    eess.IV cs.CV

    The RETA Benchmark for Retinal Vascular Tree Analysis

    Authors: Xingzheng Lyu, Li Cheng, Sanyuan Zhang

    Abstract: Topological and geometrical analysis of retinal blood vessel is a cost-effective way for early detection of many common diseases. Meanwhile, automated vessel segmentation and vascular tree analysis are still lacking in terms of generalization capability. In this work, we construct a novel benchmark RETA with 81 labeled vessel masks aiming to facilitate retinal vessel analysis. A semi-automated coa…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: 13 pages, 6 figures, 4 tables

  43. arXiv:2110.12915  [pdf]

    eess.IV cs.AI cs.CV physics.med-ph

    Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds

    Authors: Li-Hsin Cheng, Pablo B. J. Bosch, Rutger F. H. Hofman, Timo B. Brakenhoff, Eline F. Bruggemans, Rob J. van der Geest, Eduard R. Holman

    Abstract: Background. With the rise of highly portable, wireless, and low-cost ultrasound devices and automatic ultrasound acquisition techniques, an automated interpretation method requiring only a limited set of views as input could make preliminary cardiovascular disease diagnoses more accessible. In this study, we developed a deep learning (DL) method for automated detection of impaired left ventricular…

    Submitted 25 October, 2021; originally announced October 2021.

  44. arXiv:2110.07191  [pdf]

    eess.SP cs.AI cs.LG

    CNN-DST: ensemble deep learning based on Dempster-Shafer theory for vibration-based fault recognition

    Authors: Vahid Yaghoubi, Liangliang Cheng, Wim Van Paepegem, Mathias Kersemans

    Abstract: Nowadays, using vibration data in conjunction with pattern recognition methods is one of the most common fault detection strategies for structures. However, their performances depend on the features extracted from vibration data, the features selected to train the classifier, and the classifier used for pattern recognition. Deep learning facilitates the fault detection procedure by automating the…

    Submitted 14 October, 2021; originally announced October 2021.

    Journal ref: Structural Health Monitoring, February 2022

  45. arXiv:2110.06909  [pdf, other]

    stat.ML cs.AI cs.LG cs.NI eess.SP

    Reinforcement Learning for Standards Design

    Authors: Shahrukh Khan Kasi, Sayandev Mukherjee, Lin Cheng, Bernardo A. Huberman

    Abstract: Communications standards are designed via committees of humans holding repeated meetings over months or even years until consensus is achieved. This includes decisions regarding the modulation and coding schemes to be supported over an air interface. We propose a way to "automate" the selection of the set of modulation and coding schemes to be supported over a given air interface and thereby strea…

    Submitted 13 October, 2021; originally announced October 2021.

  46. arXiv:2108.11123  [pdf, ps, other]

    cs.IT eess.SP

    A Bayesian Tensor Approach to Enable RIS for 6G Massive Unsourced Random Access

    Authors: Xiaodan Shao, Lei Cheng, Xiaoming Chen, Chongwen Huang, Derrick Wing Kwan Ng

    Abstract: This paper investigates the problem of joint massive devices separation and channel estimation for a reconfigurable intelligent surface (RIS)-aided unsourced random access (URA) scheme in the sixth-generation (6G) wireless networks. In particular, by associating the data sequences to a rank-one tensor and exploiting the angular sparsity of the channel, the detection problem is cast as a high-order…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: IEEE GLOBECOM 2021

  47. arXiv:2108.03690  [pdf, other]

    eess.IV cs.CV

    Enhanced Invertible Encoding for Learned Image Compression

    Authors: Yueqi Xie, Ka Leong Cheng, Qifeng Chen

    Abstract: Although deep learning based image compression methods have achieved promising progress these days, the performance of these methods still cannot match the latest compression standard Versatile Video Coding (VVC). Most of the recent developments focus on designing a more accurate and flexible entropy model that can better parameterize the distributions of the latent features. However, few efforts…

    Submitted 8 August, 2021; originally announced August 2021.

    Comments: Accepted to ACM Multimedia 2021 as Oral

  48. arXiv:2106.15283  [pdf, other]

    cs.CV cs.LG eess.SP

    Similarity Embedding Networks for Robust Human Activity Recognition

    Authors: Chenglin Li, Carrie Lu Tong, Di Niu, Bei Jiang, Xiao Zuo, Lei Cheng, Jian Xiong, Jianming Yang

    Abstract: Deep learning models for human activity recognition (HAR) based on sensor data have been heavily studied recently. However, the generalization ability of deep models on complex real-world HAR data is limited by the availability of high-quality labeled activity data, which are hard to obtain. In this paper, we design a similarity embedding neural network that maps input sensor signals onto real vec…

    Submitted 31 May, 2021; originally announced June 2021.

  49. arXiv:2104.03603  [pdf, other]

    cs.SD eess.AS

    AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

    Authors: Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

    Abstract: In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected by 8-channel circular microphone array for speech processing in conference scenario. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge the advanced research on multi-speaker processing and the practical ap…

    Submitted 10 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  50. arXiv:2102.05610  [pdf, other]

    cs.CV eess.IV

    Searching for Fast Model Families on Datacenter Accelerators

    Authors: Sheng Li, Mingxing Tan, Ruoming Pang, Andrew Li, Liqun Cheng, Quoc Le, Norman P. Jouppi

    Abstract: Neural Architecture Search (NAS), together with model scaling, has shown remarkable progress in designing high accuracy and fast convolutional architecture families. However, as neither NAS nor model scaling considers sufficient hardware architecture details, they do not take full advantage of the emerging datacenter (DC) accelerators. In this paper, we search for fast and accurate CNN model famil…

    Submitted 10 February, 2021; originally announced February 2021.