Showing 1–50 of 81 results for author: Zheng, C

Searching in archive eess.
  1. arXiv:2406.14210  [pdf, other]

    eess.IV cs.CV cs.LG

    Self-Supervised Pretext Tasks for Alzheimer's Disease Classification using 3D Convolutional Neural Networks on Large-Scale Synthetic Neuroimaging Dataset

    Authors: Chen Zheng

    Abstract: Structural magnetic resonance imaging (MRI) studies have shown that Alzheimer's Disease (AD) induces both localised and widespread neural degenerative changes throughout the brain. However, the absence of segmentation that highlights brain degenerative changes presents unique challenges for training CNN-based classifiers in a supervised fashion. In this work, we evaluated several unsupervised meth… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.02887  [pdf, other]

    eess.AS cs.SD

    USM RNN-T model weights binarization

    Authors: Oleg Rybakov, Dmitriy Serdyuk, Chengjian Zheng

    Abstract: Large-scale universal speech models (USM) are already used in production. However, as the model size grows, the serving cost grows too. The serving cost of large models is dominated by model size, which is why model size reduction is an important research topic. In this work, we focus on model size reduction using weight-only quantization. We present the weights binarization of USM Recurrent Neura… ▽ More

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2405.10705  [pdf, other]

    eess.IV cs.CV

    3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

    Authors: Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wen** Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

    Abstract: Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosis. With the help of a contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substanti… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 13 figures, 5 tables

  5. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  6. arXiv:2403.18296  [pdf, other]

    cs.LG cs.AI eess.SP

    GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm

    Authors: Chunhang Zheng, Kechao Cai

    Abstract: Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. Moreover, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise… ▽ More

    Submitted 14 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  7. arXiv:2312.13722  [pdf, other]

    cs.SD eess.AS

    BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

    Authors: Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

    Abstract: Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE research primarily focuses on fixed upsampling ratios, disregarding the fact that the effective bandwidth of captured audio may fluctuate frequently due to various capturing devices and transmission conditions. In this paper, we propose… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024

  8. arXiv:2310.05547  [pdf, other]

    cs.RO eess.SY

    Geometry-Aware Safety-Critical Local Reactive Controller for Robot Navigation in Unknown and Cluttered Environments

    Authors: Yulin Li, Xindong Tang, Kai Chen, Chunxin Zheng, Haichao Liu, Jun Ma

    Abstract: This work proposes a safety-critical local reactive controller that enables the robot to navigate in unknown and cluttered environments. In particular, the trajectory tracking task is formulated as a constrained polynomial optimization problem. Then, safety constraints are imposed on the control variables invoking the notion of polynomial positivity certificates in conjunction with their Sum-of-Sq… ▽ More

    Submitted 9 October, 2023; originally announced October 2023.

  9. arXiv:2309.12963  [pdf, ps, other]

    eess.AS cs.SD

    Massive End-to-end Models for Short Search Queries

    Authors: Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

    Abstract: In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters. The encoders of our models use the neural architecture of Google's universal speech model (USM), with additional funnel pooling layers to signifi… ▽ More

    Submitted 22 September, 2023; originally announced September 2023.

  10. arXiv:2309.05674  [pdf, other]

    eess.IV cs.CV

    ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation

    Authors: Xian Lin, Zengqiang Yan, Xianbo Deng, Chuansheng Zheng, Li Yu

    Abstract: Transformers have been extensively studied in medical image segmentation to build pairwise long-range dependence. Yet, relatively limited well-annotated medical image data makes transformers struggle to extract diverse global features, resulting in attention collapse where attention maps become similar or even identical. Comparatively, convolutional neural networks (CNNs) have better convergence p… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted by MICCAI 2023

  11. arXiv:2309.02318  [pdf, other]

    cs.CV eess.IV

    TiAVox: Time-aware Attenuation Voxels for Sparse-view 4D DSA Reconstruction

    Authors: Zhenghong Zhou, Huangxuan Zhao, Jiemin Fang, Dongqiao Xiang, Lei Chen, Lingxia Wu, Feihong Wu, Wenyu Liu, Chuansheng Zheng, Xinggang Wang

    Abstract: Four-dimensional Digital Subtraction Angiography (4D DSA) plays a critical role in the diagnosis of many medical diseases, such as Arteriovenous Malformations (AVM) and Arteriovenous Fistulas (AVF). Despite its significant application value, the reconstruction of 4D DSA demands numerous views to effectively model the intricate vessels and radiocontrast flow, thereby implying a significant radiatio… ▽ More

    Submitted 19 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures

  12. arXiv:2308.10522  [pdf, other]

    cs.CV cs.LG eess.IV

    Information Theory-Guided Heuristic Progressive Multi-View Coding

    Authors: Jiangmeng Li, Hang Gao, Wenwen Qiang, Changwen Zheng

    Abstract: Multi-view representation learning aims to capture comprehensive information from multiple views of a shared context. Recent works intuitively apply contrastive learning to different views in a pairwise manner, which is still scalable: view-specific noise is not filtered in learning view-shared representations; the fake negative pairs, where the negative terms are actually within the same class as… ▽ More

    Submitted 23 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: This paper was accepted by the journal Neural Networks (Elsevier) in 2023. arXiv admin note: substantial text overlap with arXiv:2109.02344

  13. arXiv:2308.09944  [pdf, other]

    cs.SD eess.AS

    Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

    Authors: Cunhang Fan, Jun Xue, Jianhua Tao, Jiangyan Yi, Chenglong Wang, Chengshi Zheng, Zhao Lv

    Abstract: The rhythm of synthetic speech is usually too smooth, which causes the fundamental frequency (F0) of synthetic speech to differ significantly from that of real speech. The F0 feature is therefore expected to contain discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 subband so a… ▽ More

    Submitted 19 August, 2023; originally announced August 2023.

  14. arXiv:2307.13066  [pdf, other]

    eess.SP

    Ultra-Wideband Technology: Characteristics, Applications and Challenges

    Authors: Chutao Zheng, Yuchu Ge, Anfu Guo

    Abstract: Ultra-wideband (UWB) technology is a wireless communication technology designed for short-range applications. It is characterized by its ability to generate and transmit radio-frequency energy over an extensive frequency range. This paper provides an overview of UWB technology including its definition, two representative schemes and some key characteristics distinguished from other types of commun… ▽ More

    Submitted 13 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  15. arXiv:2307.04506  [pdf, ps, other]

    eess.SP

    Distributed Decisions on Optimal Load Balancing in Loss Networks

    Authors: Qiong Liu, Chehao Wang, Ce Zheng

    Abstract: When multiple users share a common link in direct transmission, packet loss and network collision may occur due to the simultaneous arrival of traffic at the source node. To tackle this problem, users may resort to an indirect path: the packet flows are first relayed through a sidelink to another source node, then transmitted to the destination. This behavior brings the problems of packet routing… ▽ More

    Submitted 17 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: 8 pages, WiOPT workshop RAWNET

  16. arXiv:2306.02894  [pdf, ps, other]

    eess.IV

    Recyclable Semi-supervised Method Based on Multi-model Ensemble for Video Scene Parsing

    Authors: Biao Wu, Shaoli Liu, Diankai Zhang, Chengjian Zheng, Si Gao, Xiaofeng Zhang, Ning Wang

    Abstract: Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing object classes, masks and semantics of each pixel in the given image. Since the real-world is actually video-based rather than a static state, learning to perform video semantic segmentation is more reasonable and practical for realistic applications. In this paper, we adopt Mask2Former… ▽ More

    Submitted 5 June, 2023; originally announced June 2023.

  17. arXiv:2305.19507  [pdf, other]

    cs.CV eess.IV

    Manifold Constraint Regularization for Remote Sensing Image Generation

    Authors: Xingzhe Su, Changwen Zheng, Wenwen Qiang, Fengge Wu, Junsuo Zhao, Fuchun Sun, Hui Xiong

    Abstract: Generative Adversarial Networks (GANs) have shown notable accomplishments in the remote sensing domain. However, this paper reveals that their performance on remote sensing images falls short when compared to their impressive results with natural images. This study identifies a previously overlooked issue: GANs exhibit a heightened susceptibility to overfitting on remote sensing images. To address this… ▽ More

    Submitted 28 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

  18. arXiv:2305.02100  [pdf, other]

    cs.CV eess.IV

    Single Image Deraining via Feature-based Deep Convolutional Neural Network

    Authors: Chaobing Zheng, Jun Jiang, Wenjian Ying, Shiqian Wu

    Abstract: It is challenging to remove rain streaks from a single rainy image because the rain streaks are spatially varying in the rainy image. Although CNN-based methods have reported promising performance recently, there are still some defects, such as data dependency and insufficient interpretation. A single image deraining algorithm based on the combination of data-driven and model-based approaches is… ▽ More

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 6 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2209.07808

    MSC Class: Machine vision and scene understanding

  19. arXiv:2304.06570  [pdf]

    physics.optics eess.IV

    Single-shot quantitative differential phase contrast imaging combined with programmable polarization multiplexing illumination

    Authors: Siying Liu, Chuanjian Zheng, Qun Hao, Xin Li, Shaohui Zhang

    Abstract: We propose a single-shot quantitative differential phase contrast (DPC) method with polarization multiplexing illumination. In the illumination module of our system, the programmable LED array is divided into four quadrants and covered with polarizing films of four different polarization angles. We use a polarization camera with polarizers before the pixels in the imaging module. By matching the p… ▽ More

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 5 pages, 4 figures

  20. arXiv:2303.05240  [pdf, other]

    cs.CV eess.IV

    Intriguing Property and Counterfactual Explanation of GAN for Remote Sensing Image Generation

    Authors: Xingzhe Su, Wenwen Qiang, Jie Hu, Fengge Wu, Changwen Zheng, Fuchun Sun

    Abstract: Generative adversarial networks (GANs) have achieved remarkable progress in the natural image field. However, when applying GANs in the remote sensing (RS) image generation task, an extraordinary phenomenon is observed: the GAN model is more sensitive to the size of training data for RS image generation than for natural image generation. In other words, the generation quality of RS images will cha… ▽ More

    Submitted 14 May, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

  21. arXiv:2302.13063  [pdf, other]

    eess.AS cs.LG cs.SD

    Time-Variance Aware Real-Time Speech Enhancement

    Authors: Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

    Abstract: Time-variant factors often occur in real-world full-duplex communication applications. Some of them are caused by the complex environment such as non-stationary environmental noises and varying acoustic path while some are caused by the communication system such as the dynamic delay between the far-end and near-end signals. Current end-to-end deep neural network (DNN) based methods usually model t… ▽ More

    Submitted 25 February, 2023; originally announced February 2023.

  22. arXiv:2302.10377  [pdf, other]

    eess.AS cs.SD

    Real-time speech enhancement with dynamic attention span

    Authors: Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

    Abstract: For real-time speech enhancement (SE) including noise suppression, dereverberation and acoustic echo cancellation, the time-variance of the audio signals becomes a severe challenge. The causality and memory usage limit that only the historical information can be used for the system to capture the time-variant characteristics. We propose to adaptively change the receptive field according to the inp… ▽ More

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: ICASSP 2023 (Accepted)

  23. arXiv:2301.03281  [pdf, other]

    eess.IV cs.CV

    The state-of-the-art 3D anisotropic intracranial hemorrhage segmentation on non-contrast head CT: The INSTANCE challenge

    Authors: Xiangyu Li, Gongning Luo, Kuanquan Wang, Hongyu Wang, Jun Liu, Xinjie Liang, Jie Jiang, Zhenghao Song, Chunyue Zheng, Haokai Chi, Mingwang Xu, Yingte He, Xinghua Ma, **gwen Guo, Yifan Liu, Chuanpu Li, Zeli Chen, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Antoine P. Sanner, Anirban Mukhopadhyay, Ahmed E. Othman, Xingyu Zhao, Wei** Liu, **huang Zhang, et al. (9 additional authors not shown)

    Abstract: Automatic intracranial hemorrhage segmentation in 3D non-contrast head CT (NCCT) scans is significant in clinical practice. Existing hemorrhage segmentation methods usually ignore the anisotropic nature of the NCCT, and are evaluated on different in-house datasets with distinct metrics, making it highly challenging to improve segmentation performance and perform objective comparisons among differ… ▽ More

    Submitted 12 January, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: Summarized paper for the MICCAI INSTANCE 2022 Challenge

  24. arXiv:2211.16764  [pdf, other]

    cs.SD eess.AS

    A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem

    Authors: Andong Li, Guochen Yu, Chengshi Zheng, Wenzhe Liu, Xiaodong Li

    Abstract: While deep neural networks have facilitated significant advancements in the field of speech enhancement, most existing methods are developed following either empirical or relatively blind criteria, lacking adequate guidelines in pipeline design. Inspired by Taylor's theorem, we propose a general unfolding framework for both single- and multi-channel speech enhancement tasks. Concretely, we formula… ▽ More

    Submitted 28 March, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Submitted to TASLP, revised version, 17 pages

  25. arXiv:2211.12024  [pdf, other]

    cs.SD eess.AS

    TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective

    Authors: Andong Li, Guochen Yu, Wenzhe Liu, Xiaodong Li, Chengshi Zheng

    Abstract: Despite the promising performance of existing frame-wise all-neural beamformers in the speech enhancement field, it remains unclear what the underlying mechanism is. In this paper, we revisit the beamforming behavior from the beam-space dictionary perspective and formulate it into the learning and mixing of different beam-space components. Based on that, we propose an all-neural beamformer cal… ▽ More

    Submitted 30 November, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: In submission to ICASSP 2023, 5 pages

  26. arXiv:2211.05256  [pdf, other]

    eess.IV cs.CV

    Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

    Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob… ▽ More

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

  27. arXiv:2210.16481  [pdf, other]

    eess.AS cs.CL cs.SD

    Accelerating RNN-T Training and Inference Using CTC guidance

    Authors: Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani

    Abstract: We propose a novel method to accelerate the training and inference process of the recurrent neural network transducer (RNN-T) based on the guidance from a co-trained connectionist temporal classification (CTC) model. We made a key assumption that if an encoder embedding frame is classified as a blank frame by the CTC model, it is likely that this frame will be aligned to blank for all the partial alignmen… ▽ More

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: submitted to ICASSP 2023

  28. arXiv:2210.10892  [pdf, other]

    eess.IV cs.LG

    DEEP$^2$: Deep Learning Powered De-scattering with Excitation Patterning

    Authors: Navodini Wijethilake, Mithunjha Anandakumar, Cheng Zheng, Peter T. C. So, Murat Yildirim, Dushan N. Wadduwage

    Abstract: Limited throughput is a key challenge in in-vivo deep-tissue imaging using nonlinear optical microscopy. Point scanning multiphoton microscopy, the current gold standard, is slow especially compared to the wide-field imaging modalities used for optically cleared or thin specimens. We recently introduced 'De-scattering with Excitation Patterning or DEEP', as a widefield alternative to point-scannin… ▽ More

    Submitted 21 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

  29. Dual-Scale Single Image Dehazing Via Neural Augmentation

    Authors: Zhengguo Li, Chaobing Zheng, Haiyan Shu, Shiqian Wu

    Abstract: Model-based single image dehazing algorithms restore haze-free images with sharp edges and rich details for real-world hazy images at the expense of low PSNR and SSIM values for synthetic hazy images. Data-driven ones restore haze-free images with high PSNR and SSIM values for synthetic hazy images but with low contrast, and even some remaining haze for real world hazy images. In this paper, a nov… ▽ More

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Single image dehazing, dual-scale, neural augmentation, haze line averaging, generative adversarial network. arXiv admin note: substantial text overlap with arXiv:2111.10943

  30. arXiv:2208.01214  [pdf, other]

    cs.SD cs.LG eess.AS

    Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

    Authors: Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

    Abstract: Recently, pioneer research works have proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance, and showing that different subbands have different contributions to audio deepfake detection. However, this lacks an explanation of the specific informatio… ▽ More

    Submitted 1 August, 2022; originally announced August 2022.

  31. arXiv:2207.07755  [pdf, other]

    math.DS eess.SY

    Carleman Linearization of Nonlinear Systems and Its Finite-Section Approximations

    Authors: Arash Amini, Cong Zheng, Qiyu Sun, Nader Motee

    Abstract: The Carleman linearization is one of the mainstream approaches to lift a finite-dimensional nonlinear dynamical system into an infinite-dimensional linear system with the promise of providing accurate approximations of the original nonlinear system over larger regions around the equilibrium for longer time horizons with respect to the conventional first-order linearization approach. Finite-section… ▽ More

    Submitted 19 July, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: 25 Pages, 10 figures

    MSC Class: 34H05 (Primary); 65P99; 37M99 (Secondary) ACM Class: G.1.7; G.1.2

  32. arXiv:2207.01255  [pdf, other]

    cs.SD eess.AS

    TMGAN-PLC: Audio Packet Loss Concealment using Temporal Memory Generative Adversarial Network

    Authors: Yuansheng Guan, Guochen Yu, Andong Li, Chengshi Zheng, Jie Wang

    Abstract: Real-time communications in packet-switched networks have become widely used in daily communication, while they inevitably suffer from network delays and data losses in constrained real-time conditions. To solve these problems, audio packet loss concealment (PLC) algorithms have been developed to mitigate voice transmission failures by reconstructing the lost information. Limited by the transmissi… ▽ More

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: accepted by INTERSPEECH 2022

  33. arXiv:2206.05451  [pdf]

    physics.optics eess.IV

    Robust full-pose-parameter estimation for the LED array in Fourier ptychographic microscopy

    Authors: Chuanjian Zheng, Shaohui Zhang, Delong Yang, Guocheng Zhou, Yao Hu, Qun Hao

    Abstract: Fourier ptychographic microscopy (FPM) can achieve quantitative phase imaging with a large space-bandwidth product by synthesizing a set of low-resolution intensity images captured under angularly varying illuminations. Determining accurate illumination angles is critical because the consistency between actual systematic parameters and those used in the recovery algorithm is essential for high-qua… ▽ More

    Submitted 11 June, 2022; originally announced June 2022.

    Comments: 14 pages, 6 figures

  34. arXiv:2205.04019  [pdf, other]

    eess.SP cs.IT

    Wiener filters on graphs and distributed polynomial approximation algorithms

    Authors: Cong Zheng, Cheng Cheng, Qiyu Sun

    Abstract: In this paper, we consider Wiener filters to reconstruct deterministic and (wide-band) stationary graph signals from their observations corrupted by random noises, and we propose distributed algorithms to implement Wiener filters and inverse filters on networks in which agents are equipped with a data processing subsystem for limited data storage and computation power, and with a one-hop communica… ▽ More

    Submitted 8 May, 2022; originally announced May 2022.

  35. arXiv:2205.00206  [pdf, other]

    cs.SD eess.AS

    Taylor, Can You Hear Me Now? A Taylor-Unfolding Framework for Monaural Speech Enhancement

    Authors: Andong Li, Shan You, Guochen Yu, Chengshi Zheng, Xiaodong Li

    Abstract: While the deep learning techniques promote the rapid development of the speech enhancement (SE) community, most schemes only pursue the performance in a black-box manner and lack adequate model interpretability. Inspired by Taylor's approximation theory, we propose an interpretable decoupling-style SE framework, which disentangles the complex spectrum recovery into two separate optimization proble… ▽ More

    Submitted 30 April, 2022; originally announced May 2022.

    Comments: Accepted by IJCAI2022, Long Oral

  36. arXiv:2203.16033  [pdf, other]

    cs.SD eess.AS

    Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement

    Authors: Guochen Yu, Andong Li, Wenzhe Liu, Chengshi Zheng, Yutian Wang, Hui Wang

    Abstract: Due to the high computational complexity of modeling more frequency bands, it is still intractable to conduct real-time full-band speech enhancement based on deep neural networks. Recent studies typically utilize the compressed perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum by one-stage networks, leading to limited speech quality improvements… ▽ More

    Submitted 15 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.00472

  37. arXiv:2203.07195  [pdf, other]

    cs.SD eess.AS

    TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor's Approximation Theory

    Authors: Andong Li, Guochen Yu, Chengshi Zheng, Xiaodong Li

    Abstract: While existing end-to-end beamformers achieve impressive performance in various front-end speech processing tasks, they usually encapsulate the whole process into a black box and thus lack adequate interpretability. As an attempt to fill the blank, we propose a novel neural beamformer inspired by Taylor's approximation theory called TaylorBeamformer for multi-channel speech enhancement. The core i… ▽ More

    Submitted 16 March, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech2022

  38. arXiv:2203.07179  [pdf, other]

    cs.SD eess.AS

    MDNet: Learning Monaural Speech Enhancement from Deep Prior Gradient

    Authors: Andong Li, Chengshi Zheng, Ziyang Zhang, Xiaodong Li

    Abstract: While traditional statistical signal processing model-based methods can derive the optimal estimators relying on specific statistical assumptions, current learning-based methods further promote the performance upper bound via deep neural networks but at the expense of high encapsulation and a lack of adequate interpretability. Standing upon the intersection between traditional model-based methods and l… ▽ More

    Submitted 16 March, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech2022

  39. arXiv:2203.00472  [pdf, other]

    cs.SD eess.AS

    DMF-Net: A decoupling-style multi-band fusion model for full-band speech enhancement

    Authors: Guochen Yu, Yuansheng Guan, Weixin Meng, Chengshi Zheng, Hui Wang

    Abstract: Owing to the difficulty and high computational complexity of modeling more frequency bands, full-band speech enhancement based on deep neural networks is still challenging. Previous studies usually adopt compressed full-band speech features in Bark and ERB scale with relatively low frequency resolution, leading to degraded performance, especially in the high-frequency region. In this paper, we propose… ▽ More

    Submitted 30 July, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

  40. arXiv:2202.10239  [pdf, other]

    physics.optics eess.IV

    Fourier ptychography multi-parameter neural network with composite physical priori optimization

    Authors: Delong Yang, Shaohui Zhang, Chuanjian Zheng, Guocheng Zhou, Lei Cao, Yao Hu, Qun Hao

    Abstract: Fourier ptychography microscopy (FP) is a recently developed computational imaging approach for microscopic super-resolution imaging. By sequentially turning on each light-emitting diode (LED) located at a different position on the LED array and acquiring the corresponding images that contain different spatial frequency components, high spatial resolution and quantitative phase imaging can be achieve… ▽ More

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 13 pages, 12 figures, solving inverse problem of computational imaging by neural network

  41. arXiv:2202.07931  [pdf, other]

    cs.SD eess.AS

    DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement

    Authors: Guochen Yu, Andong Li, Hui Wang, Yutian Wang, Yuxuan Ke, Chengshi Zheng

    Abstract: The decoupling-style concept begins to ignite in the speech enhancement area, which decouples the original complex spectrum estimation task into multiple easier sub-tasks (i.e., magnitude-only recovery and residual complex spectrum estimation), resulting in better performance and easier interpretability. In this paper, we propose a dual-branch federative magnitude and phase estimation framewor…

    Submitted 30 July, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: 15 pages; accepted by IEEE/ACM Trans. Audio, Speech, Lang. Process.

  42. arXiv:2202.06764  [pdf, other]

    eess.AS cs.SD eess.SP

    Low-latency Monaural Speech Enhancement with Deep Filter-bank Equalizer

    Authors: Chengshi Zheng, Wenzhe Liu, Andong Li, Yuxuan Ke, Xiaodong Li

    Abstract: It is highly desirable that speech enhancement algorithms can achieve good performance while keeping low latency for many applications, such as digital hearing aids, acoustically transparent hearing devices, and public address systems. To improve the performance of traditional low-latency speech enhancement algorithms, a deep filter-bank equalizer (FBE) framework was proposed, which integrated a d…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: 35 pages, 8 figures

  43. arXiv:2202.02500  [pdf, other]

    cs.SD eess.AS

    A Neural Beam Filter for Real-time Multi-channel Speech Enhancement

    Authors: Wenzhe Liu, Andong Li, Chengshi Zheng, Xiaodong Li

    Abstract: Most deep learning-based multi-channel speech enhancement methods focus on designing a set of beamforming coefficients to directly filter the low signal-to-noise ratio signals received by microphones, which hinders the performance of these approaches. To handle these problems, this paper designs a causal neural beam filter that fully exploits the spatial-spectral information in the beam domain. Sp…

    Submitted 5 February, 2022; originally announced February 2022.

    Comments: 5 pages, 4 figures

  44. arXiv:2202.01630  [pdf, other]

    eess.AS cs.SD

    A deep complex multi-frame filtering network for stereophonic acoustic echo cancellation

    Authors: Linjuan Cheng, Chengshi Zheng, Andong Li, Yuquan Wu, Renhua Peng, Xiaodong Li

    Abstract: In hands-free communication systems, the coupling between the loudspeaker and the microphone generates an echo signal, which can severely influence the quality of communication. Meanwhile, various types of noise in communication environments further reduce speech quality and intelligibility. It is difficult to extract the near-end signal from the microphone signal within one step, especially in low signal-to-…

    Submitted 5 May, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  45. arXiv:2201.09429  [pdf, other]

    cs.SD cs.LG eess.AS

    End-to-End Neural Speech Coding for Real-Time Communications

    Authors: Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu

    Abstract: Deep-learning-based methods have shown their advantages in audio coding over traditional ones, but limited attention has been paid to real-time communications (RTC). This paper proposes TFNet, an end-to-end neural speech codec with low latency for RTC. It takes an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed…

    Submitted 15 February, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: ICASSP 2022 (Accepted)

  46. arXiv:2112.10046  [pdf, other]

    eess.IV cs.CV cs.LG

    A-ESRGAN: Training Real-World Blind Super-Resolution with Attention U-Net Discriminators

    Authors: Zihao Wei, Yidong Huang, Yuang Chen, Chenhao Zheng, Jinnan Gao

    Abstract: Blind image super-resolution (SR) is a long-standing task in CV that aims to restore low-resolution images suffering from unknown and complex distortions. Recent work has largely focused on adopting more complicated degradation models to emulate real-world degradations. The resulting models have made breakthroughs in perceptual loss and yield perceptually convincing results. However, the limitation…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: 6 pages, 9 figures

  47. arXiv:2112.08561  [pdf, other]

    cs.SD eess.AS

    EmotionBox: a music-element-driven emotional music generation system using Recurrent Neural Network

    Authors: Kaitong Zheng, Ruijie Meng, Chengshi Zheng, Xiaodong Li, Jinqiu Sang, Juanjuan Cai, Jie Wang

    Abstract: With the development of deep neural networks, automatic music composition has made great progress. Although emotional music can evoke different emotions in listeners and is important for artistic expression, only a few studies have focused on generating emotional music. This paper presents EmotionBox, a music-element-driven emotional music generator that is capable of composing music given a sp…

    Submitted 15 December, 2021; originally announced December 2021.

  48. arXiv:2112.04726  [pdf, other]

    cs.SD eess.AS

    Noise-robust blind reverberation time estimation using noise-aware time-frequency masking

    Authors: Kaitong Zheng, Chengshi Zheng, Jinqiu Sang, Yulong Zhang, Xiaodong Li

    Abstract: The reverberation time is one of the most important parameters used to characterize the acoustic property of an enclosure. In real-world scenarios, it is much more convenient to estimate the reverberation time blindly from recorded speech compared to the traditional acoustic measurement techniques using professional measurement instruments. However, the recorded speech is often corrupted by noise,…

    Submitted 9 December, 2021; originally announced December 2021.

  49. arXiv:2111.06038  [pdf, other]

    cs.CV eess.IV

    Hybrid Saturation Restoration for LDR Images of HDR Scenes

    Authors: Chaobing Zheng, Zhengguo Li, Shiqian Wu

    Abstract: There are shadow and highlight regions in a low dynamic range (LDR) image which is captured from a high dynamic range (HDR) scene. It is an ill-posed problem to restore the saturated regions of the LDR image. In this paper, the saturated regions of the LDR image are restored by fusing model-based and data-driven approaches. With such a neural augmentation, two synthetic LDR images are first genera…

    Submitted 14 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

    Comments: arXiv admin note: text overlap with arXiv:2007.02042

  50. arXiv:2110.06467  [pdf, other]

    cs.SD cs.AI eess.AS

    Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement

    Authors: Guochen Yu, Andong Li, Chengshi Zheng, Yinuo Guo, Yutian Wang, Hui Wang

    Abstract: Curriculum learning begins to thrive in the speech enhancement area, which decouples the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by that, we propose a dual-branch attention-in-attention transformer dubbed DB-AIAT to handle both coarse- and fine-grained regions of the spectrum in parallel. From a complementary perspective, a magnitud…

    Submitted 14 February, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022