
Showing 1–50 of 92 results for author: Zhu, W

Searching in archive eess.

  1. arXiv:2406.14896  [pdf, other]

    eess.IV cs.CV

    SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation

    Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Mohammad Farazi, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important facto…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper to 2024 MICCAI

  2. arXiv:2406.10856  [pdf, other]

    cs.NI eess.SY

    LEO Satellite Networks Assisted Geo-distributed Data Processing

    Authors: Zhiyuan Zhao, Zhe Chen, Zheng Lin, Wenjun Zhu, Kun Qiu, Chaoqun You, Yue Gao

    Abstract: Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective soluti…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  3. arXiv:2405.19665  [pdf]

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most of the traditional hydroelectric unit fault localization methods are difficult to carry out accurate localization. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learni…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

  4. arXiv:2403.12425  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

    Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  5. arXiv:2403.11757  [pdf, other]

    cs.MM cs.LG cs.SD eess.AS

    Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation

    Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu

    Abstract: In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain"…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  6. arXiv:2402.11834  [pdf, ps, other]

    eess.SY eess.SP

    Terahertz User-Centric Clustering in the Presence of Beam Misalignment

    Authors: Khaled Humadi, Imene Trigui, Wei-** Zhu, Wessam Ajib

    Abstract: Beam misalignment is one of the main challenges for the design of reliable wireless systems in terahertz (THz) bands. This paper investigates how to apply user-centric base station (BS) clustering as a valuable add-on in THz networks. In particular, to reduce the impact of beam misalignment, a user-centric BS clustering design that provides multi-connectivity via BS cooperation is investigated. Th…

    Submitted 18 February, 2024; originally announced February 2024.

  7. arXiv:2402.10388  [pdf]

    cs.CY eess.SP

    Improvising Age Verification Technologies in Canada: Technical, Regulatory and Social Dynamics

    Authors: Azfar Adib, Wei-** Zhu, M. Omair Ahmad

    Abstract: Age verification, which is a mandatory legal requirement for delivering certain age-appropriate services or products, has recently been emphasized around the globe to ensure online safety for children. The rapid advancement of artificial intelligence has facilitated the recent development of some cutting-edge age-verification technologies, particularly using biometrics. However, successful deploym…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Presented and accepted for publication in the 2023 IEEE International Humanitarian Technologies Conference (IEEE IHTC 2023), November 1 to 3, 2023, Cartagena, Colombia

  8. arXiv:2401.04154  [pdf]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

    Authors: Wentao Zhu

    Abstract: Audio and video are two most common modalities in the mainstream media platforms, e.g., YouTube. To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy. For multimodal fusion, simply conc…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by WACV 2024; well-formatted PDF is in https://drive.google.com/file/d/1qvW52lamsvNGMCqPS7q8g8L4NaR_LlbR/view?usp=sharing. arXiv admin note: text overlap with arXiv:2401.04023

  9. arXiv:2401.04023  [pdf]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

    Authors: Wentao Zhu

    Abstract: In recent years, researchers combine both audio and video signals to deal with challenges where actions are not well represented or captured by visual cues. However, how to effectively leverage the two modalities is still under development. In this work, we develop a multiscale multimodal Transformer (MMT) that leverages hierarchical representation learning. Particularly, MMT is composed of a nove…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by WACV 2024; well-formatted PDF is in https://drive.google.com/file/d/10Zo_ydJZFAm7YsxHDgTjhyc4dEJbW_dk/view?usp=sharing

  10. arXiv:2312.16228  [pdf, other]

    cs.SD cs.LG cs.MM cs.NE eess.AS

    Deformable Audio Transformer for Audio Event Detection

    Authors: Wentao Zhu

    Abstract: Transformers have achieved promising results on a variety of tasks. However, the quadratic complexity in self-attention computation has limited the applications, especially in low-resource settings and mobile or edge devices. Existing works have proposed to exploit hand-crafted attention patterns to reduce computation complexity. However, such hand-crafted patterns are data-agnostic and may not be…

    Submitted 7 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024. arXiv admin note: substantial text overlap with arXiv:2201.00520 by other authors

  11. arXiv:2312.05786  [pdf, other]

    eess.SP cs.IT

    Deep Learning for Joint Design of Pilot, Channel Feedback, and Hybrid Beamforming in FDD Massive MIMO-OFDM Systems

    Authors: Junyi Yang, Weifeng Zhu, Shu Sun, Xiaofeng Li, Xingqin Lin, Meixia Tao

    Abstract: This letter considers the transceiver design in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems for high-quality data transmission. We propose a novel deep learning based framework where the procedures of pilot design, channel feedback, and hybrid beamforming are realized by carefully crafted deep neural networ…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures, accepted by IEEE Communications Letters

  12. arXiv:2312.05557  [pdf, ps, other]

    cs.IT eess.SP

    Long-Term Rate-Fairness-Aware Beamforming Based Massive MIMO Systems

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, Y. Fang, H. V. Poor, L. Hanzo

    Abstract: This is the first treatise on multi-user (MU) beamforming designed for achieving long-term rate-fairness in full-dimensional MU massive multi-input multi-output (m-MIMO) systems. Explicitly, based on the channel covariances, which can be assumed to be known beforehand, we address this problem by optimizing the following objective functions: the users' signal-to-leakage-noise ratios (SLNRs) using SLN…

    Submitted 9 December, 2023; originally announced December 2023.

  13. arXiv:2311.14264  [pdf, ps, other]

    eess.SP

    An ADMM-Based Geometric Configuration Optimization in RSSD-Based Source Localization By UAVs with Spread Angle Constraint

    Authors: Xin Cheng, Weiqiang Zhu, Feng Shu, Jiangzhou Wang

    Abstract: Deploying multiple unmanned aerial vehicles (UAVs) to locate a signal-emitting source covers a wide range of military and civilian applications like rescue and target tracking. It is well known that the UAVs-source (sensors-target) geometry, namely geometric configuration, significantly affects the final localization accuracy. This paper focuses on the geometric configuration optimization for rece…

    Submitted 23 November, 2023; originally announced November 2023.

  14. arXiv:2310.17155  [pdf, ps, other]

    cs.IT eess.SP

    Max-min Rate Optimization of Low-Complexity Hybrid Multi-User Beamforming Maintaining Rate-Fairness

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, H. V. Poor, L. Hanzo

    Abstract: A wireless network serving multiple users in the millimeter-wave or the sub-terahertz band by a base station is considered. High-throughput multi-user hybrid-transmit beamforming is conceived by maximizing the minimum rate of the users. For the sake of energy-efficient signal transmission, the array-of-subarrays structure is used for analog beamforming relying on low-resolution phase shifters. We…

    Submitted 26 October, 2023; originally announced October 2023.

  15. arXiv:2310.10095  [pdf, other]

    eess.IV cs.CV cs.LG

    A Multi-Scale Spatial Transformer U-Net for Simultaneously Automatic Reorientation and Segmentation of 3D Nuclear Cardiac Images

    Authors: Yangfan Ni, Duo Zhang, Gege Ma, Lijun Lu, Zhongke Huang, Wentao Zhu

    Abstract: Accurate reorientation and segmentation of the left ventricular (LV) is essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing. Small-scale LV myocardium (LV-MY) region detection and the diverse cardiac structures of i…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 17 pages, 7 figures

  16. arXiv:2308.12198  [pdf, other]

    eess.SP cs.IT

    Hierarchical Beam Alignment for Millimeter-Wave Communication Systems: A Deep Learning Approach

    Authors: Junyi Yang, Weifeng Zhu, Meixia Tao, Shu Sun

    Abstract: Fast and precise beam alignment is crucial for high-quality data transmission in millimeter-wave (mmWave) communication systems, where large-scale antenna arrays are utilized to overcome the severe propagation loss. To tackle the challenging problem, we propose a novel deep learning-based hierarchical beam alignment method for both multiple-input single-output (MISO) and multiple-input multiple-ou…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 15 pages, 16 figures, to appear in Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2209.03643

  17. arXiv:2308.04663  [pdf, other]

    eess.IV cs.CV cs.LG

    Classification of lung cancer subtypes on CT images with synthetic pathological priors

    Authors: Wentao Zhu, Yuan **, Gege Ma, Geng Chen, Jan Egger, Shaoting Zhang, Dimitris N. Metaxas

    Abstract: The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements. In this paper, we propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns betwe…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 16 pages, 7 figures

    Journal ref: Medical Image Analysis 95, July 2024, 103199

  18. arXiv:2306.15942  [pdf, other]

    cs.SD cs.AI eess.AS

    Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

    Authors: Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

    Abstract: Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input fea…

    Submitted 28 June, 2023; originally announced June 2023.

  19. arXiv:2306.11958  [pdf, other]

    physics.med-ph eess.IV

    PDS-MAR: a fine-grained Projection-Domain Segmentation-based Metal Artifact Reduction method for intraoperative CBCT images with guidewires

    Authors: Tianling Lyu, Zhan Wu, Gege Ma, Chen Jiang, Xinyun Zhong, Yan Xi, Yang Chen, Wentao Zhu

    Abstract: Since the invention of modern CT systems, metal artifacts have been a persistent problem. Due to increased scattering, amplified noise, and insufficient data collection, it is more difficult to suppress metal artifacts in cone-beam CT, limiting its use in human- and robot-assisted spine surgeries where metallic guidewires and screws are commonly used. In this paper, we demonstrate that conventiona…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: 19 Pages

    Journal ref: Phys. Med. Biol. 68 215007 (2023)

  20. arXiv:2306.01289  [pdf, other]

    eess.IV cs.CV

    nnMobileNet: Rethinking CNN for Retinopathy Research

    Authors: Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

    Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability…

    Submitted 15 April, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted as a conference paper to 2024 CVPRW

  21. arXiv:2305.08014  [pdf]

    cs.CV cs.AI cs.LG eess.AS

    Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning

    Authors: Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, Wei-** Zhu

    Abstract: Gesture recognition using low-resolution instantaneous HD-sEMG images opens up new avenues for the development of more fluid and natural muscle-computer interfaces. However, the data variability between inter-session and inter-subject scenarios presents a great challenge. The existing approaches employed very large and complex deep ConvNet or 2SRNN-based domain adaptation methods to approximate th…

    Submitted 19 February, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

  22. arXiv:2304.09727  [pdf, other]

    eess.SP cs.IT

    Cooperative Multi-Cell Massive Access with Temporally Correlated Activity

    Authors: Weifeng Zhu, Meixia Tao, Xiaojun Yuan, Fan Xu, Yunfeng Guan

    Abstract: This paper investigates the problem of activity detection and channel estimation in cooperative multi-cell massive access systems with temporally correlated activity, where all access points (APs) are connected to a central unit via fronthaul links. We propose to perform user-centric AP cooperation for computation burden alleviation and introduce a generalized sliding-window detection strategy for…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: 16 pages, 17 figures, minor revision

  23. arXiv:2303.10757  [pdf, other]

    cs.SD cs.AI cs.CV cs.LG eess.AS

    Multiscale Audio Spectrogram Transformer for Efficient Audio Classification

    Authors: Wentao Zhu, Mohamed Omar

    Abstract: Audio event has a hierarchical architecture in both time and frequency and can be grouped together to construct more abstract semantic audio classes. In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs hierarchical representation learning for efficient audio classification. Specifically, MAST employs one-dimensional (and two-dimensional) pooling operators along…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  24. arXiv:2303.07704  [pdf, other]

    eess.AS cs.SD

    TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge

    Authors: Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang

    Abstract: This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version -- TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) struct…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  25. arXiv:2303.03737  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

    Authors: Zhaoxi Mu, Xinyu Yang, Wen**g Zhu

    Abstract: Transformer has shown advanced performance in speech separation, benefiting from its ability to capture global features. However, capturing local features and channel information of audio sequences in speech separation is equally important. In this paper, we present a novel approach named Intra-SE-Conformer and Inter-Transformer (ISCIT) for speech separation. Specifically, we design a new network…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  26. arXiv:2303.03732  [pdf, other]

    cs.SD cs.LG eess.AS

    A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

    Authors: Zhaoxi Mu, Xinyu Yang, Xiangyuan Yang, Wen**g Zhu

    Abstract: In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage end-to-end learning method that decouples the difficult speech separation problem in noisy and reverberant environments into three sub-problems: speech…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  27. Low-Complexity Pareto-Optimal 3D Beamforming for the Full-Dimensional Multi-User Massive MIMO Downlink

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, Y. Fang, L. Hanzo

    Abstract: Full-dimensional (FD) multi-user massive multiple input multiple output (m-MIMO) systems employ large two-dimensional (2D) rectangular antenna arrays to control both the azimuth and elevation angles of signal transmission. We introduce the sum of two outer products of the azimuth and elevation beamforming vectors having moderate dimensions as a new class of FD beamforming. We show that this low-co…

    Submitted 18 February, 2023; originally announced February 2023.

  28. arXiv:2302.03003  [pdf, other]

    eess.IV cs.CV stat.ML

    OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing

    Authors: Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu, Jacob M. Sobczak, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang

    Abstract: Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation, however, is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged the Optimal Transport (OT) theory to propose an…

    Submitted 8 April, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)

  29. arXiv:2302.02991  [pdf, other]

    eess.IV cs.CV stat.ML

    Optimal Transport Guided Unsupervised Learning for Enhancing low-quality Retinal Images

    Authors: Wenhui Zhu, Peijie Qiu, Mohammad Farazi, Keshav Nandakumar, Oana M. Dumitrascu, Yalin Wang

    Abstract: Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections and low-quality when certain ocular or systemic co-morbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we proposed a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging the optimal transport theory, we propo…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to 20th IEEE International Symposium on Biomedical Imaging (ISBI 2023)

  30. arXiv:2301.00554  [pdf]

    eess.IV

    In-situ monitoring additive manufacturing process with AI edge computing

    Authors: Wenkang Zhu, Hui Li, Yikai Zhang, Yuqing Hou, Liwei Chen

    Abstract: In-situ monitoring system can be used to monitor the quality of additive manufacturing (AM) processes. In the case of digital image correlation (DIC) based in-situ monitoring systems, high-speed cameras were used to capture images of high resolutions. This paper proposed a novel in-situ monitoring system to accelerate the process of digital images using artificial intelligence (AI) edge computing…

    Submitted 2 January, 2023; originally announced January 2023.

  31. arXiv:2211.06041  [pdf, other]

    eess.AS

    An Adapter based Multi-label Pre-training for Speech Separation and Enhancement

    Authors: Tianrui Wang, Xie Chen, Zhuo Chen, Shu Yu, Weibin Zhu

    Abstract: In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data. However, compared with tasks such as speech recognition (ASR), the improvements from SSL representation in speech separation (SS) and enhancement (SE) are considerably smaller. Based on HuBERT, this work investigates improv…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: 5 pages

  32. arXiv:2211.00002  [pdf, other]

    cs.CV eess.IV physics.data-an

    A Self-Supervised Approach to Reconstruction in Sparse X-Ray Computed Tomography

    Authors: Rey Mendoza, Minh Nguyen, Judith Weng Zhu, Vincent Dumont, Talita Perciano, Juliane Mueller, Vidya Ganapati

    Abstract: Computed tomography has propelled scientific advances in fields from biology to materials science. This technology allows for the elucidation of 3-dimensional internal structure by the attenuation of x-rays through an object at different rotations relative to the beam. By imaging 2-dimensional projections, a 3-dimensional object can be reconstructed through a computational algorithm. Imaging at a…

    Submitted 29 October, 2022; originally announced November 2022.

    Comments: NeurIPS 2022 Machine Learning and the Physical Sciences Workshop. arXiv admin note: text overlap with arXiv:2210.16709

  33. arXiv:2210.12954  [pdf, other]

    cs.IT eess.SP

    Message Passing-Based Joint User Activity Detection and Channel Estimation for Temporally-Correlated Massive Access

    Authors: Weifeng Zhu, Meixia Tao, Xiaojun Yuan, Yunfeng Guan

    Abstract: This paper studies the user activity detection and channel estimation problem in a temporally-correlated massive access system where a very large number of users communicate with a base station sporadically and each user once activated can transmit with a large probability over multiple consecutive frames. We formulate the problem as a dynamic compressed sensing (DCS) problem to exploit both the s…

    Submitted 26 January, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 31 pages, 14 figures, minor revision

  34. arXiv:2210.11089  [pdf, other]

    eess.AS cs.SD

    Speech Dereverberation with a Reverberation Time Shortening Target

    Authors: Rui Zhou, Wenye Zhu, Xiaofei Li

    Abstract: This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set as the direct-path speech or optionally with some early reflections. This type of target suddenly truncates the reverberation, and thus it may not be suitable for network training. The proposed RTS target suppresses reverberation a…

    Submitted 5 June, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2204.08765

  35. arXiv:2210.08802  [pdf, other]

    eess.AS cs.SD

    spatial-dccrn: dccrn equipped with frame-level angle feature and hybrid filtering for multi-channel speech enhancement

    Authors: Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang

    Abstract: Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signal. To make full use of spatial information and neural network based masking estimation, we propose a multi-channel denoising neural network -- Spatial DCCRN. Firstly, we extend S-DCCRN to multi-channel scenario, aiming at performing cascaded su…

    Submitted 17 October, 2022; originally announced October 2022.

  36. arXiv:2210.05946  [pdf, other]

    eess.IV cs.CV stat.ML

    Self-Supervised Equivariant Regularization Reconciles Multiple Instance Learning: Joint Referable Diabetic Retinopathy Classification and Lesion Segmentation

    Authors: Wenhui Zhu, Peijie Qiu, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

    Abstract: Lesion appearance is a crucial clue for medical providers to distinguish referable diabetic retinopathy (rDR) from non-referable DR. Most existing large-scale DR datasets contain only image-level labels rather than pixel-based annotations. This motivates us to develop algorithms to classify rDR and segment lesions via image-level labels. This paper leverages self-supervised equivariant learning an…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 7 pages, 2 tables, 3 figures. 18th International Symposium on Medical Information Processing and Analysis

  37. arXiv:2209.03643  [pdf, ps, other]

    eess.SP

    Deep Learning for Hierarchical Beam Alignment in mmWave Communication Systems

    Authors: Junyi Yang, Weifeng Zhu, Meixia Tao

    Abstract: Fast and precise beam alignment is crucial to support high-quality data transmission in millimeter wave (mmWave) communication systems. In this work, we propose a novel deep learning based hierarchical beam alignment method that learns two tiers of probing codebooks (PCs) and uses their measurements to predict the optimal beam in a coarse-to-fine searching manner. Specifically, the proposed method…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: 6 pages, 6 figures, accepted by GLOBECOM 2022

  38. arXiv:2206.04289  [pdf, other]

    eess.IV cs.CV

    A No-Reference Deep Learning Quality Assessment Method for Super-resolution Images Based on Frequency Maps

    Authors: Zicheng Zhang, Wei Sun, Xiongkuo Min, Wenhan Zhu, Tao Wang, Wei Lu, Guangtao Zhai

    Abstract: To support the application scenarios where high-resolution (HR) images are urgently needed, various single image super-resolution (SISR) algorithms are developed. However, SISR is an ill-posed inverse problem, which may bring artifacts like texture shift, blur, etc. to the reconstructed images, thus it is necessary to evaluate the quality of super-resolution images (SRIs). Note that most existing…

    Submitted 9 June, 2022; originally announced June 2022.

  39. arXiv:2205.07494  [pdf, other]

    eess.SP cs.IT

    Double-Sided Information Aided Temporal-Correlated Massive Access

    Authors: Weifeng Zhu, Meixia Tao, Yunfeng Guan

    Abstract: This letter considers temporal-correlated massive access, where each device, once activated, is likely to transmit continuously over several consecutive frames. Motivated by that the device activity at each frame is correlated to not only its previous frame but also its next frame, we propose a double-sided information (DSI) aided joint activity detection and channel estimation algorithm based on…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: 6 pages, 5 figures

  40. arXiv:2204.08765  [pdf, other]

    eess.AS cs.SD eess.SP

    Speech Dereverberation with A Reverberation Time Shortening Target

    Authors: Rui Zhou, Wenye Zhu, Xiaofei Li

    Abstract: This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set as the direct-path speech or optionally with some early reflections. This type of target suddenly truncates the reverberation, and thus it may not be suitable for network training. The proposed RTS target suppresses reverberation a…

    Submitted 20 November, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2023

  41. arXiv:2204.05571  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Emotion Recognition with Global-Aware Fusion on Multi-scale Feature Representation

    Authors: Wen**g Zhu, Xiang Li

    Abstract: Speech Emotion Recognition (SER) is a fundamental task to predict the emotion label from speech data. Recent works mostly focus on using convolutional neural networks (CNNs) to learn local attention map on fixed-scale feature representation by viewing time-varied spectral features as images. However, rich emotional feature at different scales and important global information are not able to be wel…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: 6 pages, 3 figures, ICASSP 2022

  42. arXiv:2204.00226  [pdf, other]

    eess.AS

    Multiple Confidence Gates For Joint Training Of SE And ASR

    Authors: Tianrui Wang, Weibin Zhu, Yingying Gao, Junlan Feng, Shilei Zhang

    Abstract: Joint training of a speech enhancement (SE) model and a speech recognition (ASR) model is a common solution for robust ASR in noisy environments. SE focuses on improving the auditory quality of speech, but the enhanced feature distribution is changed, which is uncertain and detrimental to the ASR. To tackle this challenge, an approach with multiple confidence gates for joint training of SE and ASR i…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: 5 pages

  43. arXiv:2203.04780  [pdf]

    eess.SP

    Intelligent resonance tracking of a microwave plasmonic resonator for compact wireless sensors

    Authors: Xuanru Zhang, Jia Wen Zhu, Tie Jun Cui

    Abstract: Plasmonic sensing has been in the spotlight for decades, the concept and applications of which have been generalized to spoof surface plasmons (SSPs) in the microwave band. Here, we report a compact and wireless sensor within a printed circuit board size of 18 mm × 12 mm, tracking the resonance frequency shift of a microwave plasmonic resonator via a software-defined scheme. The microwave plasmoni…

    Submitted 2 March, 2022; originally announced March 2022.

  44. arXiv:2203.00926  [pdf, other]

    eess.IV cs.CV

    Parameterized Image Quality Score Distribution Prediction

    Authors: Yixuan Gao, Xiongkuo Min, Wenhan Zhu, Xiao-** Zhang, Guangtao Zhai

    Abstract: Recently, image quality has been generally described by a mean opinion score (MOS). However, we observe that the quality scores of an image given by a group of subjects are very subjective and diverse. Thus it is not enough to use a MOS to describe the image quality. In this paper, we propose to describe image quality using a parameterized distribution rather than a MOS, and an objective method is also…

    Submitted 2 March, 2022; originally announced March 2022.

  45. arXiv:2203.00917  [pdf, other]

    eess.SP

    Machine Learning Methods for Inferring the Number of UAV Emitters via Massive MIMO Receive Array

    Authors: Yifan Li, Feng Shu, **song Hu, Shihao Yan, Haiwei Song, Weiqiang Zhu, Da Tian, Yaoliang Song, Jiangzhou Wang

    Abstract: To provide important prior knowledge for the DOA estimation of UAV emitters in future wireless networks, we present a complete DOA preprocessing system for inferring the number of emitters via massive MIMO receive array. Firstly, in order to eliminate the noise signals, two high-precision signal detectors, square root of maximum eigenvalue times minimum eigenvalue (SR-MME) and geometric mean (GM),…

    Submitted 10 March, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

  46. arXiv:2203.00613  [pdf]

    cs.CL cs.LG cs.SD eess.AS

    Towards a Common Speech Analysis Engine

    Authors: Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory

    Abstract: Recent innovations in self-supervised representation learning have led to remarkable advances in natural language processing. That said, in the speech processing domain, self-supervised representation learning-based systems are not yet considered state-of-the-art. We propose leveraging recent advances in self-supervised-based speech processing to create a common speech analysis engine. Such an eng…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: ICASSP 2022

  47. arXiv:2202.12643  [pdf, other]

    eess.AS

    Harmonic gated compensation network plus for ICASSP 2022 DNS CHALLENGE

    Authors: Tianrui Wang, Weibin Zhu, Yingying Gao, Yanan Chen, Junlan Feng, Shilei Zhang

    Abstract: The harmonic structure of speech is resistant to noise, but the harmonics may still be partially masked by noise. Therefore, we previously proposed a harmonic gated compensation network (HGCN) to predict the full harmonic locations based on the unmasked harmonics and process the result of a coarse enhancement module to recover the masked harmonics. In addition, the auditory loudness loss function…

    Submitted 25 February, 2022; originally announced February 2022.

    Comments: 5 pages

  48. arXiv:2202.03896  [pdf]

    cs.SD cs.AI cs.LG eess.AS

    Speech Emotion Recognition using Self-Supervised Features

    Authors: Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno, Hagai Aronowitz

    Abstract: Self-supervised pre-trained features have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of speech emotion recognition (SER) still need further investigation. In this paper, we introduce a modular End-to-End (E2E) SER system based on an Upstream + Downstream architecture paradigm, which allows easy use/integration o…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: 5 pages, 4 figures, 2 tables, ICASSP 2022

  49. arXiv:2201.12755  [pdf, other]

    eess.AS cs.SD

    HGCN: Harmonic gated compensation network for speech enhancement

    Authors: Tianrui Wang, Weibin Zhu, Yingying Gao, Junlan Feng, Shilei Zhang

    Abstract: Mask processing in the time-frequency (T-F) domain through the neural network has been one of the mainstreams for single-channel speech enhancement. However, it is hard for most models to handle the situation when harmonics are partially masked by noise. To tackle this challenge, we propose a harmonic gated compensation network (HGCN). We design a high-resolution harmonic integral spectrum to impr…

    Submitted 16 March, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: 5 pages

  50. arXiv:2201.09124  [pdf, ps, other]

    eess.SP

    Copula-Based Modeling of RIS-Assisted Communications: Outage Probability Analysis

    Authors: Imène Trigui, Damoon Shahbaztabar, Wessam Ajib, Wei-** Zhu

    Abstract: Statistical characterization of the signal-to-noise ratio (SNR) of reconfigurable intelligent surface (RIS)-assisted communications in the presence of phase noise is an important open issue. In this letter, we exploit the concept of copula modeling to capture the non-standard dependence features that appear due to the presence of discrete phase noise. In particular, we consider the outage probabilit…

    Submitted 22 January, 2022; originally announced January 2022.