
Showing 1–50 of 667 results for author: Zhang, S

Searching in archive eess.
  1. arXiv:2406.19311 [pdf, other]

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  2. arXiv:2406.18345 [pdf, other]

    cs.LG eess.SP

    EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition

    Authors: Yi Ding, Chengxuan Tong, Shuailei Zhang, Muyun Jiang, Yong Li, Kevin Lim Jun Liang, Cuntai Guan

    Abstract: Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2406.18067 [pdf, other]

    cs.CL eess.AS

    Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: The diverse nature of dialects presents challenges for models trained on specific linguistic patterns, rendering them susceptible to errors when confronted with unseen or out-of-distribution (OOD) data. This study introduces a novel margin-enhanced joint energy model (MEJEM) tailored specifically for OOD detection in dialects. By integrating a generative model and the energy margin loss, our appro…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.18065 [pdf, other]

    eess.AS cs.SD

    On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: For speech classification tasks, deep learning models often achieve high accuracy but exhibit shortcomings in calibration, manifesting as classifiers exhibiting overconfidence. The significance of calibration lies in its critical role in guaranteeing the reliability of decision-making within deep learning systems. This study explores the effectiveness of Energy-Based Models in calibrating confiden…

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.17788 [pdf, other]

    eess.SY cs.LG eess.SP

    CNN-based Compressor Mass Flow Estimator in Industrial Aircraft Vapor Cycle System

    Authors: Justin Reverdi, Sixin Zhang, Saïd Aoues, Fabrice Gamboa, Serge Gratton, Thomas Pellegrini

    Abstract: In Vapor Cycle Systems, the mass flow sensor plays a key role for different monitoring and control purposes. However, physical sensors can be inaccurate, heavy, cumbersome, expensive or highly sensitive to vibrations, which is especially problematic when embedded into an aircraft. The conception of a virtual sensor, based on other standard sensors, is a good alternative. This paper has two main objectiv…

    Submitted 27 May, 2024; originally announced June 2024.

  6. arXiv:2406.16189 [pdf, other]

    eess.IV cs.CV

    Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation

    Authors: Sheng Zhang, Yang Nan, Yingying Fang, Shiyi Wang, Xiaodan Xing, Zhifan Gao, Guang Yang

    Abstract: Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs are easily lost during the recycled down/up-sample procedure, e.g., bronchioles & arterioles, causing severe discontinuity issue…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  7. arXiv:2406.15047 [pdf, other]

    cs.IT eess.SP

    Optimal Transmit Signal Design for Multi-Target MIMO Sensing Exploiting Prior Information

    Authors: Jiayi Yao, Shuowen Zhang

    Abstract: In this paper, we study the transmit signal optimization in a multiple-input multiple-output (MIMO) radar system for sensing the angle information of multiple targets via their reflected echo signals. We consider a challenging and practical scenario where the angles to be sensed are unknown and random, while their probability information is known a priori for exploitation. First, we establish an a…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: submitted for possible publication

  8. arXiv:2406.13674 [pdf, other]

    eess.IV cs.CV

    Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

    Authors: Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

    Abstract: Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annot…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 6 tables, Early Accept to MICCAI 2024

  9. arXiv:2406.13268 [pdf, other]

    eess.AS cs.SD

    CEC: A Noisy Label Detection Method for Speaker Recognition

    Authors: Yao Shen, Yingying Gao, Yaqian Hao, Chenguang Hu, Fulin Zhang, Junlan Feng, Shilei Zhang

    Abstract: Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on two new statistical metrics: Continuous Inconsistent Counting (CIC) and Total Inconsistent Counting (TIC). These metrics are calculated through Cros…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  10. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an…

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  11. arXiv:2406.10591 [pdf, other]

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on…

    Submitted 15 June, 2024; originally announced June 2024.

  12. arXiv:2406.09589 [pdf, other]

    eess.AS

    Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

    Authors: Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur

    Abstract: In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge. Traditional approaches often rely on microphone array configurations and the information of the target speaker's location or voiceprint. This study introduces the Solo Spatial Feature (S…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024

  13. arXiv:2406.09444 [pdf, other]

    eess.AS cs.CL cs.SD

    GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model

    Authors: Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Pre-trained speech language models such as HuBERT and WavLM leverage unlabeled speech data for self-supervised learning and offer powerful representations for numerous downstream tasks. Despite the success of these models, their high requirements for memory and computing resource hinder their application on resource restricted devices. Therefore, this paper introduces GenDistiller, a novel knowled…

    Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.13418

  14. Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

    Authors: **gyuan Xia, Zhixiong Yang, Shengxi Li, Shuanghui Zhang, Yaowen Fu, Deniz Gündüz, Xiang Li

    Abstract: Learning-based approaches have witnessed great successes in blind single image super-resolution (SISR) tasks, however, handcrafted kernel priors and learning based kernel priors are typically required. In this paper, we propose a Meta-learning and Markov Chain Monte Carlo (MCMC) based SISR approach to learn kernel priors from organized randomness. In concrete, a lightweight network is adopted as k…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  15. arXiv:2406.07801 [pdf, other]

    cs.CL cs.SD eess.AS

    PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

    Authors: Runyan Yang, Huibao Yang, Xiqing Zhang, Tiantian Ye, Ying Liu, Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works directly demonstrated that joint optimization of diverse tasks in multitask speech models has positive influence on the performance of individual tasks. In this paper we present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures

  16. arXiv:2406.07289 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

    Authors: Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results. However, the training of these models still relies on parallel speech data, which is extremely challenging to collect. In contrast, S2TT and TTS have accumulated a large amount of data…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 main conference. Project Page: https://ictnlp.github.io/ComSpeech-Site/

    ACM Class: I.2.7

  17. arXiv:2406.06937 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

    Authors: Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang

    Abstract: Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization betwee…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024; Codes and demos are at https://github.com/ictnlp/NAST-S2x

  18. arXiv:2406.06619 [pdf, other]

    eess.AS cs.AI cs.CL

    LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

    Authors: Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen

    Abstract: Recent years have witnessed significant progress in multilingual automatic speech recognition (ASR), driven by the emergence of end-to-end (E2E) models and the scaling of multilingual datasets. Despite that, two main challenges persist in multilingual ASR: language interference and the incorporation of new languages without degrading the performance of the existing ones. This paper proposes LoRA-W…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, conference

  19. arXiv:2406.05839 [pdf, other]

    eess.AS cs.AI

    MaLa-ASR: Multimedia-Assisted LLM-Based ASR

    Authors: Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen

    Abstract: As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh perspectives for tackling audio tasks. Given that LLM can flexibly ingest multiple inputs, we propose MaLa-ASR, an LLM-based ASR model that can integrate…

    Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  20. arXiv:2406.03049 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

    Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/StreamSpeech-site/

  21. arXiv:2406.02430 [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.02291 [pdf, other]

    cs.NI eess.SP

    A deep-learning-based MAC for integrating channel access, rate adaptation and channel switch

    Authors: Jiantao Xin, Wei Xu, Bin Cao, Taotao Wang, Shengli Zhang

    Abstract: With increasing density and heterogeneity in unlicensed wireless networks, traditional MAC protocols, such as carrier-sense multiple access with collision avoidance (CSMA/CA) in Wi-Fi networks, are experiencing performance degradation. This is manifested in increased collisions and extended backoff times, leading to diminished spectrum efficiency and protocol coordination. Addressing these issues,…

    Submitted 4 June, 2024; originally announced June 2024.

  23. arXiv:2406.02247 [pdf, other]

    physics.ins-det eess.SY

    A Study of the Latest Updates of the Readout System for the Hybrid-Pixel Detector at HEPS

    Authors: Hangxu Li, Jie Zhang, Wei Wei, Zhenjie Li, Xiaolu Ji, Yan Zhang, Xuanzheng Yang, Shuihan Zhang, Xueke Ma, Peng Liu, Zheng Wang, Yuanbai Chen

    Abstract: The High Energy Photon Source (HEPS) represents a fourth-generation light source. This facility has made unprecedented advancements in accelerator technology, necessitating the development of new detectors to satisfy physical requirements such as single-photon resolution, large dynamic range, and high frame rates. Since 2016, the Institute of High Energy Physics has introduced the first user-exper…

    Submitted 4 June, 2024; originally announced June 2024.

  24. arXiv:2406.02167 [pdf, other]

    eess.AS eess.SP

    ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li

    Abstract: Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteristics from short utterances. Constrained by the model's size, a robust backbone Enhanced Res2Net (ERes2Net) combining global and local feature fusion…

    Submitted 4 June, 2024; originally announced June 2024.

  25. arXiv:2406.00689 [pdf, other]

    cs.IT eess.SP

    Hybrid Beamforming Design for Integrated Sensing and Communication Exploiting Prior Information

    Authors: Yizhuo Wang, Shuowen Zhang

    Abstract: In this paper, we investigate the hybrid beamforming design for a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system, where a multi-antenna base station (BS) with hybrid analog-digital transmit antenna arrays sends dual-functional signals to communicate with a multi-antenna user and simultaneously sense the location information of a point target based on the r…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: submitted for possible conference publication

  26. arXiv:2406.00604 [pdf, other]

    eess.SP

    Multipath Exploitation for Fluctuating Target Detection in RIS-Assisted ISAC Systems

    Authors: Shoushuo Zhang, Zichao Xiao, Rang Liu, Ming Li, Wei Wang, Qian Liu

    Abstract: Integrated sensing and communication (ISAC) systems are typically deployed in multipath environments, which is usually deemed as a challenging issue for wireless communications. However, the multipath propagation can also provide extra illumination and observation perspectives for radar sensing, which offers spatial diversity gain for detecting targets with spatial radar cross-section (RCS) fluctu…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE WCL

  27. arXiv:2406.00399 [pdf, other]

    eess.SP

    Patterned Beam Training: A Novel Low-Complexity and Low-Overhead Scheme for ELAA

    Authors: Hongkang Yu, Yuan Si, Shujuan Zhang, Yijian Chen

    Abstract: Extremely large antenna arrays (ELAAs) can provide higher spectral efficiency. However, the use of narrower beams for data transmission significantly increases the overhead associated with beam training. In this letter, we propose a novel patterned beam training (PBT) scheme characterized by its low overhead and complexity. This scheme requires only a single linear operation by both the base stati…

    Submitted 1 June, 2024; originally announced June 2024.

  28. arXiv:2405.13678 [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Communication Exploiting Prior Information: How Many Sensing Beams are Needed?

    Authors: Chan Xu, Shuowen Zhang

    Abstract: This paper studies an integrated sensing and communication (ISAC) system where a multi-antenna base station (BS) aims to communicate with a single-antenna user in the downlink and sense the unknown and random angle parameter of a target via exploiting its prior distribution information. We consider a general transmit beamforming structure where the BS sends one communication beam and potentially o…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: This is the longer version of a paper to appear in IEEE International Symposium on Information Theory (ISIT), 2024

  29. arXiv:2405.13634 [pdf, other]

    eess.SP

    Secure Communications in Near-Field ISCAP Systems with Extremely Large-Scale Antenna Arrays

    Authors: Zixiang Ren, Siyao Zhang, Xinmin Li, Ling Qiu, Jie Xu, Derrick Wing Kwan Ng

    Abstract: This paper investigates secure communications in a near-field multi-functional integrated sensing, communication, and powering (ISCAP) system with an extremely large-scale antenna arrays (ELAA) equipped at the base station (BS). In this system, the BS sends confidential messages to a single communication user (CU), and at the same time wirelessly senses a point target and charges multiple energy r…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 6 pages

  30. arXiv:2405.09571 [pdf, other]

    eess.SP physics.data-an physics.optics quant-ph

    The Best Radar Ranging Pulse to Resolve Two Reflectors

    Authors: Andrew N. Jordan, John C. Howell, Achim Kempf, Shunxing Zhang, Derek White

    Abstract: Previous work established fundamental bounds on subwavelength resolution for the radar range resolution problem, called superradar [Phys. Rev. Appl. 20, 064046 (2023)]. In this work, we identify the optimal waveforms for distinguishing the range resolution between two reflectors of identical strength. We discuss both the unnormalized optimal waveform as well as the best square-integrable pulse, an…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures

  31. arXiv:2405.07777 [pdf, other]

    cs.CV eess.IV

    GMSR: Gradient-Guided Mamba for Spectral Reconstruction from RGB Images

    Authors: Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Lin Feng

    Abstract: Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often face challenges in handling long-range dependencies, whereas Transformers are constrained by computational efficiency limitations. Recent breakthroughs in state-space model (e.g., Mamba) has attracted significant attention due to its near-l…

    Submitted 13 May, 2024; originally announced May 2024.

  32. arXiv:2405.07689 [pdf, other]

    cs.MM cs.NI eess.SY

    Quality of Experience Optimization for Real-time XR Video Transmission with Energy Constraints

    Authors: Guang** Pan, Shugong Xu, Shunqing Zhang, Xiao**g Chen, Yanzan Sun

    Abstract: Extended Reality (XR) is an important service in the 5G network and in future 6G networks. In contrast to traditional video on demand services, real-time XR video is transmitted frame-by-frame, requiring low latency and being highly sensitive to network fluctuations. In this paper, we model the quality of experience (QoE) for real-time XR video transmission on a frame-by-frame basis. Based on the…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 6 pages, 5 figures

  33. arXiv:2405.07442 [pdf]

    cs.SD cs.AI eess.AS q-bio.QM

    Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

    Authors: Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

    Abstract: Compared with invasive examinations that require tissue sampling, respiratory sound testing is a non-invasive examination method that is safer and easier for patients to accept. In this study, we introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition. Rene has been rigorously fine-tuned with an extensive dataset featuring a broad array of respiratory audio sample…

    Submitted 6 June, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

  34. arXiv:2405.03949 [pdf, other]

    cs.LG cs.CR eess.SP

    FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data

    Authors: Shusen **g, Anlan Yu, Shuai Zhang, Songyang Zhang

    Abstract: Recent efforts have been made to integrate self-supervised learning (SSL) with the framework of federated learning (FL). One unique challenge of federated self-supervised learning (FedSSL) is that the global objective of FedSSL usually does not equal the weighted sum of local SSL objectives. Consequently, conventional approaches, such as federated averaging (FedAvg), fail to precisely minimize the…

    Submitted 6 May, 2024; originally announced May 2024.

  35. arXiv:2405.03729 [pdf]

    eess.IV physics.optics quant-ph

    Computational ghost imaging with hybrid transforms by integrating Hadamard, discrete cosine, and Haar matrices

    Authors: Yi-Ning Zhao, Lin-Shan Chen, Liu-Ya Chen, Lingxin Kong, Chong Wang, Cheng Ren, Su-Heng Zhang, De-Zhong Cao

    Abstract: A scenario of ghost imaging with hybrid transform approach is proposed by integrating Hadamard, discrete cosine, and Haar matrices. The measurement matrix is formed by the Kronecker product of the two different transform matrices. The image information can be conveniently reconstructed by the corresponding inverse matrices. In experiment, six hybridization sets are performed in computational ghost…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  36. arXiv:2405.02567 [pdf, other]

    eess.SP

    TiRE-GAN: Task-Incentivized Generative Learning Models for Radiomap Estimation with Radio Propagation Model

    Authors: Yueling Zhou, Achintha Wijesinghe, Songyang Zhang, Zhi Ding

    Abstract: Enriching geometric information on radio frequency (RF) signal power distribution in wireless communication systems, the radiomap has become an essential tool for resource allocation and network management. Usually, a dense radiomap is reconstructed from sparse observations collected by deployed sensors or mobile devices, which makes the radiomap estimation an urgent challenge. To leverage both ph…

    Submitted 4 May, 2024; originally announced May 2024.

  37. arXiv:2405.00542 [pdf, other]

    eess.IV cs.CV

    UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement

    Authors: Ruiquan Ge, Zhaojie Fang, Pengxue Wei, Zhanghao Chen, Hongyang Jiang, Ahmed Elazab, Wangting Li, Xiang Wan, Shaochong Zhang, Changmiao Wang

    Abstract: Fundus photography, in combination with the ultra-wide-angle fundus (UWF) techniques, becomes an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow unlike UWF scanning laser ophthalmoscopy (UWF-SLO…

    Submitted 1 May, 2024; originally announced May 2024.

  38. arXiv:2404.16920 [pdf, other]

    cs.NI cs.IT cs.LG eess.SP

    Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks

    Authors: Shufan Wang, Guojun Xiong, Shichen Zhang, Huacheng Zeng, Jian Li, Shivendra Panwar

    Abstract: We study the data packet transmission problem (mmDPT) in dense cell-free millimeter wave (mmWave) networks, i.e., users sending data packet requests to access points (APs) via uplinks and APs transmitting requested data packets to users via downlinks. Our objective is to minimize the average delay in the system due to APs' limited service capacity and unreliable wireless channels between APs and u…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Wireless Communications

  39. arXiv:2404.16905 [pdf, other]

    cs.CL cs.SD eess.AS

    Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations

    Authors: Shen Zhang, Haojie Zhang, **g Zhang, Xudong Zhang, Yimeng Zhuang, **ting Wu

    Abstract: In human-computer interaction, it is crucial for agents to respond to human by understanding their emotions. Unraveling the causes of emotions is more challenging. A new task named Multimodal Emotion-Cause Pair Extraction in Conversations is responsible for recognizing emotion and identifying causal expressions. In this study, we propose a multi-stage framework to generate emotion and extract the…

    Submitted 25 April, 2024; originally announced April 2024.

  40. arXiv:2404.15620 [pdf, other]

    eess.IV

    A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

    Authors: Zhixiong Yang, **gyuan Xia, Shengxi Li, Xinghua Huang, Shuanghui Zhang, Zhen Liu, Yaowen Fu, Yongxiang Liu

    Abstract: Deep learning-based methods have achieved significant successes on solving the blind super-resolution (BSR) problem. However, most of them request supervised pre-training on labelled datasets. This paper proposes an unsupervised kernel estimation model, named dynamic kernel prior (DKP), to realize an unsupervised and pre-training-free learning-based algorithm for solving the BSR problem. DKP can a…

    Submitted 25 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in CVPR 2024

  41. arXiv:2404.15341 [pdf, other]

    eess.SP cs.LG

    Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise

    Authors: **g-Xiao Liao, Chao He, Jipu Li, **wei Sun, Shi** Zhang, Xiaoge Zhang

    Abstract: Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional…

    Submitted 10 April, 2024; originally announced April 2024.

  42. arXiv:2404.11168  [pdf]

    physics.optics eess.SP

    Microwave photonic short-time Fourier transform based on stabilized period-one nonlinear laser dynamics and stimulated Brillouin scattering

    Authors: Sunan Zhang, Taixia Shi, Lizhong Jiang, Yang Chen

    Abstract: A microwave photonic short-time Fourier transform (STFT) system based on stabilized period-one (P1) nonlinear laser dynamics and stimulated Brillouin scattering (SBS) is proposed. By using an optoelectronic feedback loop, the frequency-sweep optical signal generated by the P1 nonlinear laser dynamics is stabilized, which is further used in conjunction with an optical bandpass filter implemented by…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures

  43. arXiv:2404.11070  [pdf]

    cs.CV eess.SP

    Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

    Authors: Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Kefu Gao, Jingnan Liu

    Abstract: Accurate, continuous, and reliable positioning is a critical component of achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of stand-alone sensors and the non-line-of-sight (NLOS) reception caused by tall buildings, trees, and elevated structures seriously degrade positioning results. To address these challenges, a sky-view image segmentation algorithm based on Full…

    Submitted 17 April, 2024; originally announced April 2024.

  44. arXiv:2404.10605  [pdf, other]

    cs.IT eess.SY

    UAV Trajectory Optimization for Sensing Exploiting Target Location Distribution Map

    Authors: Xiangming Du, Shuowen Zhang, Liang Liu

    Abstract: In this paper, we study the trajectory optimization of a cellular-connected unmanned aerial vehicle (UAV) that aims to sense the location of a target while maintaining satisfactory communication quality with the ground base stations (GBSs). In contrast to most existing works, which assume the target's location is known, we focus on a more challenging scenario where the exact location of the targe…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: to appear in IEEE Vehicular Technology Conference (VTC) Spring, 2024

  45. arXiv:2404.09905  [pdf, other]

    cs.NI cs.MM eess.IV eess.SY

    Quality of Experience Oriented Cross-layer Optimization for Real-time XR Video Transmission

    Authors: Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun

    Abstract: Extended reality (XR) is one of the most important applications of beyond 5G and 6G networks. Real-time XR video transmission presents challenges in terms of data rate and delay. In particular, the frame-by-frame transmission mode of XR video makes real-time XR video very sensitive to dynamic network environments. To improve the users' quality of experience (QoE), we design a cross-layer transmiss…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 13 figures. arXiv admin note: text overlap with arXiv:2402.01180

  46. arXiv:2404.07989  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.SD eess.AS

    Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

    Authors: Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

    Abstract: Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantl…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point

  47. arXiv:2404.06695  [pdf, other]

    eess.IV physics.med-ph

    Spiral Scanning and Self-Supervised Image Reconstruction Enable Ultra-Sparse Sampling Multispectral Photoacoustic Tomography

    Authors: Yutian Zhong, Xiaoming Zhang, Zongxin Mo, Shuangyang Zhang, Wufan Chen, Li Qi

    Abstract: Multispectral photoacoustic tomography (PAT) is an imaging modality that utilizes the photoacoustic effect to achieve non-invasive and high-contrast imaging of internal tissues. However, the hardware cost and computational demand of a multispectral PAT system consisting of up to thousands of detectors are huge. To address this challenge, we propose an ultra-sparse spiral sampling strategy for mult…

    Submitted 9 April, 2024; originally announced April 2024.

  48. arXiv:2404.01192  [pdf, other]

    eess.IV cs.CV

    iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer

    Authors: Fengtao Zhou, Yingxue Xu, Yanfen Cui, Shenyan Zhang, Yun Zhu, Weiyang He, Jiguang Wang, Xin Wang, Ronald Chan, Louis Ho Shing Lau, Chu Han, Dafu Zhang, Zhenhui Li, Hao Chen

    Abstract: Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 27 pages, 9 figures, 3 tables (under review)

  49. arXiv:2404.01148  [pdf, other]

    cs.IT eess.SP

    Joint Beam Scheduling and Beamforming Design for Cooperative Positioning in Multi-beam LEO Satellite Networks

    Authors: Hongtao Xv, Yaohua Sun, Yafei Zhao, Mugen Peng, Shijie Zhang

    Abstract: Cooperative positioning with multiple low earth orbit (LEO) satellites is promising in providing location-based services and enhancing satellite-terrestrial communication. However, positioning accuracy is greatly affected by inter-beam interference and satellite-terrestrial topology geometry. To select the best combination of satellites from visible ones and suppress inter-beam interference, this…

    Submitted 1 April, 2024; originally announced April 2024.

  50. arXiv:2403.19971  [pdf, other]

    eess.AS eess.SP

    3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

    Abstract: This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization. It is designed for the needs of academic researchers and industrial practitioners. The 3D-Speaker-Toolkit adeptly leverages the combined strengths of acoustic, semantic, and visual data, seamlessly fusing these modalities to offer robust speaker recognition capabilities. The acous…

    Submitted 29 March, 2024; originally announced March 2024.