Showing 1–50 of 326 results for author: Wang, Q

Searching in archive eess.
  1. arXiv:2406.19311  [pdf, other]

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  2. arXiv:2406.15160  [pdf, other]

    eess.AS eess.SP

    Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

    Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

    Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  3. arXiv:2406.13645  [pdf, other]

    eess.IV cs.CV

    Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

    Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

    Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  4. arXiv:2406.11446  [pdf, other]

    eess.SP

    Approximate Angular Domain Expression for Near-Field XL-MIMO Channel

    Authors: Hongbo Xing, Yuxiang Zhang, Jianhua Zhang, Huixin Xu, Guangyi Liu, Qixing Wang

    Abstract: As Extremely Large-Scale Multiple-Input-Multiple-Output (XL-MIMO) technology advances and frequency band rises, the near-field effects in communication are intensifying. A concise and accurate near-field XL-MIMO channel model serves as the cornerstone for investigating the near-field effects. However, existing angular domain XL-MIMO channel models under near-field conditions require non-closed-for…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.10283  [pdf, other]

    cs.CL cs.SD eess.AS

    Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection

    Authors: Zihan Pan, Tianchi Liu, Hardik B. Sailor, Qiongqiong Wang

    Abstract: Self-supervised learning (SSL) speech representation models, trained on large speech corpora, have demonstrated effectiveness in extracting hierarchical speech embeddings through multiple transformer layers. However, the behavior of these embeddings in specific tasks remains uncertain. This paper investigates the multi-layer behavior of the WavLM model in anti-spoofing and proposes an attentive me…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.05974  [pdf, other]

    eess.IV cs.CV

    Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning

    Authors: Xin Wang, Zhiyun Song, Yitao Zhu, Sheng Wang, Lichi Zhang, Dinggang Shen, Qian Wang

    Abstract: In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: ISBI 2024

  7. arXiv:2406.02092  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    MaskSR: Masked Language Model for Full-band Speech Restoration

    Authors: Xu Li, Qirui Wang, Xiaoyu Liu

    Abstract: Speech restoration aims at restoring high quality speech in the presence of a diverse set of distortions. Although several deep learning paradigms have been studied for this task, the power of the recently emerging language models has not been fully explored. In this paper, we propose MaskSR, a masked language model capable of restoring full-band 44.1 kHz speech jointly considering noise, reverb,…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. Demo page: https://masksr.github.io/MaskSR/

  8. arXiv:2406.01605  [pdf, other]

    eess.IV cs.CV

    An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

    Authors: Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

    Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the…

    Submitted 26 May, 2024; originally announced June 2024.

  9. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  10. arXiv:2405.16952  [pdf, other]

    eess.AS

    A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

    Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

    Abstract: In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR). This new variance-preserving interpolation diffusion model (VPIDM) approach requires only 25 iterative steps and obviates the need for a corrector, an essential element in the existing variance-exploding interpolation…

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.16248  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

    Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

    Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully u…

    Submitted 25 May, 2024; originally announced May 2024.

  12. arXiv:2405.11386  [pdf, other]

    eess.IV cs.CV

    Liver Fat Quantification Network with Body Shape

    Authors: Qiyue Wang, Wu Xue, Xiaoke Zhang, Fang Jin, James Hahn

    Abstract: It is critically important to detect the content of liver fat as it is related to cardiac complications and cardiovascular disease mortality. However, existing methods are either associated with high cost and/or medical complications (e.g., liver biopsy, imaging technology) or only roughly estimate the grades of steatosis. In this paper, we propose a deep neural network to estimate the percentage…

    Submitted 30 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  13. arXiv:2405.10786  [pdf, other]

    eess.AS

    Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix

    Authors: Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  14. arXiv:2405.10116  [pdf, other]

    eess.SY eess.SP

    Enhancing Energy Efficiency in O-RAN Through Intelligent xApps Deployment

    Authors: Xuanyu Liang, Ahmed Al-Tahmeesschi, Qiao Wang, Swarna Chetty, Chenrui Sun, Hamed Ahmadi

    Abstract: The proliferation of 5G technology presents an unprecedented challenge in managing the energy consumption of densely deployed network infrastructures, particularly Base Stations (BSs), which account for the majority of power usage in mobile networks. The O-RAN architecture, with its emphasis on open and intelligent design, offers a promising framework to address the Energy Efficiency (EE) demands…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  15. arXiv:2405.09539  [pdf, ps, other]

    eess.IV cs.CV cs.MM

    MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer

    Authors: Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang

    Abstract: Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians effectively customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients' conditions. However, multi-…

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Early accepted to MICCAI 2024 (6/6/5)

  16. arXiv:2405.06178  [pdf, other]

    eess.IV cs.LG q-bio.NC

    ACTION: Augmentation and Computation Toolbox for Brain Network Analysis with Functional MRI

    Authors: Yuqi Fang, Junhao Zhang, Linmin Wang, Qianqian Wang, Mingxia Liu

    Abstract: Functional magnetic resonance imaging (fMRI) has been increasingly employed to investigate functional brain activity. Many fMRI-related software/toolboxes have been developed, providing specialized algorithms for fMRI analysis. However, existing toolboxes seldom consider fMRI data augmentation, which is quite useful, especially in studies with limited or imbalanced data. Moreover, current studies…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 5 tables

  17. arXiv:2405.05500  [pdf]

    cs.RO eess.SY

    Research on the Tender Leaf Identification and Mechanically Perceptible Plucking Finger for High-quality Green Tea

    Authors: Wei Zhang, Yong Chen, Qianqian Wang, Jun Chen

    Abstract: BACKGROUND: Intelligent identification and precise plucking are the keys to intelligent tea harvesting robots, which are of increasing significance nowadays. Aiming at plucking tender leaves for high-quality green tea production, this paper proposes a tender leaf identification algorithm and a mechanically perceptible plucking finger. RESULTS: Based on segmentation algorithm and color…

    Submitted 8 May, 2024; originally announced May 2024.

  18. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  19. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  20. arXiv:2404.09192  [pdf, other]

    cs.SD cs.AI eess.AS

    Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

    Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

    Abstract: Over the past decade, a series of unflagging efforts have been dedicated to developing highly expressive and controllable text-to-speech (TTS) systems. In general, the holistic TTS comprises two interconnected components: the frontend module and the backend module. The frontend excels in capturing linguistic representations from the raw text input, while the backend module converts linguistic cues…

    Submitted 14 April, 2024; originally announced April 2024.

  21. arXiv:2404.03253  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in…

    Submitted 4 April, 2024; originally announced April 2024.

  22. arXiv:2404.01717  [pdf, other]

    cs.CV eess.IV

    AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

    Authors: Rui Xie, Ying Tai, Chen Zhao, Kai Zhang, Zhenyu Zhang, Jun Zhou, Xiaoqian Ye, Qian Wang, Jian Yang

    Abstract: Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs. However, their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps. Inspired by the efficient adversarial diffusion di…

    Submitted 23 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  23. arXiv:2403.16830  [pdf, other]

    cs.NI eess.SP

    Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

    Authors: Xiang Ma, Yuan Zhou, Hanwen Zhang, Qun Wang, Haijian Sun, Hongjie Wang, Rose Qingyang Hu

    Abstract: As public awareness of environmental protection continues to grow, the trend of integrating more electric vehicles (EVs) into the transportation sector is rising. Unlike conventional internal combustion engine (ICE) vehicles, EVs can minimize carbon emissions and potentially achieve autonomous driving. However, several obstacles hinder the widespread adoption of EVs, such as their constrained driv…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Submitted to IET Communications as a survey paper

  24. arXiv:2403.11699  [pdf, other]

    eess.IV cs.CV

    A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos

    Authors: Zhengzheng Tu, Zigang Zhu, Yayang Duan, Bo Jiang, Qishun Wang, Chaoxue Zhang

    Abstract: Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images, which usually cannot be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploi…

    Submitted 18 March, 2024; originally announced March 2024.

  25. arXiv:2403.10146  [pdf, other]

    cs.SD cs.IR eess.AS

    Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

    Authors: Qian Wang, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention. Existing methods typically aggregate information from each modality into a single vector for matching, but this sacrifices local details and can hardly capture intricate relationships within and between modalities. Furthermore, current ATR d…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 5 pages, accepted to ICASSP2024

  26. arXiv:2403.06404  [pdf, other]

    cs.SD cs.LG eess.AS

    Cosine Scoring with Uncertainty for Neural Speaker Embedding

    Authors: Qiongqiong Wang, Kong Aik Lee

    Abstract: Uncertainty modeling in speaker representation aims to learn the variability present in speech utterances. While the conventional cosine-scoring is computationally efficient and prevalent in speaker recognition, it lacks the capability to handle uncertainty. To address this challenge, this paper proposes an approach for estimating uncertainty at the speaker embedding front-end and propagating it t…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 5 pages, 4 figures

    Journal ref: IEEE Signal Processing Letters 2024

  27. arXiv:2402.14349  [pdf, other]

    eess.IV cs.CV cs.LG

    Uncertainty-driven and Adversarial Calibration Learning for Epicardial Adipose Tissue Segmentation

    Authors: Kai Zhao, Zhiming Liu, Jiaqi Liu, Jingbiao Zhou, Bihong Liao, Huifang Tang, Qiuyu Wang, Chunquan Li

    Abstract: Epicardial adipose tissue (EAT) is a type of visceral fat that can secrete large amounts of adipokines to affect the myocardium and coronary arteries. EAT volume and density can be used as independent risk markers. Measurement of volume by noninvasive magnetic resonance images is the best method of assessing EAT. However, segmenting EAT is challenging due to the low contrast between EAT and pericar…

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 13 pages, 7 figures

  28. arXiv:2402.09871  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

    Authors: Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang

    Abstract: The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music descripti…

    Submitted 13 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted by International Joint Conference on Artificial Intelligence 2024 (IJCAI 2024)

    MSC Class: 68Txx (Primary); 14F05; 91Fxx (Secondary)
    ACM Class: I.2.7; J.5

  29. arXiv:2402.02442  [pdf, other]

    cs.LG eess.IV

    A Momentum Accelerated Algorithm for ReLU-based Nonlinear Matrix Decomposition

    Authors: Qingsong Wang, Chunfeng Cui, Deren Han

    Abstract: Recently, there has been a growing interest in the exploration of Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD),…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 5 pages, 7 figures

  30. arXiv:2401.17837  [pdf, ps, other]

    eess.SY

    Safe Reinforcement Learning-Based Eco-Driving Control for Mixed Traffic Flows With Disturbances

    Authors: Ke Lu, Dongjun Li, Qun Wang, Kaidi Yang, Lin Zhao, Ziyou Song

    Abstract: This paper presents a safe learning-based eco-driving framework tailored for mixed traffic flows, which aims to optimize energy efficiency while guaranteeing safety during real-system operations. Even though reinforcement learning (RL) is capable of optimizing energy efficiency in intricate environments, it is challenged by safety requirements during the training process. The lack of safety guaran…

    Submitted 31 January, 2024; originally announced January 2024.

  31. Localization of Dummy Data Injection Attacks in Power Systems Considering Incomplete Topological Information: A Spatio-Temporal Graph Wavelet Convolutional Neural Network Approach

    Authors: Zhaoyang Qu, Yunchang Dong, Yang Li, Siqi Song, Tao Jiang, Min Li, Qiming Wang, Lei Wang, Xiaoyong Bo, Jiye Zang, Qi Xu

    Abstract: The emergence of the novel dummy data injection attack (DDIA) poses a severe threat to the secure and stable operation of power systems. These attacks are particularly perilous due to the minimal Euclidean spatial separation between the injected malicious data and legitimate data, rendering their precise detection challenging using conventional distance-based methods. Furthermore, existing researc…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: Accepted by Applied Energy

    Journal ref: Applied Energy 360 (2024) 122736

  32. arXiv:2401.13014  [pdf]

    eess.SY

    A Novel Policy Iteration Algorithm for Nonlinear Continuous-Time $H_\infty$ Control Problem

    Authors: Qi Wang

    Abstract: $H_\infty$ control of nonlinear continuous-time systems depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation, for which a closed-form solution has been proved impossible to obtain due to the nonlinearity of the HJI equation. In order to solve the HJI equation, many iterative algorithms were proposed, and most of the algorithms were essentially Newton method when the fixed-point equation was cons…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 25 pages, 10 figures. arXiv admin note: text overlap with arXiv:2401.12882

  33. arXiv:2401.11836  [pdf, other]

    cs.LG cs.CR eess.SY

    Privacy-Preserving Data Fusion for Traffic State Estimation: A Vertical Federated Learning Approach

    Authors: Qiqing Wang, Kaidi Yang

    Abstract: This paper proposes a privacy-preserving data fusion method for traffic state estimation (TSE). Unlike existing works that assume all data sources to be accessible by a single trusted party, we explicitly address data privacy concerns that arise in the collaboration and data sharing between multiple data owners, such as municipal authorities (MAs) and mobility providers (MPs). To this end, we prop…

    Submitted 22 January, 2024; originally announced January 2024.

  34. arXiv:2401.03506  [pdf, other]

    eess.AS cs.LG cs.SD

    DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

    Authors: Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

    Abstract: In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (A…

    Submitted 26 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  35. arXiv:2312.17583  [pdf, other]

    eess.SY cs.RO

    Enhancing the Performance of DeepReach on High-Dimensional Systems through Optimizing Activation Functions

    Authors: Qian Wang, Tianhao Wu

    Abstract: With the continuous advancement in autonomous systems, it becomes crucial to provide robust safety guarantees for safety-critical systems. Hamilton-Jacobi Reachability Analysis is a formal verification method that guarantees performance and safety for dynamical systems and is widely applicable to various tasks and challenges. Traditionally, reachability problems are solved by using grid-based meth…

    Submitted 29 December, 2023; originally announced December 2023.

  36. arXiv:2312.16998  [pdf, other]

    eess.IV cs.CV

    Deep Unfolding Network with Spatial Alignment for multi-modal MRI reconstruction

    Authors: Hao Zhang, Qi Wang, Jun Shi, Shihui Ying, Zhijie Wen

    Abstract: Multi-modal Magnetic Resonance Imaging (MRI) offers complementary diagnostic information, but some modalities are limited by the long scanning time. To accelerate the whole acquisition process, MRI reconstruction of one modality from highly undersampled k-space data with another fully-sampled reference modality is an efficient solution. However, the misalignment between modalities, which is common…

    Submitted 28 December, 2023; originally announced December 2023.

  37. arXiv:2312.13556  [pdf, other]

    cs.SD eess.AS

    Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions

    Authors: Yang Liu, Haoqin Sun, Geng Chen, Qingyue Wang, Zhen Zhao, Xugang Lu, Longbiao Wang

    Abstract: Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, making it challenging to achieve competitive performance in noisy conditions. To this end, we propose a multi-level knowledge distillation (MLKD) method, which aims to transfer the knowledge from a teacher model trained on clean speech to a simpler student model trained on noisy speech. Specifically,…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by INTERSPEECH 2023

  38. arXiv:2312.12023  [pdf, other]

    eess.IV cs.CV

    Progressive Frequency-Aware Network for Laparoscopic Image Desmoking

    Authors: Jiale Zhang, Wenfeng Huang, Xiangyun Liao, Qiong Wang

    Abstract: Laparoscopic surgery offers minimally invasive procedures with better patient outcomes, but smoke presence challenges visibility and safety. Existing learning-based methods demand large datasets and high computational resources. We propose the Progressive Frequency-Aware Network (PFAN), a lightweight GAN framework for laparoscopic image desmoking, combining the strengths of CNN and Transformer for…

    Submitted 19 December, 2023; originally announced December 2023.

  39. arXiv:2312.11123  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

    Authors: Guru Prakash Arumugam, Shuo-yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia

    Abstract: ASR models often suffer from a long-form deletion problem where the model predicts sequential blanks instead of words when transcribing a lengthy audio (in the order of minutes or hours). From the perspective of a user or downstream system consuming the ASR results, this behavior can be perceived as the model "being stuck", and potentially make the product hard to use. One of the culprits for long…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 8 pages, ASRU 2023

  40. arXiv:2312.04131  [pdf, other]

    eess.AS cs.SD

    Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization

    Authors: Huan Zhao, Li Zhang, Yue Li, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie

    Abstract: The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual speaker diarization systems. To improve the performance of audio-visual speaker diarization, we leverage pre-trained supervised and self-supervised speech models for audio-visual speaker diarization. Specifically, we adopt supervised (ResNet and ECAPA-TDNN) and self-supervised pre-trained models (WavLM…

    Submitted 7 December, 2023; originally announced December 2023.

  41. arXiv:2312.03620  [pdf, other]

    eess.AS cs.SD

    Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

    Authors: Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

    Abstract: Previous studies demonstrate the impressive performance of residual neural networks (ResNet) in speaker verification. The ResNet models treat the time and frequency dimensions equally. They follow the default stride configuration designed for image recognition, where the horizontal and vertical axes exhibit similarities. This approach ignores the fact that time and frequency are asymmetric in spee…

    Submitted 24 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Open Access: https://ieeexplore.ieee.org/abstract/document/10497864

  42. arXiv:2311.08904  [pdf, ps, other]

    cs.IT eess.SP

    Energy-Efficient Design of Satellite-Terrestrial Computing in 6G Wireless Networks

    Authors: Qi Wang, Xiaoming Chen, Qiao Qi

    Abstract: In this paper, we investigate the issue of satellite-terrestrial computing in the sixth generation (6G) wireless networks, where multiple terrestrial base stations (BSs) and low earth orbit (LEO) satellites collaboratively provide edge computing services to ground user equipments (GUEs) and space user equipments (SUEs) over the world. In particular, we design a complete process of satellite-terres…

    Submitted 15 November, 2023; originally announced November 2023.

  43. arXiv:2311.08225  [pdf, other]

    eess.IV cs.CV

    Uni-COAL: A Unified Framework for Cross-Modality Synthesis and Super-Resolution of MR Images

    Authors: Zhiyun Song, Zengxin Qi, Xin Wang, Xiangyu Zhao, Zhenrong Shen, Sheng Wang, Manman Fei, Zhe Wang, Di Zang, Dongdong Chen, Linlin Yao, Qian Wang, Xuehai Wu, Lichi Zhang

    Abstract: Cross-modality synthesis (CMS), super-resolution (SR), and their combination (CMSR) have been extensively studied for magnetic resonance imaging (MRI). Their primary goals are to enhance the imaging quality by synthesizing the desired modality and reducing the slice thickness. Despite the promising synthetic results, these techniques are often tailored to specific tasks, thereby limiting their ada…

    Submitted 14 November, 2023; originally announced November 2023.

  44. arXiv:2311.06705  [pdf, other]

    eess.SY

    Equal Incremental Cost-Based Optimization Method to Enhance Efficiency for IPOP-Type Converters

    Authors: Hanfeng Cai, Haiyang Liu, Heyang Sun, Qiao Wang

    Abstract: Systematic optimization over a wide power range is often achieved through the combination of modules of different power levels. This paper addresses the issue of enhancing the efficiency of a multiple module system connected in parallel during operation and proposes an algorithm based on equal incremental cost for dynamic load allocation. Initially, a polynomial fitting technique is employed to fi…

    Submitted 11 November, 2023; originally announced November 2023.

  45. arXiv:2311.05415  [pdf, other]

    eess.SP

    EEG-DG: A Multi-Source Domain Generalization Framework for Motor Imagery EEG Classification

    Authors: Xiao-Cong Zhong, Qisong Wang, Dan Liu, Zhihuang Chen, Jing-Xiao Liao, Jinwei Sun, Yudong Zhang, Feng-Lei Fan

    Abstract: Motor imagery EEG classification plays a crucial role in non-invasive Brain-Computer Interface (BCI) research. However, the classification is affected by the non-stationarity and individual variations of EEG signals. Simply pooling EEG data with different statistical distributions to train a classification model can severely degrade the generalization performance. To address this issue, the existi…

    Submitted 9 November, 2023; originally announced November 2023.

  46. arXiv:2311.03419  [pdf, other]

    eess.AS cs.LG cs.SD

    Personalizing Keyword Spotting with Speaker Information

    Authors: Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno

    Abstract: Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker…

    Submitted 6 November, 2023; originally announced November 2023.

  47. arXiv:2310.11153  [pdf, other]

    cs.CV eess.SP

    Unsupervised Pre-Training Using Masked Autoencoders for ECG Analysis

    Authors: Guoxin Wang, Qingyuan Wang, Ganesh Neelakanta Iyer, Avishek Nag, Deepu John

    Abstract: Unsupervised learning methods have become increasingly important in deep learning due to their demonstrated large utilization of datasets and higher accuracy in computer vision and natural language processing tasks. There is a growing trend to extend unsupervised learning methods to other domains, which helps to utilize a large amount of unlabelled data. This paper proposes an unsupervised pre-tra…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Biomedical Circuits and Systems (BIOCAS) 2023

  48. arXiv:2310.03749  [pdf]

    eess.SP cs.AI cs.LG

    SCVCNet: Sliding cross-vector convolution network for cross-task and inter-individual-set EEG-based cognitive workload recognition

    Authors: Qi Wang, Li Chen, Zhiyuan Zhan, Jianhua Zhang, Zhong Yin

    Abstract: This paper presents a generic approach for applying the cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets. We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interferences in EEGs by analyzing finer-grained frequency structures in the power spectral densities. Th…

    Submitted 21 September, 2023; originally announced October 2023.

    Comments: 12 pages

  49. arXiv:2310.01861  [pdf, other]

    eess.IV cs.CV cs.GR

    Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

    Authors: Junhao Lin, Qian Dai, Lei Zhu, Huazhu Fu, Qiong Wang, Weibin Li, Wenhao Rao, Xiaoyang Huang, Liansheng Wang

    Abstract: Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US video breast lesion segmentation dataset comprisi…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 10 pages

  50. arXiv:2310.01453  [pdf, other]

    eess.SP cs.IT

    Enhancing Secrecy Capacity in PLS Communication with NORAN based on Pilot Information Codebooks

    Authors: Yebo Gu, Tao Shen, Jian Song, Qingbo Wang

    Abstract: In recent research, non-orthogonal artificial noise (NORAN) has been proposed as an alternative to orthogonal artificial noise (AN). However, NORAN introduces additional noise into the channel, which reduces the capacity of the legitimate channel (LC). At the same time, selecting a NORAN design with ideal security performance from a large number of design options is also a challenging problem. To…

    Submitted 2 October, 2023; originally announced October 2023.