Skip to main content

Showing 1–50 of 835 results for author: Wang, H

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.16928  [pdf, other

    eess.SP cs.LG

    A Multi-Resolution Mutual Learning Network for Multi-Label ECG Classification

    Authors: Wei Huang, Ning Wang, Panpan Feng, Haiyan Wang, Zongmin Wang, Bing Zhou

    Abstract: Electrocardiograms (ECG), which record the electrophysiological activity of the heart, have become a crucial tool for diagnosing these diseases. In recent years, the application of deep learning techniques has significantly improved the performance of ECG signal classification. Multi-resolution feature analysis, which captures and processes information at different time scales, can extract subtle… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.16314  [pdf, other

    eess.AS

    DreamVoice: Text-Guided Voice Conversion

    Authors: Jiarui Hai, Karan Thakkar, Helin Wang, Zengyi Qin, Mounya Elhilali

    Abstract: Generative voice technologies are rapidly evolving, offering opportunities for more personalized and inclusive experiences. Traditional one-shot voice conversion (VC) requires a target recording during inference, limiting ease of usage in generating desired voice timbres. Text-guided generation offers an intuitive solution to convert voices to desired "DreamVoices" according to the users' needs. O… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  4. arXiv:2406.13645  [pdf, other

    eess.IV cs.CV

    Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

    Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

    Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  5. arXiv:2406.12703  [pdf, other

    eess.IV cs.CV

    Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction

    Authors: **cheng Yang, Lishun Wang, Miao Cao, Huan Wang, Yin** Zhao, Xin Yuan

    Abstract: We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI), which captures a spatial-spectral data cube using snapshot 2D measurements and uses algorithms to reconstruct 3D hyperspectral images (HSI). However, current methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies and non-local similarities. The recently popular Transformer-b… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, Accepted by ICIP2024

  6. arXiv:2406.12019  [pdf

    eess.SY cs.CR cs.ET eess.SP

    Hacking Encrypted Wireless Power: Cyber-Security of Dynamic Charging

    Authors: Hui Wang, Nima Tashakor, Wei Jiang, Wei Liu, C. Q. Jiang, Stefan M. Goetz

    Abstract: Recently, energy encryption for wireless power transfer has been developed for energy safety, which is important in public places to suppress unauthorized energy extraction. Most techniques vary the frequency so that unauthorized receivers cannot extract energy because of non-resonance. However, this strategy is unreliable. To stimulate the progress of energy encryption technology and point out se… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 17 figures

  7. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  8. arXiv:2406.10052  [pdf, other

    cs.SD cs.CL eess.AS

    Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

    Authors: Haoyu Wang, Guoqiang Hu, Guodong Lin, Wei-Qiang Zhang, Jian Li

    Abstract: As a robust and large-scale multilingual speech recognition model, Whisper has demonstrated impressive results in many low-resource and out-of-distribution scenarios. However, its encoder-decoder structure hinders its application to streaming speech recognition. In this paper, we introduce Simul-Whisper, which uses the time alignment embedded in Whisper's cross-attention to guide auto-regressive d… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  9. arXiv:2406.08445  [pdf, other

    eess.AS cs.LG cs.SD

    SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

    Authors: Chun Yin, Tai-Shih Chi, Yu Tsao, Hsin-Min Wang

    Abstract: Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  10. arXiv:2406.08203  [pdf, other

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wang** Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  11. arXiv:2406.07871  [pdf, other

    cs.CV cs.MM cs.SD eess.AS

    Flexible Music-Conditioned Dance Generation with Style Description Prompts

    Authors: Hongsong Wang, Yin Zhu, Xin Geng

    Abstract: Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods primarily rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framew… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.07461  [pdf, other

    eess.AS

    Noise-robust Speech Separation with Fast Generative Correction

    Authors: Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak

    Abstract: Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation process for single-channel mixture speech by removing noises and… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  13. arXiv:2406.06744  [pdf

    cs.LG cs.CR eess.SY

    A Multi-module Robust Method for Transient Stability Assessment against False Label Injection Cyberattacks

    Authors: Hanxuan Wang, Na Lu, Yinhong Liu, Zhuqing Wang, Zixuan Wang

    Abstract: The success of deep learning in transient stability assessment (TSA) heavily relies on high-quality training data. However, the label information in TSA datasets is vulnerable to contamination through false label injection (FLI) cyberattacks, resulting in degraded performance of deep TSA models. To address this challenge, a Multi-Module Robust TSA method (MMR) is proposed to rectify the supervised… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  14. arXiv:2406.05954  [pdf, other

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi… ▽ More

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  15. arXiv:2406.05359  [pdf, other

    eess.AS cs.SD

    Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization

    Authors: Bei Liu, Haoyu Wang, Yanmin Qian

    Abstract: Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification. Firstly, we propose a novel adaptive uniform precision quantization method which enables the dynamic generation of quantization centroids custom… ▽ More

    Submitted 18 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE/ACM Transactions on Audio Speech and Language Processing (Under Review)

  16. arXiv:2406.04105  [pdf, other

    cs.LG eess.IV

    From Tissue Plane to Organ World: A Benchmark Dataset for Multimodal Biomedical Image Registration using Deep Co-Attention Networks

    Authors: Yifeng Wang, Weipeng Li, Thomas Pearce, Haohan Wang

    Abstract: Correlating neuropathology with neuroimaging findings provides a multiscale view of pathologic changes in the human organ spanning the meso- to micro-scales, and is an emerging methodology expected to shed light on numerous disease states. To gain the most information from this multimodal, multiscale approach, it is desirable to identify precisely where a histologic tissue section was taken from w… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  17. arXiv:2406.03961  [pdf, ps, other

    eess.IV cs.CV

    LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression

    Authors: Junhui Li, Jutao Li, Xingsong Hou, Huake Wang, Yutao Zhang, Yujie Dun, Wenke Sun

    Abstract: Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  18. arXiv:2406.03902  [pdf, other

    eess.IV cs.CV

    C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

    Authors: Yiqun Lin, Jiewen Yang, Hualiang Wang, Xinpeng Ding, Wei Zhao, Xiaomeng Li

    Abstract: Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  19. arXiv:2406.03814  [pdf, other

    cs.CL cs.SD eess.AS

    Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

    Authors: Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

    Abstract: The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching, presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address thi… ▽ More

    Submitted 13 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  20. arXiv:2406.02859   

    eess.AS cs.SD

    ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

    Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

    Abstract: Automatic pronunciation assessment (APA) manages to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrasti… ▽ More

    Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: This paper has been withdrawn because the authors aim to achieve better organization in writing and more detailed experimental analysis

  21. arXiv:2406.02167  [pdf, other

    eess.AS eess.SP

    ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li

    Abstract: Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteristics from short utterances. Constrained by the model's size, a robust backbone Enhanced Res2Net (ERes2Net) combining global and local feature fusion… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.02009  [pdf, other

    eess.AS cs.CL cs.SD

    Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

    Authors: Kun Zhou, Shengkui Zhao, Yukun Ma, Chong Zhang, Hao Wang, Dianwen Ng, Chongjia Ni, Nguyen Trung Hieu, Jia Qi Yip, Bin Ma

    Abstract: Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-su… ▽ More

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  23. arXiv:2406.00654  [pdf, other

    cs.CL cs.SD eess.AS

    Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

    Authors: Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

    Abstract: In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers. However, despite human subjective evaluations, such as the mean opinion score (MOS), remaining the gold standard for assessing the quality of synthetic speech, even st… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 19 pages, Preprint

  24. arXiv:2406.00516  [pdf, other

    eess.SY

    Deep Learning based Performance Testing for Analog Integrated Circuits

    Authors: Jiawei Cao, Chongtao Guo, Hao Li, Zhigang Wang, Houjun Wang, Geoffrey Ye Li

    Abstract: In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the map** from the response of the circuit under test (CUT) in each module to all specif… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  25. arXiv:2405.18790  [pdf, other

    cs.CV cs.MM eess.IV

    Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics

    Authors: Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang

    Abstract: Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field, however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective for training but face challenges in effectively extracting features aligned with human visual perception. To bridge these gaps, we propos… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE Transactions on Multimedia 2024

  26. arXiv:2405.16980  [pdf, other

    cs.CV eess.IV

    DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

    Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

    Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class is using semantic segmentation networks to pick on a shot gather called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based pi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  27. arXiv:2405.16905  [pdf, other

    eess.SY

    Privacy and Security Trade-off in Interconnected Systems with Known or Unknown Privacy Noise Covariance

    Authors: Haojun Wang, Kun Liu, Baojia Li, Emilia Fridman, Yuanqing Xia

    Abstract: This paper is concerned with the security problem for interconnected systems, where each subsystem is required to detect local attacks using locally available information and the information received from its neighboring subsystems. Moreover, we consider that there exists an additional eavesdropper being able to infer the private information by eavesdrop** transmitted data between subsystems. Th… ▽ More

    Submitted 1 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  28. arXiv:2405.15093  [pdf, other

    eess.AS

    Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis

    Authors: Hui Li, Hongyu Wang, Zhi** Chen, Bohan Sun, Bo Li

    Abstract: Singing voice conversion is to convert the source sing voice into the target sing voice except for the content. Currently, flow-based models can complete the task of voice conversion, but they struggle to effectively extract latent variables in the more rhythmically rich and emotionally expressive task of singing voice conversion, while also facing issues with low efficiency in speech processing.… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 5 pages,4 figures

  29. arXiv:2405.13757  [pdf, other

    eess.IV cs.CV cs.LG

    A label-free and data-free training strategy for vasculature segmentation in serial sectioning OCT data

    Authors: Etienne Chollet, Yael Balbastre, Caroline Magnain, Bruce Fischl, Hui Wang

    Abstract: Serial sectioning Optical Coherence Tomography (sOCT) is a high-throughput, label free microscopic imaging technique that is becoming increasingly popular to study post-mortem neurovasculature. Quantitative analysis of the vasculature requires highly accurate segmentation; however, sOCT has low signal-to-noise-ratio and displays a wide range of contrasts and artifacts that depend on acquisition pa… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 5 Pages, 2 figures. Accepted by Medical Imaging with Deep Learning

    ACM Class: I.4.6; I.2.10; I.2.1; I.2.6; I.6.5; J.3

  30. arXiv:2405.12996  [pdf, other

    eess.IV

    Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

    Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

    Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizabi… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

  31. arXiv:2405.11459  [pdf, other

    eess.SP cs.CL q-bio.NC

    Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals

    Authors: Hui Zheng, Hai-Teng Wang, Wei-Bang Jiang, Zhong-Tao Chen, Li He, Pei-Yang Lin, Peng-Hu Wei, Guo-Guang Zhao, Yun-Zhe Liu

    Abstract: Invasive brain-computer interfaces have garnered significant attention due to their high performance. The current intracranial stereoElectroEncephaloGraphy (sEEG) foundation models typically build univariate representations based on a single channel. Some of them further use Transformer to model the relationship among channels. However, due to the locality and specificity of brain computation, the… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  32. arXiv:2405.10570  [pdf

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang **, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features… ▽ More

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  33. arXiv:2405.09266  [pdf, other

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

    Authors: Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai

    Abstract: The task of generating dance from music is crucial, yet current methods, which mainly produce joint sequences, lead to outputs that lack intuitiveness and complicate data collection due to the necessity for precise joint annotations. We introduce a Dance Any Beat Diffusion model, namely DabFusion, that employs music as a conditional input to directly create dance videos from still images, utilizin… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures, demo page: https://DabFusion.github.io

  34. arXiv:2405.08599  [pdf, other

    eess.SY math.OC

    The distributed biased min-consensus protocol revisited: pre-specified finite time control strategies and small-gain based analysis

    Authors: Yuanqiu Mo, He Wang

    Abstract: Unlike the classical distributed consensus protocols enabling the group of agents as a whole to reach an agreement regarding a certain quantity of interest in a distributed fashion, the distributed biased min-consensus protocol (DBMC) has been proven to generate advanced complexity pertaining to solving the shortest path problem. As such a protocol is commonly incorporated as the first step of a h… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  35. arXiv:2405.07759  [pdf, other

    cs.MM cs.AI cs.NI eess.IV

    MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction

    Authors: Haopeng Wang, Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: Over the last few years, 360° video traffic on the network has grown significantly. A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single viewport prediction to reduce bandwidth consumption. However, the performance of models for single-viewpo… ▽ More

    Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  36. arXiv:2405.07202  [pdf, other

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Unified Video-Language Pre-training with Synchronized Audio

    Authors: Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang

    Abstract: Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two m… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  37. arXiv:2405.06951  [pdf, ps, other

    cs.IT eess.SP

    Intelligent Reflecting Surface-Aided Radar Spoofing

    Authors: Haozhe Wang, Beixiong Zheng, Xiaodan Shao, Rui Zhang

    Abstract: Electronic countermeasure (ECM) technology plays a critical role in modern electronic warfare, which can interfere with enemy radar detection systems by noise or deceptive signals. However, the conventional active jamming strategy incurs additional hardware and power costs and has the potential threat of exposing the target itself. To tackle the above challenges, we propose a new intelligent refle… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  38. arXiv:2405.05252  [pdf, other

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

    Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

    Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  39. arXiv:2405.02132  [pdf, other

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu… ▽ More

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  40. arXiv:2404.19723  [pdf, other

    eess.AS cs.SD

    Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

    Authors: Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu

    Abstract: Recent popular decoder-only text-to-speech models are known for their ability of generating natural-sounding speech. However, such models sometimes suffer from word skip** and repeating due to the lack of explicit monotonic alignment constraints. In this paper, we notice from the attention maps that some particular attention heads of the decoder-only model indicate the alignments between speech… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  41. arXiv:2404.17994  [pdf

    eess.IV

    LpQcM: Adaptable Lesion-Quantification-Consistent Modulation for Deep Learning Low-Count PET Image Denoising

    Authors: Menghua Xia, Huidong Xie, Qiong Liu, Bo Zhou, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Georges EI Fakhri, Chi Liu

    Abstract: Deep learning-based positron emission tomography (PET) image denoising offers the potential to reduce radiation exposure and scanning time by transforming low-count images into high-count equivalents. However, existing methods typically blur crucial details, leading to inaccurate lesion quantification. This paper proposes a lesion-perceived and quantification-consistent modulation (LpQcM) strategy… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 10 pages

  42. arXiv:2404.17280  [pdf, other

    cs.SD eess.AS

    Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

    Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

    Abstract: The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequen… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  43. arXiv:2404.15533  [pdf, other

    eess.SY

    Designing, simulating, and performing the 100-AV field test for the CIRCLES consortium: Methodology and Implementation of the Largest mobile traffic control experiment to date

    Authors: Mostafa Ameli, Sean Mcquade, Jonathan W. Lee, Matthew Bunting, Matthew Nice, Han Wang, William Barbour, Ryan Weightman, Chris Denaro, Ryan Delorenzo, Sharon Hornstein, Jon F. Davis, Dan Timsit, Riley Wagner, Rita Xu, Malaika Mahmood, Mikail Mahmood, Maria Laura Delle Monache, Benjamin Seibold, Daniel B. Work, Jonathan Sprinkle, Benedetto Piccoli, Alexandre M. Bayen

    Abstract: Previous controlled experiments on single-lane ring roads have shown that a single partially autonomous vehicle (AV) can effectively mitigate traffic waves. This naturally prompts the question of how these findings can be generalized to field operational, high-density traffic conditions. To address this question, the Congestion Impacts Reduction via CAV-in-the-loop Lagrangian Energy Smoothing (CIR… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  44. arXiv:2404.15311  [pdf, other

    eess.SP cs.AI cs.LG

    Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression

    Authors: Eric Modesitt, Haicheng Yin, Williams Huang Wang, Brian Lu

    Abstract: The task of Electroencephalogram (EEG) analysis is paramount to the development of Brain-Computer Interfaces (BCIs). However, to reach the goal of develo** robust, useful BCIs depends heavily on the speed and the accuracy at which BCIs can understand neural dynamics. In response to that goal, this paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted HCI International 2024

  45. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  46. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  47. arXiv:2404.10233  [pdf, ps, other

    eess.SP

    Little Pilot is Needed for Channel Estimation with Integrated Super-Resolution Sensing and Communication

    Authors: **gran Xu, Huizhi Wang, Yong Zeng, Xiaoli Xu

    Abstract: Integrated super-resolution sensing and communication (ISSAC) is a promising technology to achieve extremely high sensing performance for critical parameters, such as the angles of the wireless channels. In this paper, we propose an ISSAC-based channel estimation method, which requires little or even no pilot, yet still achieves accurate channel state information (CSI) estimation. The key idea is… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures, accepted by IEEE WCNC 2024 workshops

  48. arXiv:2404.08408  [pdf, other

    cs.LG cs.AI eess.SP physics.geo-ph

    Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning

    Authors: Hongtao Wang, Li Long, Jiangshe Zhang, Xiaoli Wei, Chunxia Zhang, Zhenbo Guo

    Abstract: Contemporary automatic first break (FB) picking methods typically analyze 1D signals, 2D source gathers, or 3D source-receiver gathers. Utilizing higher-dimensional data, such as 2D or 3D, incorporates global features, improving the stability of local picking. Despite the benefits, high-dimensional data requires structured input and increases computational demands. Addressing this, we propose a no… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

  49. arXiv:2404.06079  [pdf, other

    eess.AS cs.AI

    The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

    Authors: Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challen… ▽ More

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures. Report of a challenge

  50. arXiv:2404.05544  [pdf, other

    eess.SP

    Near/Far-Field Channel Estimation For Terahertz Systems With ELAAs: A Block-Sparse-Aware Approach

    Authors: Hongwei Wang, Jun Fang, Hui** Duan, Hongbin Li

    Abstract: Millimeter wave/Terahertz (mmWave/THz) communication with extremely large-scale antenna arrays (ELAAs) offers a promising solution to meet the escalating demand for high data rates in next-generation communications. A large array aperture, along with the ever increasing carrier frequency within the mmWave/THz bands, leads to a large Rayleigh distance. As a result, the traditional plane-wave assump… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.