Skip to main content

Showing 1–50 of 692 results for author: Wang, S

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.18950  [pdf, other

    eess.IV cs.CV

    MMR-Mamba: Multi-Contrast MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: **g Zou, Lanqing Liu, Qi Chen, Shujun Wang, Xiaohan Xing, **g Qin

    Abstract: Multi-contrast MRI acceleration has become prevalent in MR imaging, enabling the reconstruction of high-quality MR images from under-sampled k-space data of the target modality, using guidance from a fully-sampled auxiliary modality. The main crux lies in efficiently and comprehensively integrating complementary information from the auxiliary modality. Existing methods either suffer from quadratic… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figure

  3. arXiv:2406.18425  [pdf, other

    eess.SP

    L-Sort: An Efficient Hardware for Real-time Multi-channel Spike Sorting with Localization

    Authors: Yuntao Han, Shiwei Wang, Alister Hamilton

    Abstract: Spike sorting is essential for extracting neuronal information from neural signals and understanding brain function. With the advent of high-density microelectrode arrays (HDMEAs), the challenges and opportunities in multi-channel spike sorting have intensified. Real-time spike sorting is particularly crucial for closed-loop brain computer interface (BCI) applications, demanding efficient hardware… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    ACM Class: B.7.1

  4. arXiv:2406.18079  [pdf, other

    cs.CV eess.IV

    MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal

    Authors: Yiguo Jiang, Xuhang Chen, Chi-Man Pun, Shuqiang Wang, Wei Feng

    Abstract: When light is scattered or reflected accidentally in the lens, flare artifacts may appear in the captured photos, affecting the photos' visual quality. The main challenge in flare removal is to eliminate various flare artifacts while preserving the original content of the image. To address this challenge, we propose a lightweight Multi-Frequency Deflare Network (MFDNet) based on the Laplacian Pyra… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by The Visual Computer journal

  5. arXiv:2406.16189  [pdf, other

    eess.IV cs.CV

    Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation

    Authors: Sheng Zhang, Yang Nan, Yingying Fang, Shiyi Wang, Xiaodan Xing, Zhifan Gao, Guang Yang

    Abstract: Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs are easily lost during the recycled down/up-sample procedure, e.g., bronchioles & arterioles, causing severe discontinuity issue… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  6. arXiv:2406.14869  [pdf, other

    eess.SP

    Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy

    Authors: Jiayan Gan, Zhixing Du, Qiang Li, Huaizong Shao, **gran Lin, Ye Pan, Zhongyi Wen, Shafei Wang

    Abstract: While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms h… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  7. arXiv:2406.14186  [pdf, other

    eess.IV cs.CV

    CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation

    Authors: Tingwei Liu, Miao Zhang, Leiye Liu, Jialong Zhong, Shuyao Wang, Yongri Piao, Huchuan Lu

    Abstract: Recently, the Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the images features and the diffusion… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted in MICCAI 2024

  8. arXiv:2406.13179  [pdf, other

    cs.SD cs.AI cs.NE eess.AS

    Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

    Authors: Shuai Wang, Dehao Zhang, Kexin Shi, Yuchen Wang, Wenjie Wei, Jibin Wu, Malu Zhang

    Abstract: Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.12646  [pdf, other

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Ya**g Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  10. arXiv:2406.12463  [pdf, other

    cs.CV eess.IV

    LFMamba: Light Field Image Super-Resolution with State Space Model

    Authors: Wang xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

    Abstract: Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scan… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  11. arXiv:2406.11799  [pdf, other

    eess.IV cs.CV cs.LG

    Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation

    Authors: Song Wang, Zhong Zhang, Huan Yan, Ming Xu, Guanghui Wang

    Abstract: H&E-to-IHC stain translation techniques offer a promising solution for precise cancer diagnosis, especially in low-resource regions where there is a shortage of health professionals and limited access to expensive equipment. Considering the pixel-level misalignment of H&E-IHC image pairs, current research explores the pathological consistency between patches from the same positions of the image pa… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  12. arXiv:2406.10358  [pdf, other

    cs.CR eess.SY

    I Still See You: Why Existing IoT Traffic Resha** Fails

    Authors: Su Wang, Keyang Yu, Qi Li, Dong Chen

    Abstract: The Internet traffic data produced by the Internet of Things (IoT) devices are collected by Internet Service Providers (ISPs) and device manufacturers, and often shared with their third parties to maintain and enhance user services. Unfortunately, on-path adversaries could infer and fingerprint users' sensitive privacy information such as occupancy and user activities by analyzing these network tr… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: EWSN'24 paper accepted, to appear

  13. arXiv:2406.09326  [pdf, other

    cs.SD cs.AI cs.CV cs.MM eess.AS

    PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

    Authors: Qijun Gan, Song Wang, Shengtao Wu, Jianke Zhu

    Abstract: Recently, artificial intelligence techniques for education have been received increasing attentions, while it still remains an open problem to design the effective music instrument instructing systems. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-h… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Codes and Dataset: https://agnjason.github.io/PianoMotion-page

  14. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Kai Yu, Aidi Lin, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Wei Chen, Yilong Luo, Yifan Chen, **gcheng Wang, Yih Chung Tham, Dianbo Liu, Wendy Wong, Sahil Thakur, Beau Fenner, Yanda Meng, Yukun Zhou , et al. (11 additional authors not shown)

    Abstract: The current retinal artificial intelligence models were trained using data with a limited category of diseases and limited knowledge. In this paper, we present a retinal vision-language foundation model (RetiZero) with knowledge of over 400 fundus diseases. Specifically, we collected 341,896 fundus images paired with text descriptions from 29 publicly available datasets, 180 ophthalmic books, and… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.09304  [pdf

    physics.app-ph eess.SP

    Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics

    Authors: Shengbo Wang, Mingchao Fang, Lekai Song, Cong Li, Jian Zhang, Arokia Nathan, Guohua Hu, Shuo Gao

    Abstract: Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  16. arXiv:2406.08835  [pdf, other

    cs.SD eess.AS

    A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, **g Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, ca… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  17. arXiv:2406.08641  [pdf, ps, other

    cs.SD cs.CL eess.AS

    ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

    Authors: Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, **chuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

    Abstract: ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB~2.0, which is a ne… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  18. arXiv:2406.08081  [pdf

    eess.SP

    CLDTA: Contrastive Learning based on Diagonal Transformer Autoencoder for Cross-Dataset EEG Emotion Recognition

    Authors: Yuan Liao, Yuhong Zhang, Shenghuan Wang, Xiruo Zhang, Yiling Zhang, Wei Chen, Yuzhe Gu, Liya Huang

    Abstract: Recent advances in non-invasive EEG technology have broadened its application in emotion recognition, yielding a multitude of related datasets. Yet, deep learning models struggle to generalize across these datasets due to variations in acquisition equipment and emotional stimulus materials. To address the pressing need for a universal model that fluidly accommodates diverse EEG dataset formats and… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.07846  [pdf, other

    eess.AS

    DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

    Authors: Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of automatic speech recognition (ASR) model with short chunks makes… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  20. arXiv:2406.07390  [pdf, other

    eess.SP cs.IT eess.IV

    DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

    Authors: Sixian Wang, **cheng Dai, Kailin Tan, Xiaoqi Qin, Kai Niu, ** Zhang

    Abstract: End-to-end visual communication systems typically optimize a trade-off between channel bandwidth costs and signal-level distortion metrics. However, under challenging physical conditions, this traditional discriminative communication paradigm often results in unrealistic reconstructions with perceptible blurring and aliasing artifacts, despite the inclusion of perceptual or adversarial losses for… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  21. arXiv:2406.05974  [pdf, other

    eess.IV cs.CV

    Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning

    Authors: Xin Wang, Zhiyun Song, Yitao Zhu, Sheng Wang, Lichi Zhang, Dinggang Shen, Qian Wang

    Abstract: In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: ISBI 2024

  22. arXiv:2406.05763  [pdf, other

    eess.AS

    WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

    Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

    Abstract: With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. Tailored for the text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio… ▽ More

    Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  23. arXiv:2406.05551  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

    Authors: Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li

    Abstract: Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the i… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  24. arXiv:2406.03391  [pdf, other

    eess.SP

    Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

    Authors: Chunjie Wang, Xuhui Zhang, Huijun Xing, Liang Xue, Shuqiang Wang, Yanyan Shen, Bo Yang, ** Guan

    Abstract: Intelligent reflecting surface (IRS) and rate-splitting multiple access (RSMA) technologies are at the forefront of enhancing spectrum and energy efficiency in the next generation multi-antenna communication systems. This paper explores a RSMA system with multiple IRSs, and proposes two purpose-driven scheduling schemes, i.e., the exhaustive IRS-aided (EIA) and opportunistic IRS-aided (OIA) scheme… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  25. arXiv:2406.00956  [pdf, other

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  26. arXiv:2406.00540  [pdf, ps, other

    eess.SY

    Optimal Transmission Power Scheduling for Networked Control System under DoS Attack

    Authors: Siyi Wang, Yulong Gao, Sandra Hirche

    Abstract: Designing networked control systems that are reliable and resilient against adversarial threats, is essential for ensuring the security of cyber-physical systems. This paper addresses the communication-control co-design problem for networked control systems under denial-of-service (DoS) attacks. In the wireless channel, a transmission power scheduler periodically determines the power level for sen… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  27. arXiv:2406.00416  [pdf, other

    stat.ML cs.LG eess.SP

    Representation and De-interleaving of Mixtures of Hidden Markov Processes

    Authors: Jiadi Bao, Mengtao Zhu, Yunjie Li, Shafei Wang

    Abstract: De-interleaving of the mixtures of Hidden Markov Processes (HMPs) generally depends on its representation model. Existing representation models consider Markov chain mixtures rather than hidden Markov, resulting in the lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, de-interleaving methods utilize a search-based strategy, which is time-consumi… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures, submitted to IEEE transactions on Signal Processing

  28. arXiv:2405.19373  [pdf, other

    eess.SP cs.LG

    Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

    Authors: Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

    Abstract: Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attem… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Neural Computing for Advanced Applications, 2024

  29. arXiv:2405.19298  [pdf, other

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA)… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  30. arXiv:2405.18790  [pdf, other

    cs.CV cs.MM eess.IV

    Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics

    Authors: Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang

    Abstract: Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field, however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective for training but face challenges in effectively extracting features aligned with human visual perception. To bridge these gaps, we propos… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE Transactions on Multimedia 2024

  31. arXiv:2405.15259  [pdf, other

    eess.SY

    Robust Economic Dispatch with Flexible Demand and Adjustable Uncertainty Set

    Authors: Tian Liu, Xiaoqi Tan, Su Wang, Danny H. K. Tsang

    Abstract: With more renewable energy sources (RES) integrated into the power system, the intermittency of RES places a heavy burden on the system. The uncertainty of RES is traditionally handled by controllable generators to balance the real time wind power deviation. As the demand side management develops, the flexibility of aggregate loads can be leveraged to mitigate the negative impact of the wind power… ▽ More

    Submitted 4 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  32. arXiv:2405.09768  [pdf, other

    eess.AS cs.SD

    Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model

    Authors: Siyang Wang, Éva Székely

    Abstract: Recent advances in generative language modeling applied to discrete speech tokens presented a new avenue for text-to-speech (TTS) synthesis. These speech language models (SLMs), similarly to their textual counterparts, are scalable, probabilistic, and context-aware. While they can produce diverse and natural outputs, they sometimes face issues such as unintelligibility and the inclusion of non-spe… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures. Language Resources and Evaluation Conference (LREC) 2024. demo: https://swatsw.github.io/lrec24_eval_slm/

  33. arXiv:2405.09539  [pdf, ps, other

    eess.IV cs.CV cs.MM

    MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer

    Authors: Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang

    Abstract: Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians effectively customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients' conditions. However, multi-… ▽ More

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Early accepted to MICCAI 2024 (6/6/5)

  34. Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

    Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

    Abstract: It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS) synthesis. Prior studies have primarily focused on learning a global prosodic representation at the utterance level, which strongly correlates with linguistic prosody. Our goal is to construct a hierarchical emotion distribution (ED) that effectively encapsulates intensity variations of emotions at various… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: This is accepted to IEEE ICASSP 2024

  35. arXiv:2405.09163  [pdf, other

    eess.SY

    DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation

    Authors: **gwen Yang, ** Wang, Fatemeh Golpayegani, Shen Wang

    Abstract: Variable speed limit (VSL) control is an established yet challenging problem to improve freeway traffic mobility and alleviate bottlenecks by customizing speed limits at proper locations based on traffic conditions. Recent advances in deep reinforcement learning (DRL) have shown promising results in solving VSL control problems by interacting with sophisticated environments. However, the modeling… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  36. arXiv:2405.05518  [pdf, other

    cs.CV cs.RO eess.IV

    DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

    Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

    Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the barrier of feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper

  37. arXiv:2405.05126  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    Exploring Speech Pattern Disorders in Autism using Machine Learning

    Authors: Chuanbo Hu, Jacob Thrasher, Wenqi Li, Mindi Ruan, Xiangxu Yu, Lynn K Paul, Shuo Wang, Xin Li

    Abstract: Diagnosing autism spectrum disorder (ASD) by identifying abnormal speech patterns from examiner-patient dialogues presents significant challenges due to the subtle and diverse manifestations of speech-related symptoms in affected individuals. This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues. Utilizing a dataset… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  38. arXiv:2405.03692  [pdf, other

    eess.IV cs.NI eess.SY

    Imitation Learning for Adaptive Video Streaming with Future Adversarial Information Bottleneck Principle

    Authors: Shuoyao Wang, Jiawei Lin, Fangwei Ye

    Abstract: Adaptive video streaming plays a crucial role in ensuring high-quality video streaming services. Despite extensive research efforts devoted to Adaptive BitRate (ABR) techniques, the current reinforcement learning (RL)-based ABR algorithms may benefit the average Quality of Experience (QoE) but suffers from fluctuating performance in individual video sessions. In this paper, we present a novel appr… ▽ More

    Submitted 12 March, 2024; originally announced May 2024.

    Comments: submitted to IEEE Journal

  39. arXiv:2404.19723  [pdf, other

    eess.AS cs.SD

    Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

    Authors: Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu

    Abstract: Recent popular decoder-only text-to-speech models are known for their ability of generating natural-sounding speech. However, such models sometimes suffer from word skip** and repeating due to the lack of explicit monotonic alignment constraints. In this paper, we notice from the attention maps that some particular attention heads of the decoder-only model indicate the alignments between speech… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  40. arXiv:2404.16920  [pdf, other

    cs.NI cs.IT cs.LG eess.SP

    Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks

    Authors: Shufan Wang, Guojun Xiong, Shichen Zhang, Huacheng Zeng, Jian Li, Shivendra Panwar

    Abstract: We study the data packet transmission problem (mmDPT) in dense cell-free millimeter wave (mmWave) networks, i.e., users sending data packet requests to access points (APs) via uplinks and APs transmitting requested data packets to users via downlinks. Our objective is to minimize the average delay in the system due to APs' limited service capacity and unreliable wireless channels between APs and u… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Wireless Communications

  41. arXiv:2404.15512  [pdf, other

    eess.SY

    Deep Hankel matrices with random elements

    Authors: Nathan P. Lawrence, Philip D. Loewen, Shuyuan Wang, Michael G. Forbes, R. Bhushan Gopaluni

    Abstract: Willems' fundamental lemma enables a trajectory-based characterization of linear systems through data-based Hankel matrices. However, in the presence of measurement noise, we ask: Is this noisy Hankel-based model expressive enough to re-identify itself? In other words, we study the output prediction accuracy from recursively applying the same persistently exciting input sequence to the model. We f… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: L4DC 2024

  42. arXiv:2404.14716  [pdf, other

    cs.CL cs.AI cs.CV cs.SD eess.AS

    Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

    Authors: Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

    Abstract: Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update. Despite such convenience, the performance of ICL heavily depends on the quality of the in-context examples presented, which makes the in-context example selection approach a critical choice. This paper proposes a novel Bayes… ▽ More

    Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 17 pages, 6 figures

  43. arXiv:2404.13804  [pdf, other

    cs.DC cs.LG cs.NI eess.SY

    Adaptive Heterogeneous Client Sampling for Federated Learning over Wireless Networks

    Authors: Bing Luo, Wenli Xiao, Shiqiang Wang, Jianwei Huang, Leandros Tassiulas

    Abstract: Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which suffers from slow wall-clock time for convergence due to high deg… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Published in IEEE Transactions on Mobile Computing (TMC). arXiv admin note: substantial text overlap with arXiv:2112.11256

  44. arXiv:2404.10235  [pdf, ps, other

    eess.SP

    Integrated Sensing and Communication for Edge Inference with End-to-End Multi-View Fusion

    Authors: Xibin **, Guoliang Li, Shuai Wang, Miaowen Wen, Chengzhong Xu, H. Vincent Poor

    Abstract: Integrated sensing and communication (ISAC) is a promising solution to accelerate edge inference via the dual use of wireless signals. However, this paradigm needs to minimize the inference error and latency under ISAC co-functionality interference, for which the existing ISAC or edge resource allocation algorithms become inefficient, as they ignore the inter-dependency between low-level ISAC desi… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  45. arXiv:2404.08566  [pdf, other

    eess.SP cs.LG

    Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation

    Authors: Liu Yang, Qiang Li, Xiaoyang Ren, Yi Fang, Shafei Wang

    Abstract: Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in develo** state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, whe… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  46. arXiv:2404.07215  [pdf, other

    cs.NI cs.AI eess.SP

    Computation Offloading for Multi-server Multi-access Edge Vehicular Networks: A DDQN-based Method

    Authors: Siyu Wang, Bo Yang, Zhiwen Yu, Xuelin Cao, Yan Zhang, Chau Yuen

    Abstract: In this paper, we investigate a multi-user offloading problem in the overlap** domain of a multi-server mobile edge computing system. We divide the original problem into two stages: the offloading decision making stage and the request scheduling stage. To prevent the terminal from going out of service area during offloading, we consider the mobility parameter of the terminal according to the hum… ▽ More

    Submitted 20 February, 2024; originally announced April 2024.

  47. arXiv:2404.06079  [pdf, other

    eess.AS cs.AI

    The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

    Authors: Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challen… ▽ More

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures. Report of a challenge

  48. arXiv:2404.06054  [pdf, other

    eess.SP

    Pseudo MIMO (pMIMO): An Energy and Spectral Efficient MIMO-OFDM System

    Authors: Sen Wang, Tianxiong Wang, Shulun Zhao, Zhen Feng, Guangyi Liu, Chunfeng Cui, Chih-Lin I, Jiangzhou Wang

    Abstract: This article introduces an energy and spectral efficient multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) transmission scheme designed for the future sixth generation (6G) wireless communication networks. The approach involves connecting each receiving radio frequency (RF) chain with multiple antenna elements and conducting sample-level adjustments for receivin… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  49. arXiv:2404.03209  [pdf, other

    eess.IV

    CSR-dMRI: Continuous Super-Resolution of Diffusion MRI with Anatomical Structure-assisted Implicit Neural Representation Learning

    Authors: Ruoyou Wu, Jian Cheng, Cheng Li, Juan Zou, **g Yang, Wenxin Fan, Shanshan Wang

    Abstract: Deep learning-based dMRI super-resolution methods can effectively enhance image resolution by leveraging the learning capabilities of neural networks on large datasets. However, these methods tend to learn a fixed scale map** between low-resolution (LR) and high-resolution (HR) images, overlooking the need for radiologists to scale the images at arbitrary resolutions. Moreover, the pixel-wise lo… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 10 pages

  50. arXiv:2404.02988  [pdf, other

    eess.SY cs.LG

    Risk-averse Learning with Non-Stationary Distributions

    Authors: Siyi Wang, Zifan Wang, Xinlei Yi, Michael M. Zavlanos, Karl H. Johansson, Sandra Hirche

    Abstract: Considering non-stationary environments in online optimization enables decision-maker to effectively adapt to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost chan… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.