Showing 1–50 of 1,023 results for author: Wang, Z

Searching in archive eess.

  1. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.17801  [pdf, other]

    cs.SD cs.CL eess.AS

    A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge

    Authors: Xiaopeng Wang, Yi Lu, Xin Qi, Zhiyong Wang, Yuankun Xie, Shuchen Shi, Ruibo Fu

    Abstract: This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.16928  [pdf, other]

    eess.SP cs.LG

    A Multi-Resolution Mutual Learning Network for Multi-Label ECG Classification

    Authors: Wei Huang, Ning Wang, Panpan Feng, Haiyan Wang, Zongmin Wang, Bing Zhou

    Abstract: Electrocardiograms (ECG), which record the electrophysiological activity of the heart, have become a crucial tool for diagnosing these diseases. In recent years, the application of deep learning techniques has significantly improved the performance of ECG signal classification. Multi-resolution feature analysis, which captures and processes information at different time scales, can extract subtle… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.16026  [pdf]

    physics.med-ph cs.LG eess.IV

    CEST-KAN: Kolmogorov-Arnold Networks for CEST MRI Data Analysis

    Authors: Jiawen Wang, Pei Cai, Ziyan Wang, Huabin Zhang, Jianpan Huang

    Abstract: Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same ne… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  5. arXiv:2406.14794  [pdf, other]

    eess.IV cs.CV cs.LG

    ImageFlowNet: Forecasting Multiscale Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images

    Authors: Chen Liu, Ke Xu, Liangbo L. Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy

    Abstract: The forecasting of disease progression from images is a holy grail for clinical decision making. However, this task is complicated by the inherent high dimensionality, temporal sparsity and sampling irregularity in longitudinal image acquisitions. Existing methods often rely on extracting hand-crafted features and performing time-series analysis in this vector space, leading to a loss of rich spat… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.12463  [pdf, other]

    cs.CV eess.IV

    LFMamba: Light Field Image Super-Resolution with State Space Model

    Authors: Wang xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

    Abstract: Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scan… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.12323  [pdf, other]

    eess.SP

    Hybrid Beamforming Design for Near-Field ISAC with Modular XL-MIMO

    Authors: Chunwei Meng, Dingyou Ma, Zhaolin Wang, Yuanwei Liu, Zhiqing Wei, Zhiyong Feng

    Abstract: A novel modular extremely large-scale multiple-input-multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and pla… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.12186  [pdf, ps, other]

    eess.IV cs.CV

    Unlocking the Potential of Early Epochs: Uncertainty-aware CT Metal Artifact Reduction

    Authors: Xinquan Yang, Guanqun Zhou, Wei Sun, Youjian Zhang, Zhongya Wang, Jiahui He, Zhicheng Zhang

    Abstract: In computed tomography (CT), the presence of metallic implants in patients often leads to disruptive artifacts in the reconstructed images, hindering accurate diagnosis. Recently, a large amount of supervised deep learning-based approaches have been proposed for metal artifact reduction (MAR). However, these methods neglect the influence of initial training weights. In this paper, we have discover… ▽ More

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  10. Simultaneously Transmitting and Reflecting Surfaces for Ubiquitous Next Generation Multiple Access in 6G and Beyond

    Authors: Xidong Mu, Jiaqi Xu, Zhaolin Wang, Naofal Al-Dhahir

    Abstract: The ultimate goal of next generation multiple access (NGMA) is to support massive terminals and facilitate multiple functionalities over the limited radio resources of wireless networks in the most efficient manner possible. However, the random and uncontrollable wireless radio environment is a major obstacle to realizing this NGMA vision. Given the prominent feature of achieving 360° smart radio… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 25 pages, 18 figures, 7 tables

  11. arXiv:2406.10941  [pdf, other]

    eess.SP

    Near-Field Localization and Sensing with Large-Aperture Arrays: From Signal Modeling to Processing

    Authors: Zhaolin Wang, Parisa Ramezani, Yuanwei Liu, Emil Björnson

    Abstract: The signal processing community is currently witnessing a growing interest in near-field signal processing, driven by the trend towards the use of large aperture arrays with high spatial resolution in the fields of communication, localization, sensing, imaging, etc. From the perspective of localization and sensing, this trend breaks the basic far-field assumptions that have dominated the array sig… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 20 pages, 5 figures

  12. arXiv:2406.10591  [pdf, other]

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  13. arXiv:2406.09844  [pdf, other]

    cs.SD eess.AS

    Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

    Authors: Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie

    Abstract: Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling process as well as training-inference mismatch still hinder conversion performance. In this paper, we propose Vec-Tok-VC+, a novel prompt-based zero-shot VC model im…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  14. arXiv:2406.09082  [pdf]

    eess.SY cs.AI

    Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles

    Authors: Hao Zhang, Nuo Lei, Boli Chen, Bingbing Li, Rulong Li, Zhi Wang

    Abstract: Learning-based intelligent energy management systems for plug-in hybrid electric vehicles (PHEVs) are crucial for achieving efficient energy utilization. However, their application faces system reliability challenges in the real world, which prevents widespread acceptance by original equipment manufacturers (OEMs). This paper begins by establishing a PHEV model based on physical and data-driven mo… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.08112  [pdf, other]

    cs.SD cs.AI eess.AS

    Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

    Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

    Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

  16. arXiv:2406.07918  [pdf, other]

    eess.IV

    Micro-expression recognition based on depth map to point cloud

    Authors: Ren Zhang, Jianqin Yin, Chao Qi, Zehao Wang, Zhicheng Zhang, Yonghao Dang

    Abstract: Micro-expressions are nonverbal facial expressions that reveal the covert emotions of individuals, making the micro-expression recognition task receive widespread attention. However, the micro-expression recognition task is challenging due to the subtle facial motion and brevity in duration. Many 2D image-based methods have been developed in recent years to recognize MEs effectively, but, these ap… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.07846  [pdf, other]

    eess.AS

    DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

    Authors: Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of automatic speech recognition (ASR) model with short chunks makes… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  18. arXiv:2406.06833  [pdf, other]

    eess.SY stat.AP

    Data-driven Power Flow Linearization: Simulation

    Authors: Mengshuo Jia, Gabriela Hug, Ning Zhang, Zhaojian Wang, Yi Wang, Chongqing Kang

    Abstract: Building on the theoretical insights of Part I, this paper, as the second part of the tutorial, dives deeper into data-driven power flow linearization (DPFL), focusing on comprehensive numerical testing. The necessity of these simulations stems from the theoretical analysis's inherent limitations, particularly the challenge of identifying the differences in real-world performance among DPFL method… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 26 pages

  19. arXiv:2406.06796  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.SP

    FlexLoc: Conditional Neural Networks for Zero-Shot Sensor Perspective Invariance in Object Localization with Distributed Multimodal Sensors

    Authors: Jason Wu, Ziqi Wang, Xiaomin Ouyang, Ho Lyun Jeong, Colin Samplawski, Lance Kaplan, Benjamin Marlin, Mani Srivastava

    Abstract: Localization is a critical technology for various applications ranging from navigation and surveillance to assisted living. Localization systems typically fuse information from sensors viewing the scene from different perspectives to estimate the target location while also employing multiple modalities for enhanced robustness and accuracy. Recently, such systems have employed end-to-end deep neura… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  20. arXiv:2406.06744  [pdf]

    cs.LG cs.CR eess.SY

    A Multi-module Robust Method for Transient Stability Assessment against False Label Injection Cyberattacks

    Authors: Hanxuan Wang, Na Lu, Yinhong Liu, Zhuqing Wang, Zixuan Wang

    Abstract: The success of deep learning in transient stability assessment (TSA) heavily relies on high-quality training data. However, the label information in TSA datasets is vulnerable to contamination through false label injection (FLI) cyberattacks, resulting in degraded performance of deep TSA models. To address this challenge, a Multi-Module Robust TSA method (MMR) is proposed to rectify the supervised… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  21. arXiv:2406.06582  [pdf, ps, other]

    cs.CL cs.LG eess.AS

    Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

    Authors: Viet Anh Trinh, Rosy Southwell, Yiwen Guan, Xinlu He, Zhiyong Wang, Jacob Whitehill

    Abstract: Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text to speech, speech to speech translation. Moreover, large language models (LLMs) pretrained from vast text corpora contain rich linguistic information that can improve accuracy in a variety of tasks. In this paper, we present a decoder… ▽ More

    Submitted 25 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.04683  [pdf, other]

    cs.SD eess.AS

    PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

    Authors: Shuchen Shi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Yi Lu, Xin Qi, Xuefei Liu, Yukun Liu, Yongwei Li, Zhiyong Wang, Xiaopeng Wang

    Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge abo… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  23. arXiv:2406.03247  [pdf, other]

    cs.SD eess.AS

    Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

    Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

    Abstract: The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between genuine and known spoofed audio. We propose a Genuine-Focused Learning (GFL) framework guided, aiming for highly generalized FAD, called GFL-FAD. This method incorporates a Counterfactual Reasoning Enhanced Representation… ▽ More

    Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  24. arXiv:2406.03240  [pdf, other]

    cs.SD cs.AI eess.AS

    Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy

    Authors: Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

    Abstract: With the proliferation of deepfake audio, there is an urgent need to investigate their attribution. Current source tracing methods can effectively distinguish in-distribution (ID) categories. However, the rapid evolution of deepfake algorithms poses a critical challenge in the accurate identification of out-of-distribution (OOD) novel deepfake algorithms. In this paper, we propose Real Emphasis an… ▽ More

    Submitted 8 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  25. arXiv:2406.03237  [pdf, other]

    cs.SD eess.AS

    Generalized Fake Audio Detection via Deep Stable Learning

    Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

    Abstract: Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate t… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  26. arXiv:2406.02247  [pdf, other]

    physics.ins-det eess.SY

    A Study of the Latest Updates of the Readout System for the Hybird-Pixel Detector at HEPS

    Authors: Hangxu Li, Jie Zhang, Wei Wei, Zhenjie Li, Xiaolu Ji, Yan Zhang, Xuanzheng Yang, Shuihan Zhang, Xueke Ma, Peng Liu, Zheng Wang, Yuanbai Chen

    Abstract: The High Energy Photon Source (HEPS) represents a fourth-generation light source. This facility has made unprecedented advancements in accelerator technology, necessitating the development of new detectors to satisfy physical requirements such as single-photon resolution, large dynamic range, and high frame rates. Since 2016, the Institute of High Energy Physics has introduced the first user-exper… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  27. arXiv:2406.00516  [pdf, other]

    eess.SY

    Deep Learning based Performance Testing for Analog Integrated Circuits

    Authors: Jiawei Cao, Chongtao Guo, Hao Li, Zhigang Wang, Houjun Wang, Geoffrey Ye Li

    Abstract: In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the mapping from the response of the circuit under test (CUT) in each module to all specif…

    Submitted 1 June, 2024; originally announced June 2024.

  28. arXiv:2406.00320  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

    Authors: Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, Zhou Zhao

    Abstract: Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video, and it remains challenging to build V2A models with high generation quality, efficiency, and visual-audio temporal synchrony. We propose Frieren, a V2A model based on rectified flow matching. Frieren regresses the conditional transport vector field from noise to spectrogram latent with straight paths and c… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  29. arXiv:2405.20595  [pdf, other]

    eess.SP

    Multi-Beam Integrated Sensing and Communication: State-of-the-Art, Challenges and Opportunities

    Authors: Yinxiao Zhuo, Tianqi Mao, Haojin Li, Chen Sun, Zhaocheng Wang, Zhu Han, Sheng Chen

    Abstract: Integrated sensing and communication (ISAC) has been envisioned as a critical enabling technology for the next-generation wireless communication, which can realize location/motion detection of surroundings with communication devices. This additional sensing capability leads to a substantial network quality gain and expansion of the service scenarios. As the system evolves to millimeter wave (mmWav… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  30. arXiv:2405.20402  [pdf, other]

    eess.AS cs.SD eess.SP

    Cross-Talk Reduction

    Authors: Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe

    Abstract: While far-field multi-talker mixtures are recorded, each speaker can wear a close-talk microphone so that close-talk mixtures can be recorded at the same time. Although each close-talk mixture has a high signal-to-noise ratio (SNR) of the wearer, it has a very limited range of applications, as it also contains significant cross-talk speech by other speakers and is not clean enough. In this context… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: in International Joint Conference on Artificial Intelligence (IJCAI), 2024

  31. arXiv:2405.19889  [pdf, other]

    eess.SP cs.IT cs.LG cs.MM

    Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network

    Authors: Minghui Wu, Zhen Gao, Zhaocheng Wang, Dusit Niyato, George K. Karagiannidis, Sheng Chen

    Abstract: Near-space airship-borne communication network is recognized to be an indispensable component of the future integrated ground-air-space network thanks to airships' advantage of long-term residency at stratospheric altitudes, but it urgently needs reliable and efficient Airship-to-X link. To improve the transmission efficiency and capacity, this paper proposes to integrate semantic communication wi… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Major Revision by IEEE JSAC

  32. arXiv:2405.17114  [pdf, other]

    cs.IT eess.SP

    Holographic MIMO Systems, Their Channel Estimation and Performance

    Authors: Yuanbin Chen, Ying Wang, Zhaocheng Wang, Ping Zhang

    Abstract: Holographic multiple-input multiple-output (MIMO) systems constitute a promising technology in support of next-generation wireless communications, thus paving the way for a smart programmable radio environment. However, despite its significant potential, further fundamental issues remain to be addressed, such as the acquisition of accurate channel information. Indeed, the conventional angular-doma… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: This article has been accepted for publication in IEEE VTM

  33. arXiv:2405.16136  [pdf, other]

    cs.AI cs.CL cs.LG cs.SD eess.AS

    C3LLM: Conditional Multimodal Content Generation Using Large Language Models

    Authors: Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang

    Abstract: We introduce C3LLM (Conditioned-on-Three-Modalities Large Language Models), a novel framework combining three tasks of video-to-audio, audio-to-text, and text-to-audio together. C3LLM adapts the Large Language Model (LLM) structure as a bridge for aligning different modalities, synthesizing the given conditional information, and making multimodal generation in a discrete manner. Our contributions… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  34. arXiv:2405.15345  [pdf, ps, other]

    eess.SP

    Hybrid-Field Channel Estimation for XL-MIMO Systems with Stochastic Gradient Pursuit Algorithm

    Authors: Hao Lei, Jiayi Zhang, Zhe Wang, Bo Ai, Derrick Wing Kwan Ng

    Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is crucial for satisfying the high data rate requirements of the sixth-generation (6G) wireless networks. In this context, ensuring accurate acquisition of channel state information (CSI) with low complexity becomes imperative. Moreover, deploying an extremely large antenna array at the base station (BS) might result in some scatterers… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 30 pages, 6 figures, been ACCEPTED for publication as a REGULAR paper in the IEEE Transactions on Signal Processing

  35. arXiv:2405.14559  [pdf, other]

    eess.IV

    HemSeg-200: A Voxel-Annotated Dataset for Intracerebral Hemorrhages Segmentation in Brain CT Scans

    Authors: Changwei Song, Qing Zhao, Jianqiang Li, Xin Yue, Ruoyun Gao, Zhaoxuan Wang, An Gao, Guanghui Fu

    Abstract: Acute intracerebral hemorrhage is a life-threatening condition that demands immediate medical intervention. Intraparenchymal hemorrhage (IPH) and intraventricular hemorrhage (IVH) are critical subtypes of this condition. Clinically, when such hemorrhages are suspected, immediate CT scanning is essential to assess the extent of the bleeding and to facilitate the formulation of a targeted treatment… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  36. arXiv:2405.13564  [pdf, other]

    eess.SY

    Hybrid Event-triggered Control of Nonlinear System with Full State Constraints and Disturbance

    Authors: Ziming Wang

    Abstract: This article focuses on the problem of adaptive tracking control for a specific type of nonlinear system that is subject to full-state constraints via a hybrid event-triggered control (HETC) strategy. With the auxiliary system, we proposed a 'log' function to deal with the full-state constraint. Additionally, a disturbance observer (DO) is constructed to handle the unmeasurable external disturbanc… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  37. arXiv:2405.12569  [pdf, other]

    eess.SP

    TypeII-CsiNet: CSI Feedback with TypeII Codebook

    Authors: Yiliang Sang, Ke Ma, Yang Ming, Jin Lian, Zhaocheng Wang

    Abstract: The latest TypeII codebook selects partial strongest angular-delay ports for the feedback of downlink channel state information (CSI), whereas its performance is limited due to the deficiency of utilizing the correlations among the port coefficients. To tackle this issue, we propose a tailored autoencoder named TypeII-CsiNet to effectively integrate the TypeII codebook with deep learning, wherein… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  38. Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

    Authors: Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge to recognize highly-overlapped dinner party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming and robust i-vector extraction with comparisons of our i… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    Journal ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 6665-6669

  39. arXiv:2405.10570  [pdf]

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features… ▽ More

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  40. arXiv:2405.09901  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models

    Authors: Ziyu Wang, Lejun Min, Gus Xia

    Abstract: Recent deep music generation studies have put much emphasis on long-term generation with structures. However, we are yet to see high-quality, well-structured whole-song generation. In this paper, we make the first attempt to model a full music piece under the realization of compositional hierarchy. With a focus on symbolic representations of pop songs, we define a hierarchical language, in which e… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Proceedings of the International Conference on Learning Representations (ICLR 2024)

    MSC Class: 68Txx

  41. arXiv:2405.06290   

    cs.RO eess.SY

    Path Planning and Motion Control for Accurate Positioning of Car-like Robots

    Authors: Jin Dai, Zejiang Wang, Yebin Wang, Rien Quirynen, Stefano Di Cairano

    Abstract: This paper investigates the planning and control for accurate positioning of car-like robots. We propose a solution that integrates two modules: a motion planner, facilitated by the rapidly-exploring random tree algorithm and continuous-curvature (CC) steering technique, generates a CC trajectory as a reference; and a nonlinear model predictive controller (NMPC) regulates the robot to accurately t… ▽ More

    Submitted 8 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: The paper needs further revision to guarantee technical correctness and conciseness

  42. arXiv:2405.04880  [pdf, other]

    cs.SD cs.AI eess.AS

    The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

    Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

    Abstract: With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio currently exhibits widespread, high deception, and type versatility, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on…

    Submitted 15 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  43. arXiv:2405.03393  [pdf, other]

    cs.RO eess.SY

    On-site scale factor linearity calibration of MEMS triaxial gyroscopes

    Authors: Yaqi Li, Li Wang, Zhitao Wang, Xiangqing Li, Jiaojiao Li, Steven Weidong Su

    Abstract: The calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation for various wearable health monitoring applications. However, gyroscope calibration poses greater challenges compared to accelerometers and magnetometers. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes via only a servo motor, making it well-suited for field environme…

    Submitted 10 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  44. arXiv:2405.01104  [pdf, other]

    cs.IT eess.SP

    Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments

    Authors: Ziqing Wang, Hongzheng Liu, Jianan Zhang, Ru**g Xiong, Kai Wan, Xuewen Qian, Marco Di Renzo, Robert Caiming Qiu

    Abstract: This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the BS aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we j…

    Submitted 2 May, 2024; originally announced May 2024.

  45. arXiv:2405.00259   

    physics.med-ph eess.IV

    Optimization of Dark-Field CT for Lung Imaging

    Authors: Peiyuan Guo, Simon Spindler, Li Zhang, Zhentian Wang

    Abstract: Background: X-ray grating-based dark-field imaging can sense the small angle scattering caused by an object's micro-structure. This technique is sensitive to lung's porous alveoli and is able to detect lung disease at an early stage. Up to now, a human-scale dark-field CT has been built for lung imaging. Purpose: This study aimed to develop a more thorough optimization method for dark-field lung C…

    Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

    Comments: There is a mistake in subsection 2.3: the content is incorrect because a parameter was set incorrectly, so the calculations in the subsequent sections are potentially incorrect

  46. arXiv:2404.17318  [pdf, other]

    cs.IT eess.SP

    Performance Bounds of Near-Field Sensing with Circular Arrays

    Authors: Zhaolin Wang, Xidong Mu, Yuanwei Liu

    Abstract: The performance bounds of near-field sensing are studied for circular arrays, focusing on the impact of bandwidth and array size. The closed-form Cramer-Rao bounds (CRBs) for angle and distance estimation are derived, revealing the scaling laws of the CRBs with bandwidth and array size. Contrary to expectations, enlarging array size does not always enhance sensing performance. Furthermore, the asym…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 6 pages, 6 figures. arXiv admin note: text overlap with arXiv:2404.05076

  47. arXiv:2404.17138  [pdf, other]

    eess.SP

    Sub-6GHz Assisted mmWave Hybrid Beamforming with Heterogeneous Graph Neural Network

    Authors: Zhaohui Huang, Zhaocheng Wang, Sheng Chen

    Abstract: In next-generation communications, sub-6GHz and millimeter-wave (mmWave) links typically coexist, with the sub-6GHz link always active and the mmWave link active when high-rate transmission is required. Due to the spatial similarities between sub-6GHz and mmWave channels, sub-6GHz channel information can be utilized to support hybrid beamforming in mmWave communications to reduce overhead costs. W…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: This paper has been submitted to IEEE Transactions on Communications (IEEE TCOM)

  48. arXiv:2404.16883  [pdf, other]

    eess.SY cs.LG

    Myopically Verifiable Probabilistic Certificates for Safe Control and Learning

    Authors: Zhuoyuan Wang, Haoming **g, Christian Kurniawan, Albert Chern, Yorie Nakahira

    Abstract: This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risks due to cumulative uncertainties/risks. On the other hand, reachability-ba…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.13380

  49. arXiv:2404.16522  [pdf, other]

    eess.IV cs.LG

    A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography

    Authors: Bo Peng, Xiaofeng Li, Xinyu Li, Zhenghan Wang, Hui Deng, Xiaoxian Luo, Lixue Yin, Hongmei Zhang

    Abstract: Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classif…

    Submitted 25 April, 2024; originally announced April 2024.

  50. arXiv:2404.16152  [pdf, ps, other]

    cs.IT eess.SP

    Rethinking Grant-Free Protocol in mMTC

    Authors: Minhao Zhu, Yifei Sun, Lizhao You, Zhaorui Wang, Ya-Feng Liu, Shuguang Cui

    Abstract: This paper revisits the identity detection problem under the current grant-free protocol in massive machine-type communications (mMTC) by asking the following question: for stable identity detection performance, is it enough to permit active devices to transmit preambles without any handshaking with the base station (BS)? Specifically, in the current grant-free protocol, the BS blindly allocates a…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE for possible publication