Showing 1–50 of 523 results for author: Wang, L

  1. arXiv:2406.19328  [pdf, other]

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19205  [pdf, other]

    eess.SP

    Coordinated RSMA for Integrated Sensing and Communication in Emergency UAV Systems

    Authors: Binghan Yao, Ruoguang Li, Yingyang Chen, Li Wang

    Abstract: Recently, unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) is emerging as a promising technique for achieving robust and rapid emergency response capabilities. Such a novel framework offers high-quality and cost-efficient C&S services due to the intrinsic flexibility and mobility of UAVs. In parallel, rate-splitting multiple access (RSMA) is able to achieve a tail…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.18625  [pdf, other]

    cs.SD cs.AI eess.AS

    Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

    Authors: Liming Wang, Yuan Gong, Nauman Dawalatabad, Marco Vilela, Katerina Placek, Brian Tracey, Yishu Gong, Alan Premasiri, Fernando Vieira, James Glass

    Abstract: Automatic prediction of amyotrophic lateral sclerosis (ALS) disease progression provides a more efficient and objective alternative to manual approaches. We propose ALS longitudinal speech transformer (ALST), a neural network-based automatic predictor of ALS disease progression from longitudinal speech recordings of ALS patients. By taking advantage of high-quality pretrained speech features and…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.15696  [pdf]

    physics.med-ph eess.SP

    Functional photoacoustic noninvasive Doppler angiography in humans

    Authors: Yang Zhang, Joshua Olick-Gibson, Karteekeya Sastry, Lihong V. Wang

    Abstract: Optical imaging of blood flow yields critical functional insights into the circulatory system, but its clinical implementation has typically been limited to shallow depths (~1 millimeter) due to light scattering in biological tissue. Here, we present photoacoustic noninvasive Doppler angiography (PANDA) for deep blood flow imaging. PANDA synergizes the photoacoustic and Doppler effects to generate…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 38 pages, 7 main figures, 10 supplementary figures

  5. arXiv:2406.14067  [pdf]

    physics.optics eess.SP

    A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

    Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

    Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz.…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 12 figures, 1 table

  6. arXiv:2406.12703  [pdf, other]

    eess.IV cs.CV

    Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction

    Authors: Jincheng Yang, Lishun Wang, Miao Cao, Huan Wang, Yinping Zhao, Xin Yuan

    Abstract: We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI), which captures a spatial-spectral data cube using snapshot 2D measurements and uses algorithms to reconstruct 3D hyperspectral images (HSI). However, current methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies and non-local similarities. The recently popular Transformer-b…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, Accepted by ICIP2024

  7. arXiv:2406.11158  [pdf, other]

    eess.SY

    Dynamic Modeling and Control for an Offshore Semisubmersible Floating Wind Turbine

    Authors: Yingjie Gong, Qinmin Yang, Hua Geng, Wenchao Meng, Lin Wang

    Abstract: Floating wind turbines (FWTs) hold significant potential for the exploitation of offshore renewable energy resources. Nevertheless, prior to the construction of FWTs, it is imperative to tackle several critical challenges, especially the issue of performance degradation under combined wind and wave loads. This study begins with the development of a simplified nonlinear dynamical model for a sem…

    Submitted 16 June, 2024; originally announced June 2024.

  8. arXiv:2406.09873  [pdf, other]

    eess.AS cs.AI cs.SD

    Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

    Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

    Abstract: Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Kai Yu, Aidi Lin, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Wei Chen, Yilong Luo, Yifan Chen, Jingcheng Wang, Yih Chung Tham, Dianbo Liu, Wendy Wong, Sahil Thakur, Beau Fenner, Yanda Meng, Yukun Zhou, et al. (11 additional authors not shown)

    Abstract: The current retinal artificial intelligence models were trained using data with a limited category of diseases and limited knowledge. In this paper, we present a retinal vision-language foundation model (RetiZero) with knowledge of over 400 fundus diseases. Specifically, we collected 341,896 fundus images paired with text descriptions from 29 publicly available datasets, 180 ophthalmic books, and…

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.08911  [pdf, other]

    cs.CL eess.AS

    An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

    Authors: Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

    Abstract: Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  11. arXiv:2406.08380  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by pro…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  12. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  13. arXiv:2406.06979  [pdf, other]

    cs.LG cs.CR cs.SD eess.AS

    AudioMarkBench: Benchmarking Robustness of Audio Watermarking

    Authors: Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

    Abstract: The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present A…

    Submitted 11 June, 2024; originally announced June 2024.

  14. arXiv:2406.02004  [pdf, ps, other]

    cs.CR cs.CL cs.SD eess.AS

    Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping

    Authors: Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan

    Abstract: Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memorization. This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across train…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech'24

  15. arXiv:2406.00683  [pdf, other]

    eess.IV cs.CV cs.MM

    Exploiting Frequency Correlation for Hyperspectral Image Reconstruction

    Authors: Muge Yan, Lizhi Wang, Lin Zhu, Hua Huang

    Abstract: Deep priors have emerged as potent methods in hyperspectral image (HSI) reconstruction. While most methods emphasize space-domain learning using image space priors like non-local similarity, frequency-domain learning using image frequency priors remains neglected, limiting the reconstruction capability of networks. In this paper, we first propose a Hyperspectral Frequency Correlation (HFC) prior r…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 14 pages, 11 figures

  16. arXiv:2405.20064  [pdf, other]

    eess.AS cs.SD

    1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

    Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

    Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t…

    Submitted 30 May, 2024; originally announced May 2024.

  17. arXiv:2405.11895  [pdf, other]

    cs.LG eess.SY

    Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins

    Authors: Yanlei Yin, Lihua Wang, Wenbo Wang, Dinh Thai Hoang

    Abstract: In the process industry, optimizing production lines for long-term efficiency requires real-time monitoring and analysis of operation states to fine-tune production line parameters. However, the complexity in operational logic and the intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment…

    Submitted 20 May, 2024; originally announced May 2024.

  18. arXiv:2405.11352  [pdf, other]

    cs.NI eess.SP

    Hierarchical Reinforcement Learning Empowered Task Offloading in V2I Networks

    Authors: Xinyu You, Haojie Yan, Yuedong Xu, Lifeng Wang, Liangui Dai

    Abstract: Edge computing plays an essential role in vehicle-to-infrastructure (V2I) networks, where vehicles offload their intensive computation tasks to the road-side units to save energy and reduce latency. This paper designs the optimal task offloading policy to address the concerns involving processing delay, energy consumption and edge computing cost. Each computation task consisting of some…

    Submitted 18 May, 2024; originally announced May 2024.

  19. arXiv:2405.09317  [pdf, other]

    eess.SY

    Controllability Test for Nonlinear Datatic Systems

    Authors: Yujie Yang, Letian Tao, Likun Wang, Shengbo Eben Li

    Abstract: Controllability is a fundamental property of control systems, serving as the prerequisite for controller design. While controllability test is well established in modelic (i.e., model-driven) control systems, extending it to datatic (i.e., data-driven) control systems is still a challenging task due to the absence of system models. In this study, we propose a general controllability test method fo…

    Submitted 15 May, 2024; originally announced May 2024.

  20. arXiv:2405.07717  [pdf, other]

    eess.IV

    On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

    Authors: Chenhao Wu, Qingbo Wu, Haoran Wei, Shuai Chen, Lei Wang, King Ngi Ngan, Fanman Meng, Hongliang Li

    Abstract: Despite demonstrating superior rate-distortion (RD) performance, learning-based image compression (LIC) algorithms have been found to be vulnerable to malicious perturbations in recent studies. Adversarial samples in these studies are designed to attack only one dimension of either bitrate or distortion, targeting a submodel with a specific compression ratio. However, adversaries in real-world sce…

    Submitted 13 May, 2024; originally announced May 2024.

  21. arXiv:2405.06178  [pdf, other]

    eess.IV cs.LG q-bio.NC

    ACTION: Augmentation and Computation Toolbox for Brain Network Analysis with Functional MRI

    Authors: Yuqi Fang, Junhao Zhang, Linmin Wang, Qianqian Wang, Mingxia Liu

    Abstract: Functional magnetic resonance imaging (fMRI) has been increasingly employed to investigate functional brain activity. Many fMRI-related software/toolboxes have been developed, providing specialized algorithms for fMRI analysis. However, existing toolboxes seldom consider fMRI data augmentation, which is quite useful, especially in studies with limited or imbalanced data. Moreover, current studies…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 5 tables

  22. arXiv:2405.04476  [pdf, other]

    eess.AS cs.SD

    BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

    Authors: Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

    Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. The current RAPs and RPPs estimation methods either fall short of covering broad real-world acoustic environments in the context of…

    Submitted 16 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13 pages, submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  23. arXiv:2405.04066  [pdf, other]

    cs.SI eess.SY

    Characterizing Regional Importance in Cities with Human Mobility Motifs in Metro Networks

    Authors: Shuyang Shi, Ding Lyu, Lin Wang, Xiaofan Wang, Guanrong Chen

    Abstract: Uncovering higher-order spatiotemporal dependencies within human mobility networks offers valuable insights into the analysis of urban structures. In most existing studies, human mobility networks are typically constructed by aggregating all trips without distinguishing who takes which specific trip. Instead, we claim individual mobility motifs, higher-order structures generated by daily trips of…

    Submitted 7 May, 2024; originally announced May 2024.

  24. arXiv:2405.03783  [pdf, other]

    eess.SY

    Merging Parameter Estimation and Classification Using LASSO

    Authors: Le Wang, Ying Wang, Yu Qiu, Mian Li, Håkan Hjalmarsson

    Abstract: Soft sensing is a way to indirectly obtain information of signals for which direct sensing is difficult or prohibitively expensive. It may not a priori be evident which sensors provide useful information about the target signal. There may be sensors irrelevant for the estimation as well as sensors for which the information is very poor. It is often required that the soft sensor should cover a wide…

    Submitted 6 May, 2024; originally announced May 2024.

  25. arXiv:2405.03393  [pdf, other]

    cs.RO eess.SY

    On-site scale factor linearity calibration of MEMS triaxial gyroscopes

    Authors: Yaqi Li, Li Wang, Zhitao Wang, Xiangqing Li, Jiaojiao Li, Steven Weidong Su

    Abstract: The calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation for various wearable health monitoring applications. However, gyroscope calibration poses greater challenges compared to accelerometers and magnetometers. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes via only a servo motor, making it well-suited for field environme…

    Submitted 10 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  26. arXiv:2405.03254  [pdf]

    eess.AS

    Automatic Assessment of Dysarthria Using Audio-visual Vowel Graph Attention Network

    Authors: Xiaokang Liu, Xiaoxia Du, Juan Liu, Rongfeng Su, Manwa Lawrence Ng, Yumei Zhang, Yudong Yang, Shaofeng Zhao, Lan Wang, Nan Yan

    Abstract: Automatic assessment of dysarthria remains a highly challenging task due to high variability in acoustic signals and the limited data. Currently, research on the automatic assessment of dysarthria primarily focuses on two approaches: one that utilizes expert features combined with machine learning, and the other that employs data-driven deep learning methods to extract representations. Research ha…

    Submitted 6 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 7 tables

  27. arXiv:2405.01115  [pdf]

    cs.RO eess.SY

    A New Self-Alignment Method without Solving Wahba Problem for SINS in Autonomous Vehicles

    Authors: Hongliang Zhang, Yilan Zhou, Lei Wang, Tengchao Huang

    Abstract: Initial alignment is one of the key technologies in strapdown inertial navigation system (SINS) to provide initial state information for vehicle attitude and navigation. For some situations, such as the attitude heading reference system, the position is not necessarily required or even available, so self-alignment that does not rely on any external aid becomes necessary. This study pres…

    Submitted 2 May, 2024; originally announced May 2024.

  28. arXiv:2404.19666  [pdf, other]

    cs.CV eess.IV

    Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity

    Authors: Lei Wang, Desen Yuan

    Abstract: Image quality assessment often relies on raw opinion scores provided by subjects in subjective experiments, which can be noisy and unreliable. To address this issue, postprocessing procedures such as ITU-R BT.500, ITU-T P.910, and ITU-T P.913 have been standardized to clean up the original opinion scores. These methods use annotator-based statistical priors, but they do not take into account exten…

    Submitted 30 April, 2024; originally announced April 2024.

  29. arXiv:2404.19595  [pdf, other]

    cs.CV eess.IV

    Perceptual Constancy Constrained Single Opinion Score Calibration for Image Quality Assessment

    Authors: Lei Wang, Desen Yuan

    Abstract: In this paper, we propose a highly efficient method to estimate an image's mean opinion score (MOS) from a single opinion score (SOS). Assuming that each SOS is the observed sample of a normal distribution and the MOS is its unknown expectation, the MOS inference is formulated as a maximum likelihood estimation problem, where the perceptual correlation of pairwise images is considered in modeling…

    Submitted 30 April, 2024; originally announced April 2024.

  30. arXiv:2404.19567  [pdf, other]

    cs.CV eess.IV

    Causal Perception Inspired Representation Learning for Trustworthy Image Quality Assessment

    Authors: Lei Wang, Desen Yuan

    Abstract: Despite great success in modeling visual perception, deep neural network based image quality assessment (IQA) still remains unreliable in real-world applications due to its vulnerability to adversarial perturbations and the inexplicit black-box structure. In this paper, we propose to build a trustworthy IQA model via Causal Perception inspired Representation Learning (CPRL), and a score reflection…

    Submitted 30 April, 2024; originally announced April 2024.

  31. arXiv:2404.15530  [pdf, other]

    cs.IT eess.SP

    Co-existing/Cooperating Multicell Massive MIMO and Cell-Free Massive MIMO Deployments: Heuristic Designs and Performance Analysis

    Authors: Stefano Buzzi, Carmen D'Andrea, Li Wang, Ahmet Hasim Gokceoglu, Gunnar Peters

    Abstract: Cell-free massive MIMO (CF-mMIMO) represents a deeply investigated evolution from the conventional multicell co-located massive MIMO (MC-mMIMO) network deployments. Anticipating a gradual integration of CF-mMIMO systems alongside pre-existing MC-mMIMO network elements, this paper considers a scenario where both deployments coexist, in order to serve a large number of users using a shared set of fre…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Paper submitted to the IEEE Open Journal of the Communications Society

  32. arXiv:2404.14862  [pdf, other]

    eess.SP

    Deep Learning Based Multi-Node ISAC 4D Environmental Reconstruction with Uplink-Downlink Cooperation

    Authors: Bohao Lu, Zhiqing Wei, Huici Wu, Xinrui Zeng, Lin Wang, Xi Lu, Dongyang Mei, Zhiyong Feng

    Abstract: Utilizing widely distributed communication nodes to achieve environmental reconstruction is one of the significant scenarios for Integrated Sensing and Communication (ISAC) and a crucial technology for 6G. To achieve this crucial functionality, we propose a deep learning based multi-node ISAC 4D environment reconstruction method with Uplink-Downlink (UL-DL) cooperation, which employs virtual apert…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 13 pages, 21 figures, 4 tables

  33. arXiv:2404.13509  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

    Authors: Xinxin Jiao, Liejun Wang, Yinfeng Yu

    Abstract: Speech emotion recognition is crucial in human-computer interaction, but extracting and using emotional cues from audio poses challenges. This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio. We employ the Multi-Spatial Fusion module (MF) to efficiently identify emotion-related spe…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Main paper (5 pages). Accepted for publication by ICME 2024

  34. arXiv:2404.12705  [pdf, other]

    eess.SP

    Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection

    Authors: Xi Lu, Zhiqing Wei, Ruizhong Xu, Lin Wang, Bohao Lu, Jinghui Piao

    Abstract: Integrated sensing and communication (ISAC) exhibits notable potential for sensing the unmanned aerial vehicles (UAVs), facilitating real-time monitoring of UAVs for security insurance. Due to the low sensing accuracy of single base stations (BSs), a cooperative UAV sensing method by multi-BS is proposed in this paper to achieve high-accuracy sensing. Specifically, a multiple signal classification…

    Submitted 19 April, 2024; originally announced April 2024.

  35. arXiv:2404.11889  [pdf, other]

    eess.IV cs.CV

    Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

    Authors: Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

    Abstract: X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume d…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures

  36. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  37. arXiv:2404.09519  [pdf, other]

    cs.LG eess.SY

    Nonlinear sparse variational Bayesian learning based model predictive control with application to PEMFC temperature control

    Authors: Qi Zhang, Lei Wang, Weihua Xu, Hongye Su, Lei Xie

    Abstract: The accuracy of the underlying model predictions is crucial for the success of model predictive control (MPC) applications. If the model is unable to accurately analyze the dynamics of the controlled system, the performance and stability guarantees provided by MPC may not be achieved. Learning-based MPC can learn models from data, improving the applicability and reliability of MPC. This study deve…

    Submitted 15 April, 2024; originally announced April 2024.

  38. arXiv:2404.08607  [pdf, other]

    cs.IT eess.SP

    Learning-Based Joint Antenna Selection and Precoding Design for Cell-Free MIMO Networks

    Authors: Liangzhi Wang, Chen Chen, Carlo Fischione, Jie Zhang

    Abstract: This paper considers a downlink cell-free multiple-input multiple-output (MIMO) network in which multiple multi-antenna base stations (BSs) serve multiple users via coherent joint transmission. In order to reduce the energy consumption by radio frequency components, each BS selects a subset of antennas for downlink data transmission after estimating the channel state information (CSI). We aim to m…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  39. arXiv:2404.01609  [pdf]

    eess.SY

    Identifying the Largest RoCoF and Its Implications

    Authors: Licheng Wang, Luochen Xie, Gang Huang, Changsen Feng

    Abstract: The rate of change of frequency (RoCoF) is a critical factor in ensuring frequency security, particularly in power systems with low inertia. Currently, most RoCoF security constrained optimal inertia dispatch methods and inertia market mechanisms predominantly rely on the center of inertia (COI) model. This model, however, does not account for the disparities in post-contingency frequency dynamics…

    Submitted 1 April, 2024; originally announced April 2024.

  40. arXiv:2404.00471  [pdf, other]

    physics.med-ph cs.CV cs.LG eess.IV

    Score-Based Diffusion Models for Photoacoustic Tomography Image Reconstruction

    Authors: Sreemanti Dey, Snigdha Saha, Berthy T. Feng, Manxiu Cui, Laure Delisle, Oscar Leong, Lihong V. Wang, Katherine L. Bouman

    Abstract: Photoacoustic tomography (PAT) is a rapidly-evolving medical imaging modality that combines optical absorption contrast with ultrasound imaging depth. One challenge in PAT is image reconstruction with inadequate acoustic signals due to limited sensor coverage or due to the density of the transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 5 pages

    Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 2470-2474

  41. arXiv:2403.20058  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks

    Authors: Luoyu Wang, Yitian Tao, Qing Yang, Yan Liang, Siwei Liu, Hongcheng Shi, Dinggang Shen, Han Zhang

    Abstract: Simultaneous functional PET/MR (sf-PET/MR) presents a cutting-edge multimodal neuroimaging technique. It provides an unprecedented opportunity for concurrently monitoring and integrating multifaceted brain networks built by spatiotemporally covaried metabolic activity, neural activity, and cerebral blood flow (perfusion). Albeit high scientific/clinical values, short in hardware accessibility of P…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 11 pages

  42. arXiv:2403.15433  [pdf, other]

    eess.SP cs.AI cs.LG eess.IV

    HyPer-EP: Meta-Learning Hybrid Personalized Models for Cardiac Electrophysiology

    Authors: Xiajun Jiang, Sumeet Vadhavkar, Yubo Ye, Maryam Toloubidokhti, Ryan Missel, Linwei Wang

    Abstract: Personalized virtual heart models have demonstrated increasing potential for clinical use, although the estimation of their parameters given patient-specific data remains a challenge. Traditional physics-based modeling approaches are computationally costly and often neglect the inherent structural errors in these models due to model simplifications and assumptions. Modern deep learning approaches,…

    Submitted 14 March, 2024; originally announced March 2024.

  43. A Novel Mutual Insurance Model for Hedging Against Cyber Risks in Power Systems Deploying Smart Technologies

    Authors: Pikkin Lau, Lingfeng Wang, Wei Wei, Zhaoxi Liu, Chee-Wooi Ten

    Abstract: In this paper, a novel cyber-insurance model design is proposed based on system risk evaluation with smart technology applications. The cyber insurance policy for power systems is tailored via cyber risk modeling, reliability impact analysis, and insurance premium calculation. A stochastic Epidemic Network Model is developed to evaluate the cyber risk by propagating cyberattacks among graphical vu…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Power system reliability, cyber-insurance, power system security, cyber-physical systems, cyber risk modeling, actuarial design, tail risk

    Journal ref: in IEEE Transactions on Power Systems, vol. 38, no. 1, pp. 630-642, Jan. 2023

  44. arXiv:2403.09076  [pdf, ps, other]

    eess.SY

    Chaotic Masking Protocol for Secure Communication and Attack Detection in Remote Estimation of Cyber-Physical Systems

    Authors: Tao Chen, Andreu Cecilia, Daniele Astolfi, Lei Wang, Zhitao Liu, Hongye Su

    Abstract: In remote estimation of cyber-physical systems (CPSs), sensor measurements transmitted through a network may be attacked by adversaries, leading to a risk of privacy leakage (e.g., of the system state) and/or failure of the remote estimator. To deal with this problem, a chaotic masking protocol is proposed in this paper to secure the transmission of sensor measurements. In detail, at the plant side, a chao…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures

  45. arXiv:2403.08778  [pdf]

    cs.CV cs.GR eess.IV

    Faster Projected GAN: Towards Faster Few-Shot Image Generation

    Authors: Chuang Wang, Zheng** Li, Yuwen Hao, Lijun Wang, Xiaoxue Li

    Abstract: To address the long training time, heavy consumption of computing resources, and large parameter count of GAN networks in image generation, this paper proposes an improved GAN network model, named Faster Projected GAN, based on Projected GAN. The proposed network mainly focuses on improving the generator of Projected GAN. By introducing depth separable convolution…

    Submitted 23 January, 2024; originally announced March 2024.

    Comments: 9 pages, 7 figures, 4 tables

  46. arXiv:2403.08162  [pdf, other]

    eess.IV cs.CV cs.LG

    Iterative Learning for Joint Image Denoising and Motion Artifact Correction of 3D Brain MRI

    Authors: Lintao Zhang, Mengqi Wu, Lihong Wang, David C. Steffens, Guy G. Potter, Mingxia Liu

    Abstract: Image noise and motion artifacts greatly affect the quality of brain MRI and negatively influence downstream medical image analysis. Previous studies often focus on 2D methods that process each volumetric MR image slice-by-slice, thus losing important 3D anatomical information. Additionally, these studies generally treat image denoising and artifact correction as two standalone tasks, without cons…

    Submitted 12 March, 2024; originally announced March 2024.

  47. arXiv:2403.07364  [pdf, other]

    eess.IV

    Hybrid Kinetics Embedding Framework for Dynamic PET Reconstruction

    Authors: Yubo Ye, Huafeng Liu, Linwei Wang

    Abstract: In dynamic positron emission tomography (PET) reconstruction, the importance of leveraging the temporal dependence of the data has been well appreciated. Current deep-learning solutions can be categorized into two groups by the way the temporal dynamics are modeled: data-driven approaches use spatiotemporal neural networks to learn the temporal dynamics of tracer kinetics from data, which relies heav…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Under Review

  48. arXiv:2403.06901  [pdf, other]

    eess.IV cs.AI cs.LG

    LIBR+: Improving Intraoperative Liver Registration by Learning the Residual of Biomechanics-Based Deformable Registration

    Authors: Dingrong Wang, Soheil Azadvar, Jon Heiselman, Xiajun Jiang, Michael Miga, Linwei Wang

    Abstract: The surgical environment imposes unique challenges to the intraoperative registration of organ shapes to their preoperatively-imaged geometry. Biomechanical model-based registration remains popular, while deep learning solutions remain limited due to the sparsity and variability of intraoperative measurements and the limited ground-truth deformation of an organ that can be obtained during the surg…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 12 pages, Medical Image Computing and Computer Assisted Intervention 2024

  49. arXiv:2403.05820  [pdf, other]

    cs.SD cs.CL eess.AS

    An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data

    Authors: Yudong Yang, Rongfeng Su, Xiaokang Liu, Nan Yan, Lan Wang

    Abstract: Acoustic-to-articulatory inversion (AAI) converts audio into articulator movements, such as ultrasound tongue imaging (UTI) data. One issue with existing AAI methods is that they use only personalized acoustic information to derive the general patterns of tongue motion, which limits the quality of the generated UTI data. To address this issue, this paper proposes an audio-textual diffusion model…

    Submitted 12 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted at ICASSP 2024

  50. arXiv:2403.01435  [pdf, ps, other]

    math.OC eess.SY

    Distributed Least-Squares Optimization Solvers with Differential Privacy

    Authors: Weijia Liu, Lei Wang, Fanghong Guo, Zhengguang Wu, Hongye Su

    Abstract: This paper studies the distributed least-squares optimization problem with a differential privacy requirement on local cost functions, for which two differentially private distributed solvers are proposed. The first is established on the distributed gradient tracking algorithm, by appropriately perturbing the initial values and parameters that contain the privacy-sensitive data with Gaussian and tru…

    Submitted 3 March, 2024; originally announced March 2024.