Skip to main content

Showing 1–50 of 427 results for author: Zhou, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.17661  [pdf, other

    eess.SY

    Physics-Informed AI Inverter

    Authors: Qing Shen, Yifan Zhou, Peng Zhang, Yacov A. Shamash, Xiaochuan Luo, Bin Wang, Huanfeng Zhao, Roshan Sharma, Bo Chen

    Abstract: This letter devises an AI-Inverter that pilots the use of a physics-informed neural network (PINN) to enable AI-based electromagnetic transient simulations (EMT) of grid-forming inverters. The contributions are threefold: (1) A PINN-enabled AI-Inverter is formulated; (2) An enhanced learning strategy, balanced-adaptive PINN, is devised; (3) extensive validations and comparative analysis of the acc… ▽ More

    Submitted 1 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.16326  [pdf, other

    eess.AS

    RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging

    Authors: Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li

    Abstract: This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that leverages reference information to improve conversion performance. Previous XVC works generally take an average speaker embedding to condition the speaker identity, which does not account for the changing timbre of speech that occurs with different pronunciations. To address this, our method uses both global and loc… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Manuscript under review by TASLP

  3. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  4. arXiv:2406.14696  [pdf, other

    eess.SY cs.AI

    Physically Analyzable AI-Based Nonlinear Platoon Dynamics Modeling During Traffic Oscillation: A Koopman Approach

    Authors: Kexin Tian, Haotian Shi, Yang Zhou, Sixu Li

    Abstract: Given the complexity and nonlinearity inherent in traffic dynamics within vehicular platoons, there exists a critical need for a modeling methodology with high accuracy while concurrently achieving physical analyzability. Currently, there are two predominant approaches: the physics model-based approach and the Artificial Intelligence (AI)--based approach. Knowing the facts that the physical-based… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.11175  [pdf, other

    cs.SD eess.AS

    SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

    Authors: Zhihang Sun, Andong Li, Rilin Chen, Hao Zhang, Meng Yu, Yi Zhou, Dong Yu

    Abstract: The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior arts have been proposed, which yield promising performance. Nevertheless, they rarely consider the deployment generality in different processing scenarios, such as edge devices, and cloud processing. To this end, this paper proposes a general model, t… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  6. arXiv:2406.10844  [pdf, other

    eess.AS cs.SD

    Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

    Authors: Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

    Abstract: Synthesizing speech across different accents while preserving the speaker identity is essential for various real-world customer applications. However, the individual and accurate modeling of accents and speakers in a text-to-speech (TTS) system is challenging due to the complexity of accent variations and the intrinsic entanglement between the accent and speaker identity. In this paper, we present… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  8. arXiv:2406.08374  [pdf, other

    cs.CV cs.AI eess.IV

    2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

    Authors: Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

    Abstract: Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate t… ▽ More

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

  9. arXiv:2406.07330  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    CTC-based Non-autoregressive Textless Speech-to-Speech Translation

    Authors: Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng

    Abstract: Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investig… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

    ACM Class: I.2.7

  10. arXiv:2406.06872  [pdf, other

    eess.SP

    Revolutionizing Wireless Networks with Self-Supervised Learning: A Pathway to Intelligent Communications

    Authors: Zhixiang Yang, Hongyang Du, Dusit Niyato, Xudong Wang, Yu Zhou, Lei Feng, Fanqin Zhou, Wen**g Li, Xuesong Qiu

    Abstract: With the rapid proliferation of mobile devices and data, next-generation wireless communication systems face stringent requirements for ultra-low latency, ultra-high reliability, and massive connectivity. Traditional AI-driven wireless network designs, while promising, often suffer from limitations such as dependency on labeled data and poor generalization. To address these challenges, we present… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  11. arXiv:2406.05954  [pdf, other

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi… ▽ More

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  12. arXiv:2406.02529  [pdf, other

    eess.IV cs.AI cs.CV cs.LG

    ReLUs Are Sufficient for Learning Implicit Neural Representations

    Authors: Joseph Shenouda, Yamin Zhou, Robert D. Nowak

    Abstract: Motivated by the growing theoretical understanding of neural networks that employ the Rectified Linear Unit (ReLU) as their activation function, we revisit the use of ReLU activation functions for learning implicit neural representations (INRs). Inspired by second order B-spline wavelets, we incorporate a set of simple constraints to the ReLU neurons in each layer of a deep neural network (DNN) to… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  13. arXiv:2406.02262  [pdf, other

    eess.SP

    A DAFT Based Unified Waveform Design Framework for High-Mobility Communications

    Authors: Xingyao Zhang, Haoran Yin, Yanqun Tang, Yu Zhou, Yuqing Liu, **ming Du, Yipeng Ding

    Abstract: With the increasing demand for multi-carrier communication in high-mobility scenarios, it is urgent to design new multi-carrier communication waveforms that can resist large delay-Doppler spreads. Various multi-carrier waveforms in the transform domain were proposed for the fast time-varying channels, including orthogonal time frequency space (OTFS), orthogonal chirp division multiplexing (OCDM),… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  14. arXiv:2406.01245  [pdf, other

    eess.IV

    Sparse Focus Network for Multi-Source Remote Sensing Data Classification

    Authors: Xuepeng **, Junyan Lin, Feng Gao, Lin Qi, Yang Zhou

    Abstract: Multi-source remote sensing data classification has emerged as a prominent research topic with the advancement of various sensors. Existing multi-source data classification methods are susceptible to irrelevant information interference during multi-source feature extraction and fusion. To solve this issue, we propose a sparse focus network for multi-source data classification. Sparse attention is… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE IGARSS 2024

  15. arXiv:2406.01240  [pdf, other

    eess.IV

    Arctic Sea Ice Image Super-Resolution Based on Multi-Scale Convolution and Dual-Gating Mechanism

    Authors: Zhaomin Fang, Wankun Chen, Feng Gao, Yanhai Gan, Junyu Dong, Yang Zhou

    Abstract: Arctic Sea Ice Concentration (SIC) is the ratio of ice-covered area to the total sea area of the Arctic Ocean, which is a key indicator for maritime activities. Nowadays, we often use passive microwave images to display SIC, but it has low spatial resolution, and most of the existing super-resolution methods of Arctic SIC don't take the integration of spatial and channel features into account and… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE IGARSS 2024

  16. arXiv:2406.01228  [pdf, other

    eess.IV

    LSKSANet: A Novel Architecture for Remote Sensing Image Semantic Segmentation Leveraging Large Selective Kernel and Sparse Attention Mechanism

    Authors: Miao Fu, Feng Gao, Ruzhuang Hua, Yanhai Gan, Xiaowei Zhou, Yang Zhou

    Abstract: In this paper, we proposed large selective kernel and sparse attention network (LSKSANet) for remote sensing image semantic segmentation. The LSKSANet is a lightweight network that effectively combines convolution with sparse attention mechanisms. Specifically, we design large selective kernel module to decomposing the large kernel into a series of depth-wise convolutions with progressively increa… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at IEEE IGARSS 2024

  17. arXiv:2405.18692  [pdf, other

    cs.IT eess.SP

    Movable Antenna Empowered Downlink NOMA Systems: Power Allocation and Antenna Position Optimization

    Authors: Yufeng Zhou, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

    Abstract: This paper investigates a novel communication paradigm employing movable antennas (MAs) within a multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) downlink framework, where users are equipped with MAs. Initially, leveraging the far-field response, we delineate the channel characteristics concerning both the power allocation coefficient and positions of MAs. Subsequently, we… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  18. arXiv:2405.17004  [pdf, other

    cs.CV eess.IV

    Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness

    Authors: Yang Zhang, Mingying Li, Huilin Pan, Moyun Liu, Yang Zhou

    Abstract: Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, there exists a large characteristic difference between inter-class components (scale variance) but intra-class on the contrary, which entails scale-awareness for detectors. Moreover, the design of task-specific networks heavily relies on human expertise. As a consequence, neu… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 11 pages, 8 figures

  19. arXiv:2405.15831  [pdf, other

    eess.SY cs.AI cs.LG

    Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

    Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

    Abstract: Transmission interface power flow adjustment is a critical measure to ensure the security and economy operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coup… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Power Systems

  20. arXiv:2405.15413  [pdf, other

    eess.IV cs.CV cs.IT

    MambaVC: Learned Visual Compression with Selective State Spaces

    Authors: Shiyu Qin, **peng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shutao Xia, Yaowei Wang

    Abstract: Learned visual compression is an important and active task in multimedia. Existing approaches have explored various CNN- and Transformer-based designs to model content distribution and eliminate redundancy, where balancing efficacy (i.e., rate-distortion trade-off) and efficiency remains a challenge. Recently, state-space models (SSMs) have shown promise due to their long-range modeling capacity a… ▽ More

    Submitted 28 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 17pages,15 figures

  21. arXiv:2405.14802  [pdf, other

    eess.IV cs.CV

    Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation

    Authors: Hongxu Jiang, Muhammad Imran, Linhai Ma, Teng Zhang, Yuyin Zhou, Muxuan Liang, Kuang Gong, Wei Shao

    Abstract: Denoising diffusion probabilistic models (DDPMs) have achieved unprecedented success in computer vision. However, they remain underutilized in medical imaging, a field crucial for disease diagnosis and treatment planning. This is primarily due to the high computational cost associated with (1) the use of large number of time steps (e.g., 1,000) in diffusion processes and (2) the increased dimensio… ▽ More

    Submitted 23 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  22. arXiv:2405.12996  [pdf, other

    eess.IV

    Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

    Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

    Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizabi… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

  23. arXiv:2405.12223  [pdf, other

    eess.IV cs.CV

    Cascaded Multi-path Shortcut Diffusion Model for Medical Image Translation

    Authors: Yinchi Zhou, Tianqi Chen, Jun Hou, Huidong Xie, Nicha C. Dvornek, S. Kevin Zhou, David L. Wilson, James S. Duncan, Chi Liu, Bo Zhou

    Abstract: Image-to-image translation is a vital component in medical imaging processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their c… ▽ More

    Submitted 5 April, 2024; originally announced May 2024.

    Comments: 15 pages, 5 figures

  24. arXiv:2405.10570  [pdf

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang **, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features… ▽ More

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  25. arXiv:2405.05848  [pdf, other

    eess.SY

    Distributed Estimation for a 3-D Moving Target in Quaternion Space with Unknown Correlation

    Authors: Yizhi Zhou, Xufan Liu, Xuan Wang

    Abstract: For distributed estimations in a sensor network, the consistency and accuracy of an estimator are greatly affected by the unknown correlations between individual estimates. An inconsistent or too conservative estimate may degrade the estimation performance and even cause divergence of the estimator. Cooperative estimation methods based on Inverse Covariance Intersection (ICI) can utilize a network… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  26. arXiv:2405.05521  [pdf, other

    cs.LG eess.SY

    Machine Learning for Scalable and Optimal Load Shedding Under Power System Contingency

    Authors: Yuqi Zhou, Hao Zhu

    Abstract: Prompt and effective corrective actions in response to unexpected contingencies are crucial for improving power system resilience and preventing cascading blackouts. The optimal load shedding (OLS) accounting for network limits has the potential to address the diverse system-wide impacts of contingency scenarios as compared to traditional local schemes. However, due to the fast cascading propagati… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  27. arXiv:2405.04000  [pdf, other

    cs.RO eess.SY

    Distributed Invariant Kalman Filter for Cooperative Localization using Matrix Lie Groups

    Authors: Yizhi Zhou, Yufan Liu, Pengxiang Zhu, Xuan Wang

    Abstract: This paper studies the problem of Cooperative Localization (CL) for multi-robot systems, where a group of mobile robots jointly localize themselves by using measurements from onboard sensors and shared information from other robots. We propose a novel distributed invariant Kalman Filter (DInEKF) based on the Lie group theory, to solve the CL problem in a 3-D environment. Unlike the standard EKF wh… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  28. arXiv:2405.03141  [pdf, other

    eess.IV cs.AI cs.CV physics.med-ph

    Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

    Authors: Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-** Lam, Yong-** Zheng

    Abstract: The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of mea… ▽ More

    Submitted 6 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  29. arXiv:2405.02567  [pdf, other

    eess.SP

    TiRE-GAN: Task-Incentivized Generative Learning Models for Radiomap Estimation with Radio Propagation Model

    Authors: Yueling Zhou, Achintha Wijesinghe, Songyang Zhang, Zhi Ding

    Abstract: Enriching geometric information on radio frequency (RF) signal power distribution in wireless communication systems, the radiomap has become an essential tool for resource allocation and network management. Usually, a dense radiomap is reconstructed from sparse observations collected by deployed sensors or mobile devices, which makes the radiomap estimation an urgent challenge. To leverage both ph… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  30. arXiv:2405.01115  [pdf

    cs.RO eess.SY

    A New Self-Alignment Method without Solving Wahba Problem for SINS in Autonomous Vehicles

    Authors: Hongliang Zhang, Yilan Zhou, Lei Wang, Tengchao Huang

    Abstract: Initial alignment is one of the key technologies in strapdown inertial navigation system (SINS) to provide initial state information for vehicle attitude and navigation. For some situations, such as the attitude heading reference system, the position is not necessarily required or even available, then the self-alignment that does not rely on any external aid becomes very necessary. This study pres… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  31. arXiv:2405.00130  [pdf, other

    eess.IV cs.CV cs.LG

    A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention

    Authors: Amarjeet Kumar, Hongxu Jiang, Muhammad Imran, Cyndi Valdes, Gabriela Leon, Dahyun Kang, Parvathi Nataraj, Yuyin Zhou, Michael D. Weiss, Wei Shao

    Abstract: Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is f… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

  32. arXiv:2404.19265  [pdf, other

    cs.CV eess.IV

    Map** New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation

    Authors: Zhenglin Li, Bo Guan, Yuanzhou Wei, Yiming Zhou, **gyu Zhang, **xin Xu

    Abstract: Generative Adversarial Networks (GANs) have significantly advanced image processing, with Pix2Pix being a notable framework for image-to-image translation. This paper explores a novel application of Pix2Pix to transform abstract map images into realistic ground truth images, addressing the scarcity of such images crucial for domains like urban planning and autonomous vehicle training. We detail th… ▽ More

    Submitted 30 April, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  33. arXiv:2404.18820  [pdf, other

    eess.IV cs.CV

    Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior

    Authors: Zhiyuan Li, Yanhui Zhou, Hao Wei, Chenyang Ge, **gwen Jiang

    Abstract: Image compression at extremely low bitrates (below 0.1 bits per pixel (bpp)) is a significant challenge due to substantial information loss. In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates. In the first stage, we treat t… ▽ More

    Submitted 13 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE TCSVT

  34. arXiv:2404.17917  [pdf, other

    cs.CV eess.IV

    EvaNet: Elevation-Guided Flood Extent Map** on Earth Imagery

    Authors: Mirza Tanzim Sami, Da Yan, Saugat Adhikari, Lyuheng Yuan, Jiao Han, Zhe Jiang, Jalal Khalil, Yang Zhou

    Abstract: Accurate and timely map** of flood extent from high-resolution satellite imagery plays a crucial role in disaster management such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which can-not segment the flood pixels accurately due to the ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from only the spectr… ▽ More

    Submitted 12 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted at the International Joint Conference on Artificial Intelligence (IJCAI, 2024)

  35. arXiv:2404.17519  [pdf, other

    cs.IT eess.SP

    Interpreting Deepcode, a learned feedback code

    Authors: Yingyao Zhou, Natasha Devroye, Gyorgy Turan, Milos Zefran

    Abstract: Deep learning methods have recently been used to construct non-linear codes for the additive white Gaussian noise (AWGN) channel with feedback. However, there is limited understanding of how these black-box-like codes with many learned parameters use feedback. This study aims to uncover the fundamental principles underlying the first deep-learned feedback code, known as Deepcode, which is based on… ▽ More

    Submitted 4 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted to the 2024 ISIT conference

  36. arXiv:2404.16727  [pdf, other

    eess.SY

    Learning-Based Efficient Approximation of Data-enabled Predictive Control

    Authors: Yihan Zhou, Yiwen Lu, Zishuo Li, Jiaqi Yan, Yilin Mo

    Abstract: Data-Enabled Predictive Control (DeePC) bypasses the need for system identification by directly leveraging raw data to formulate optimal control policies. However, the size of the optimization problem in DeePC grows linearly with respect to the data size, which prohibits its application due to high computational costs. In this paper, we propose an efficient approximation of DeePC, whose size is in… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  37. arXiv:2404.16619  [pdf, other

    cs.SD eess.AS

    The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

    Authors: Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu

    Abstract: This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by THU-HCSI team for LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. For further improving speaker similarity and speech quality, we introduce speaker-aware text encoder… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted in Grand Challenge of ICASSP 2024

  38. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  39. arXiv:2404.15312  [pdf, other

    eess.SP cs.CV

    Realtime Person Identification via Gait Analysis

    Authors: Shanmuga Venkatachalam, Harideep Nair, Prabhu Vellaisamy, Yongqi Zhou, Ziad Youssfi, John Paul Shen

    Abstract: Each person has a unique gait, i.e., walking style, that can be used as a biometric for personal identification. Recent works have demonstrated effective gait recognition using deep neural networks, however most of these works predominantly focus on classification accuracy rather than model efficiency. In order to perform gait recognition using wearable devices on the edge, it is imperative to dev… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  40. arXiv:2404.13279  [pdf, other

    cs.CR eess.IV eess.SP

    Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications

    Authors: Yuan Zhou, Rose Qingyang Hu, Yi Qian

    Abstract: Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic comm… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE ICC 2024

  41. arXiv:2404.13097  [pdf, other

    eess.IV cs.CV cs.LG q-bio.QM

    DISC: Latent Diffusion Models with Self-Distillation from Separated Conditions for Prostate Cancer Grading

    Authors: Man M. Ho, Elham Ghelichkhan, Yosep Chong, Yufei Zhou, Beatrice Knudsen, Tolga Tasdizen

    Abstract: Latent Diffusion Models (LDMs) can generate high-fidelity images from noise, offering a promising approach for augmenting histopathology images for training cancer grading models. While previous works successfully generated high-fidelity histopathology images using LDMs, the generation of image tiles to improve prostate cancer grading has not yet been explored. Additionally, LDMs face challenges i… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Abstract accepted for ISBI 2024. Extended version to be presented at SynData4CV @ CVPR 2024. See more at https://minhmanho.github.io/disc/

  42. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  43. arXiv:2404.09003  [pdf, other

    cs.CV eess.IV

    THQA: A Perceptual Quality Assessment Database for Talking Heads

    Authors: Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Zhihua Wang, Xiao-** Zhang, Guangtao Zhai

    Abstract: In the realm of media technology, digital humans have gained prominence due to rapid advancements in computer technology. However, the manual modeling and control required for the majority of digital humans pose significant obstacles to efficient development. The speech-driven methods offer a novel avenue for manipulating the mouth shape and expressions of digital humans. Despite the proliferation… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  44. arXiv:2404.05832  [pdf, other

    cs.HC eess.SY

    Human-Machine Interaction in Automated Vehicles: Reducing Voluntary Driver Intervention

    Authors: Xinzhi Zhong, Yang Zhou, Varshini Kamaraj, Zhenhao Zhou, Wissam Kontar, Dan Negrut, John D. Lee, Soyoung Ahn

    Abstract: This paper develops a novel car-following control method to reduce voluntary driver interventions and improve traffic stability in Automated Vehicles (AVs). Through a combination of experimental and empirical analysis, we show how voluntary driver interventions can instigate substantial traffic disturbances that are amplified along the traffic upstream. Motivated by these findings, we present a fr… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  45. arXiv:2404.05600  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechAlign: Aligning Speech Generation to Human Preferences

    Authors: Dong Zhang, Zhaowei Li, Shimin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

    Abstract: Speech language models have significantly advanced in generating realistic speech, with neural codec language models standing out. However, the integration of human feedback to align speech outputs to human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training a… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Work in progress

  46. arXiv:2404.03449  [pdf, other

    eess.SP

    Integrating AI in NDE: Techniques, Trends, and Further Directions

    Authors: Eduardo Pérez, Cemil Emre Ardic, Ozan Çakıroğlu, Kevin Jacob, Sayako Kodera, Luca Pompa, Mohamad Rachid, Han Wang, Yiming Zhou, Cyril Zimmer, Florian Römer, Ahmad Osman

    Abstract: The digital transformation is fundamentally changing our industries, affecting planning, execution as well as monitoring of production processes in a wide range of application fields. With product line-ups becoming more and more versatile and diverse, the necessary inspection and monitoring sparks significant novel requirements on the corresponding Nondestructive Evaluation (NDE) systems. The esta… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  47. arXiv:2404.01875  [pdf, other

    eess.SP cs.DC cs.IT cs.LG

    Satellite Federated Edge Learning: Architecture Design and Convergence Analysis

    Authors: Yuanming Shi, Li Zeng, **gyang Zhu, Yong Zhou, Chunxiao Jiang, Khaled B. Letaief

    Abstract: The proliferation of low-earth-orbit (LEO) satellite networks leads to the generation of vast volumes of remote sensing data which is traditionally transferred to the ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), as a distributed machine learning approach, has the potential to address these challenges by sharing only model paramet… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 15 figures

  48. arXiv:2404.01088  [pdf, other

    eess.SP

    GI-Free Pilot-Aided Channel Estimation for Affine Frequency Division Multiplexing Systems

    Authors: Yu Zhou, Haoran Yin, Nanhao Zhou, Yanqun Tang, Xiaoying Zhang, Weijie Yuan

    Abstract: The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel. Thus, accurate channel estimation is feasible by using just one pilot symbol. However, traditional AFDM channel estimation schemes necessitate the use of guard intervals (GI) to mitigate da… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  49. arXiv:2403.17184  [pdf, ps, other

    math.OC eess.SY

    Robust Finite-time Stabilization of Linear Systems with Limited State Quantization

    Authors: Yu Zhou, Andrey Polyakov, Gang Zheng

    Abstract: This paper investigates the robust asymptotic stabilization of a linear time-invariant (LTI) system by a static feedback with a static state quantization. It is shown that the controllable LTI system can be stabilized to zero in a finite time by means of a nonlinear feedback with a quantizer having a limited (finite) number of values (quantization seeds) even when all parameters of the controller… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  50. arXiv:2403.16830  [pdf, other

    cs.NI eess.SP

    Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

    Authors: Xiang Ma, Yuan Zhou, Hanwen Zhang, Qun Wang, Haijian Sun, Hongjie Wang, Rose Qingyang Hu

    Abstract: As public awareness of environmental protection continues to grow, the trend of integrating more electric vehicles (EVs) into the transportation sector is rising. Unlike conventional internal combustion engine (ICE) vehicles, EVs can minimize carbon emissions and potentially achieve autonomous driving. However, several obstacles hinder the widespread adoption of EVs, such as their constrained driv… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: submitted to IET Communication as a survey paper