Showing 1–50 of 89 results for author: Zhao, W

Searching in archive eess.

  1. arXiv:2406.18467  [pdf, other]

    eess.SY math.OC

    Algebraic Connectivity Control and Maintenance in Multi-Agent Networks under Attack

    Authors: Wenjie Zhao, Diego Deplano, Zhiwu Li, Alessandro Giua, Mauro Franceschelli

    Abstract: This paper studies the problem of increasing the connectivity of an ad-hoc peer-to-peer network subject to cyber-attacks targeting the agents in the network. The adopted strategy involves the design of local interaction rules for the agents to locally modify the graph topology by adding and removing links with neighbors. Two distributed protocols are presented to boost the algebraic connectivity o…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed…

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  3. arXiv:2406.03902  [pdf, other]

    eess.IV cs.CV

    C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

    Authors: Yiqun Lin, Jiewen Yang, Hualiang Wang, Xinpeng Ding, Wei Zhao, Xiaomeng Li

    Abstract: Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  4. arXiv:2406.02438  [pdf, other]

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2406.02166  [pdf, other]

    cs.SD cs.CL eess.AS

    Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision

    Authors: Saierdaer Yusuyin, Te Ma, Hao Huang, Wenbo Zhao, Zhijian Ou

    Abstract: There exist three approaches for multilingual and crosslingual automatic speech recognition (MCL-ASR) - supervised pre-training with phonetic or graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. Th…

    Submitted 4 June, 2024; originally announced June 2024.

  6. Multiscale Spatio-Temporal Enhanced Short-term Load Forecasting of Electric Vehicle Charging Stations

    Authors: Zongbao Zhang, Jiao Hao, Wenmeng Zhao, Yan Liu, Yaohui Huang, Xinhang Luo

    Abstract: The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinear of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address the…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 5 pages, 1 figure, AEEES 2024

  7. arXiv:2405.11380  [pdf, other]

    cs.RO cs.AI eess.SY

    Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills

    Authors: Tianhao Wei, Liqian Ma, Rui Chen, Weiye Zhao, Changliu Liu

    Abstract: The requirements for real-world manipulation tasks are diverse and often conflicting; some tasks require precise motion while others require force compliance; some tasks require avoidance of certain regions, while others require convergence to certain states. Satisfying these varied requirements with a fixed state-action representation and control strategy is challenging, impeding the development…

    Submitted 7 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  8. arXiv:2405.07483  [pdf, other]

    math.OC eess.SY

    A Class of Convex Optimization-Based Recursive Algorithms for Identification of Stochastic Systems

    Authors: Mingxia Ding, Wenxiao Zhao, Tianshi Chen

    Abstract: Focusing on identification, this paper develops a class of convex optimization-based criteria and correspondingly the recursive algorithms to estimate the parameter vector $θ^{*}$ of a stochastic dynamic system. Not only do the criteria include the classical least-squares estimator but also the $L_l=|\cdot|^l, l\geq 1$, the Huber, the Log-cosh, and the Quantile costs as special cases. First, we pr…

    Submitted 13 May, 2024; originally announced May 2024.

  9. arXiv:2404.12887  [pdf, other]

    cs.CV eess.IV

    3D Multi-frame Fusion for Video Stabilization

    Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao

    Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our approach lies in Stabilized Rendering (SR), a volume rend…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  10. arXiv:2403.14968  [pdf, other]

    eess.SY

    Real-time Safety Index Adaptation for Parameter-varying Systems via Determinant Gradient Ascend

    Authors: Rui Chen, Weiye Zhao, Ruixuan Liu, Weiyang Zhang, Changliu Liu

    Abstract: Safety Index Synthesis (SIS) is critical for deriving safe control laws. Recent works propose to synthesize a safety index (SI) via nonlinear programming and derive a safe control law such that the system 1) achieves forward invariant (FI) with some safe set and 2) guarantees finite time convergence (FTC) to that safe set. However, real-world system dynamics can vary during run-time, making the co…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted to American Control Conference (ACC) 2024

  11. arXiv:2403.06998  [pdf]

    eess.SP cs.HC cs.NE

    High-speed Low-consumption sEMG-based Transient-state micro-Gesture Recognition

    Authors: Youfang Han, Wei Zhao, Xiang** Chen, Xin Meng

    Abstract: Gesture recognition on wearable devices is extensively applied in human-computer interaction. Electromyography (EMG) has been used in many gesture recognition systems for its rapid perception of muscle signals. However, analyzing EMG signals on devices, like smart wristbands, usually needs inference models to have high performances, such as low inference latency, low power consumption, and low mem…

    Submitted 12 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  12. arXiv:2403.01013  [pdf]

    eess.SY

    A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning

    Authors: Fulong Yao, Wanqing Zhao, Matthew Forshaw, Yang Song

    Abstract: The global energy landscape is undergoing a transformation towards decarbonization, sustainability, and cost-efficiency. In this transition, microgrid systems integrated with renewable energy sources (RES) and energy storage systems (ESS) have emerged as a crucial component. However, optimizing the operational control of such an integrated energy system lacks a holistic view of multiple environmen…

    Submitted 1 March, 2024; originally announced March 2024.

  13. arXiv:2402.11419  [pdf, other]

    eess.SP

    A Self-Healing Magnetic-Array-Type Current Sensor with Data-Driven Identification of Abnormal Magnetic Measurement Units

    Authors: Xiaohu Liu, Wei Zhao, Kang Ma, Jian Liu, Lisha Peng, Songling Huang, Shisong Li

    Abstract: Magnetic-array-type current sensors have garnered increasing popularity owing to their notable advantages, including broadband functionality, a large dynamic range, cost-effectiveness, and compact dimensions. However, the susceptibility of the measurement error of one or more magnetic measurement units (MMUs) within the current sensor to drift significantly from the nominal value due to environmen…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 11 pages, 10 figures

  14. arXiv:2402.09561  [pdf, other]

    cs.CV eess.IV

    Patch-based adaptive temporal filter and residual evaluation

    Authors: Weiying Zhao, Paul Riot, Charles-Alban Deledalle, Henri Maître, Jean-Marie Nicolas, Florence Tupin

    Abstract: In coherent imaging systems, speckle is a signal-dependent noise that visually strongly degrades images' appearance. A huge amount of SAR data has been acquired from different sensors with different wavelengths, resolutions, incidences and polarizations. We extend the nonlocal filtering strategy to the temporal domain and propose a patch-based adaptive temporal filter (PATF) to take advantage of w…

    Submitted 14 February, 2024; originally announced February 2024.

  15. arXiv:2401.03396  [pdf]

    eess.SP

    A Closed-loop Brain-Machine Interface SoC Featuring a 0.2$μ$J/class Multiplexer Based Neural Network

    Authors: Chao Zhang, Yongxiang Guo, Dawid Sheng, Zhixiong Ma, Chao Sun, Yuwei Zhang, Wenxin Zhao, Fenyan Zhang, Tongfei Wang, Xing Sheng, Milin Zhang

    Abstract: This work presents the first fabricated electrophysiology-optogenetic closed-loop bidirectional brain-machine interface (CL-BBMI) system-on-chip (SoC) with electrical neural signal recording, on-chip sleep staging and optogenetic stimulation. The first multiplexer with static assignment based table lookup solution (MUXnet) for multiplier-free NN processor was proposed. A state-of-the-art average a…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 2 pages, 6 figures. Accepted by IEEE Custom Integrated Circuits Conference (CICC) 2024. The codes for the MUXnet (constructing neural networks using multiplexers instead of multipliers) will be open-sourced after the Journal version of this work is accepted

  16. arXiv:2401.02961  [pdf, other]

    cs.LG cs.CV eess.IV physics.optics

    A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design

    Authors: Manna Dai, Yang Jiang, Feng Yang, Joyjit Chattoraj, Yingzhi Xia, Xinxing Xu, Weijiang Zhao, My Ha Dao, Yong Liu

    Abstract: Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that…

    Submitted 18 October, 2023; originally announced January 2024.

  17. arXiv:2312.01479  [pdf, other]

    cs.SD cs.LG eess.AS

    OpenVoice: Versatile Instant Voice Cloning

    Authors: Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun

    Abstract: We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotio…

    Submitted 2 January, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: Technical Report

  18. Spectral-wise Implicit Neural Representation for Hyperspectral Image Reconstruction

    Authors: Huan Chen, Wangcai Zhao, Tingfa Xu, Shiyun Zhou, Peifu Liu, Jianan Li

    Abstract: Coded Aperture Snapshot Spectral Imaging (CASSI) reconstruction aims to recover the 3D spatial-spectral signal from 2D measurement. Existing methods for reconstructing Hyperspectral Image (HSI) typically involve learning mappings from a 2D compressed image to a predetermined set of discrete spectral bands. However, this approach overlooks the inherent continuity of the spectral information. In thi…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology, to be published

  19. arXiv:2311.12840  [pdf, other]

    cs.CV cs.AI eess.IV

    Wafer Map Defect Patterns Semi-Supervised Classification Using Latent Vector Representation

    Authors: Qiyu Wei, Wei Zhao, Xiaoyan Zheng, Zeng Zeng

    Abstract: As the globalization of semiconductor design and manufacturing processes continues, the demand for defect detection during integrated circuit fabrication stages is becoming increasingly critical, playing a significant role in enhancing the yield of semiconductor products. Traditional wafer map defect pattern detection methods involve manual inspection using electron microscopes to collect sample i…

    Submitted 6 October, 2023; originally announced November 2023.

    Comments: 6 pages, 2 figures, CIS conference

  20. arXiv:2311.06769  [pdf, other]

    eess.SY cs.LG

    Learning Predictive Safety Filter via Decomposition of Robust Invariant Set

    Authors: Zeyang Li, Chuxiong Hu, Weiye Zhao, Changliu Liu

    Abstract: Ensuring safety of nonlinear systems under model uncertainty and external disturbances is crucial, especially for real-world control tasks. Predictive methods such as robust model predictive control (RMPC) require solving nonconvex optimization problems online, which leads to high computational burden and poor scalability. Reinforcement learning (RL) works well with complex systems, but pays the p…

    Submitted 12 November, 2023; originally announced November 2023.

  21. arXiv:2310.05118  [pdf, other]

    cs.SD eess.AS

    VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023

    Authors: Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao

    Abstract: This paper presents the T02 team's system for the Singing Voice Conversion Challenge 2023 (SVCC2023). Our system entails a VITS-based SVC model, incorporating three modules: a feature extractor, a voice converter, and a post-processor. Specifically, the feature extractor provides F0 contours and extracts speaker-independent linguistic content from the input singing voice by leveraging a HuBERT mod…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  22. arXiv:2310.04369  [pdf, other]

    cs.SD cs.LG eess.AS

    MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

    Authors: Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie

    Abstract: A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and various accompaniment components equally, which may reduce performance compared to the model that only considers vocal enhancement. In this paper, we propose a novel multi-band temporal-freque…

    Submitted 6 October, 2023; originally announced October 2023.

  23. Continuous 3D Myocardial Motion Tracking via Echocardiography

    Authors: Chengkang Shen, Hao Zhu, You Zhou, Yu Liu, Si Yi, Lili Dong, Weipeng Zhao, David J. Brady, Xun Cao, Zhan Ma, Yi Lin

    Abstract: Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of cardiovascular diseases (CVDs), the foremost cause of death globally. However, current techniques suffer from incomplete and inaccurate motion estimation of the myocardium in both spatial and temporal dimensions, hindering the early identification of myocardial dysfunction. To address these challenge…

    Submitted 27 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 18 pages, 11 figures

    Journal ref: IEEE Transactions on Medical Imaging, June 2024

  24. arXiv:2309.12406  [pdf, other]

    eess.SY

    Safety Index Synthesis with State-dependent Control Space

    Authors: Rui Chen, Weiye Zhao, Changliu Liu

    Abstract: This paper introduces an approach for synthesizing feasible safety indices to derive safe control laws under state-dependent control spaces. The problem, referred to as Safety Index Synthesis (SIS), is challenging because it requires the existence of feasible control input in all states and leads to an infinite number of constraints. The proposed method leverages Positivstellensatz to formulate SI…

    Submitted 21 September, 2023; originally announced September 2023.

  25. Efficient Rotating Synthetic Aperture Radar Imaging via Robust Sparse Array Synthesis

    Authors: Wei Zhao, Cai Wen, Quan Yuan, Rong Zheng

    Abstract: Rotating Synthetic Aperture Radar (ROSAR) can generate a 360$^\circ$ image of its surrounding environment using the collected data from a single moving track. Due to its non-linear track, the Back-Projection Algorithm (BPA) is commonly used to generate SAR images in ROSAR. Despite its superior imaging performance, BPA suffers from high computation complexity, restricting its application in real-ti…

    Submitted 14 September, 2023; originally announced September 2023.

    Journal ref: in IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-12, 2023, Art no. 5108612

  26. arXiv:2309.07136  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP

    Masked Transformer for Electrocardiogram Classification

    Authors: Ya Zhou, Xiaolin Diao, Yanni Huo, Yang Liu, Xiaohan Fan, Wei Zhao

    Abstract: Electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformer for ECG data has not been fully realized, despite their widespread success in computer vision and natural language processing. In this work, we present Masked Trans…

    Submitted 22 April, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

    Comments: more experimental results; more implementation details; different abstracts

  27. arXiv:2307.09005  [pdf, other]

    eess.IV cs.CV

    Frequency-mixed Single-source Domain Generalization for Medical Image Segmentation

    Authors: Heng Li, Haojin Li, Wei Zhao, Huazhu Fu, Xiuyun Su, Yan Hu, Jiang Liu

    Abstract: The annotation scarcity of medical image segmentation poses challenges in collecting sufficient training data for deep learning models. Specifically, models trained on limited data may not generalize well to other unseen data domains, resulting in a domain shift issue. Consequently, domain generalization (DG) is developed to boost the performance of segmentation models on unseen domains. However,…

    Submitted 18 July, 2023; originally announced July 2023.

  28. arXiv:2307.07892  [pdf, other]

    cs.CV cs.LG eess.IV

    Multitemporal SAR images change detection and visualization using RABASAR and simplified GLR

    Authors: Weiying Zhao, Charles-Alban Deledalle, Loïc Denis, Henri Maître, Jean-Marie Nicolas, Florence Tupin

    Abstract: Understanding the state of changed areas requires that precise information be given about the changes. Thus, detecting different kinds of changes is important for land surface monitoring. SAR sensors are ideal to fulfil this task, because of their all-time and all-weather capabilities, with good accuracy of the acquisition geometry and without effects of atmospheric constituents for amplitude data…

    Submitted 15 July, 2023; originally announced July 2023.

  29. arXiv:2307.07434  [pdf, other]

    cs.CV eess.IV

    Combining multitemporal optical and SAR data for LAI imputation with BiLSTM network

    Authors: W. Zhao, F. Yin, H. Ma, Q. Wu, J. Gomez-Dans, P. Lewis

    Abstract: The Leaf Area Index (LAI) is vital for predicting winter wheat yield. Acquisition of crop conditions via Sentinel-2 remote sensing images can be hindered by persistent clouds, affecting yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery, and the ratio between its cross- and co-polarized channels (C-band) shows a high correlation with time series LAI over winter wheat re…

    Submitted 14 July, 2023; originally announced July 2023.

  30. arXiv:2306.12153  [pdf, other]

    eess.IV cs.CV

    DIAS: A Dataset and Benchmark for Intracranial Artery Segmentation in DSA sequences

    Authors: Wentao Liu, Tong Tian, Lemeng Wang, Weijin Xu, Lei Li, Haoyuan Li, Wenyi Zhao, Siyu Tian, Xipeng Pan, Huihua Yang, Feng Gao, Yiming Deng, Xin Yang, Ruisheng Su

    Abstract: The automated segmentation of Intracranial Arteries (IA) in Digital Subtraction Angiography (DSA) plays a crucial role in the quantification of vascular morphology, significantly contributing to computer-assisted stroke research and clinical practice. Current research primarily focuses on the segmentation of single-frame DSA using proprietary datasets. However, these methods face challenges due to…

    Submitted 13 June, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  31. arXiv:2306.07712  [pdf]

    eess.SY cs.NE

    Multiple-Step Quantized Triplet STDP Implemented with Memristive Synapse

    Authors: Y. Liu, D. Wang, Z. Dong, W. Zhao

    Abstract: As an extension of the pairwise spike-timing-dependent plasticity (STDP) learning rule, the triplet STDP is provided with greater capability in characterizing the synaptic changes in the biological neural cell. In this work, a novel mixed-signal circuit scheme, called multiple-step quantized triplet STDP, is designed to provide a precise and flexible implementation of coactivation triplet STDP lea…

    Submitted 27 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages, 9 figures

  32. arXiv:2306.06379  [pdf]

    cs.ET eess.SY

    Implementation of Multiple-Step Quantized STDP Based on Novel Memristive Synapses

    Authors: Y. Liu, D. Wang, Z. Dong, H. Xie, W. Zhao

    Abstract: Memristors have been widely studied as artificial synapses in neuromorphic circuits, due to their functional similarity with biological synapses, low operating power, and high integration density. In this work, a memristive synapse, composed of four memristors and two resistors, for SNN is designed and utilized for a neuron circuit implementing the robust spike-timing dependent plasticity learning…

    Submitted 27 August, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

    Comments: 10 pages, 20 figures

  33. arXiv:2305.19228  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Unsupervised Melody-to-Lyric Generation

    Authors: Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Yiwen Chen, Tagyoung Chung, Jing Huang, Nanyun Peng

    Abstract: Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationshi…

    Submitted 22 December, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ACL 2023. arXiv admin note: substantial text overlap with arXiv:2305.07760

  34. arXiv:2304.05084  [pdf, other]

    cs.LG eess.SP

    A Self-attention Knowledge Domain Adaptation Network for Commercial Lithium-ion Batteries State-of-health Estimation under Shallow Cycles

    Authors: Xin Chen, Yuwen Qin, Weidong Zhao, Qiming Yang, Ningbo Cai, Kai Wu

    Abstract: Accurate state-of-health (SOH) estimation is critical to guarantee the safety, efficiency and reliability of battery-powered applications. Most SOH estimation methods focus on the 0-100\% full state-of-charge (SOC) range that has similar distributions. However, the batteries in real-world applications usually work in the partial SOC range under shallow-cycle conditions and follow different degrada…

    Submitted 11 April, 2023; originally announced April 2023.

  35. A High-Performance Accelerator for Super-Resolution Processing on Embedded GPU

    Authors: Wenqian Zhao, Qi Sun, Yang Bai, Wenbo Li, Haisheng Zheng, Bei Yu, Martin D. F. Wong

    Abstract: Recent years have witnessed impressive progress in super-resolution (SR) processing. However, its real-time inference requirement sets a challenge not only for the model design but also for the on-chip implementation. In this paper, we implement a full-stack SR acceleration framework on embedded GPU devices. The special dictionary learning algorithm used in SR models was analyzed in detail and acc…

    Submitted 15 March, 2023; originally announced March 2023.

  36. arXiv:2303.06681  [pdf, other]

    eess.IV cs.CV

    Learning Deep Intensity Field for Extremely Sparse-View CBCT Reconstruction

    Authors: Yiqun Lin, Zhongjin Luo, Wei Zhao, Xiaomeng Li

    Abstract: Sparse-view cone-beam CT (CBCT) reconstruction is an important direction to reduce radiation dose and benefit clinical applications. Previous voxel-based generation methods represent the CT as discrete voxels, resulting in high memory requirements and limited spatial resolution due to the use of 3D decoders. In this paper, we formulate the CT volume as a continuous intensity field and develop a no…

    Submitted 31 August, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: MICCAI'23

  37. arXiv:2303.04439  [pdf, other]

    cs.CV cs.SD eess.AS

    A Light Weight Model for Active Speaker Detection

    Authors: Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, Liangyin Chen

    Abstract: Active speaker detection is a challenging task in audio-visual scenario understanding, which aims to detect who is speaking in one or more speakers scenarios. This task has received extensive attention as it is crucial in applications such as speaker diarization, speaker tracking, and automatic video editing. The existing studies try to improve performance by inputting multiple candidate informati…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  38. arXiv:2211.15030  [pdf, other]

    cs.CV eess.IV

    Imperceptible Adversarial Attack via Invertible Neural Networks

    Authors: Zihan Chen, Ziyue Wang, Junjie Huang, Wentao Zhao, Xiao Liu, Dejian Guan

    Abstract: Adding perturbations via utilizing auxiliary gradient information or discarding existing details of the benign images are two common approaches for generating adversarial examples. Though visual imperceptibility is the desired property of adversarial examples, conventional adversarial attacks still generate traceable adversarial perturbations. In this paper, we introduce a novel Adversarial Attack…

    Submitted 17 January, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

  39. arXiv:2211.10138  [pdf, other]

    eess.IV cs.CV cs.LG

    Joint nnU-Net and Radiomics Approaches for Segmentation and Prognosis of Head and Neck Cancers with PET/CT images

    Authors: Hui Xu, Yihao Li, Wei Zhao, Gwenolé Quellec, Lijun Lu, Mathieu Hatt

    Abstract: Automatic segmentation of head and neck cancer (HNC) tumors and lymph nodes plays a crucial role in the optimization treatment strategy and prognosis analysis. This study aims to employ nnU-Net for automatic segmentation and radiomics for recurrence-free survival (RFS) prediction using pretreatment PET/CT images in multi-center HNC cohort. A multi-center HNC dataset with 883 patients (524 patients…

    Submitted 18 November, 2022; originally announced November 2022.

  40. arXiv:2211.08798  [pdf, other]

    eess.SP

    A SVD-based Dynamic Harmonic Phasor Estimator with Improved Suppression of Out-of-Band Interference

    Authors: Dongfang Zhao, Shisong Li, Fuping Wang, Wei Zhao, Songling Huang, Qing Wang

    Abstract: The diffusion of nonlinear loads and power electronic devices in power systems deteriorates the signal environment and increases the difficulty of measuring harmonic phasors. Considering accurate harmonic phasor information is necessary to deal with harmonic-related issues, this paper focuses on realizing accurate dynamic harmonic phasor estimation when the signal is contaminated by certain interh…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: 10 pages, 14 figures, manuscript accepted for publication on IEEE Transactions on Power Delivery

  41. arXiv:2203.06306  [pdf, other]

    eess.IV

    DURRNet: Deep Unfolded Single Image Reflection Removal Network

    Authors: Jun-Jie Huang, Tianrui Liu, Zhixiong Yang, Shaojing Fu, Wentao Zhao, Pier Luigi Dragotti

    Abstract: Single image reflection removal problem aims to divide a reflection-contaminated image into a transmission image and a reflection image. It is a canonical blind source separation problem and is highly ill-posed. In this paper, we present a novel deep architecture called deep unfolded single image reflection removal network (DURRNet) which makes an attempt to combine the best features from model-ba…

    Submitted 11 March, 2022; originally announced March 2022.

  42. arXiv:2202.07820  [pdf, other]

    eess.IV cs.CV

    A Survey of Semen Quality Evaluation in Microscopic Videos Using Computer Assisted Sperm Analysis

    Authors: Wenwei Zhao, Pingli Ma, Chen Li, Xiaoning Bu, Shuojia Zou, Tao Jiang, Marcin Grzegorzek

    Abstract: The Computer Assisted Sperm Analysis (CASA) plays a crucial role in male reproductive health diagnosis and Infertility treatment. With the development of the computer industry in recent years, a great of accurate algorithms are proposed. With the assistance of those novel algorithms, it is possible for CASA to achieve a faster and higher quality result. Since image processing is the technical basi…

    Submitted 17 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

  43. arXiv:2112.09574  [pdf

    eess.IV cs.CV cs.LG

    Super-resolution reconstruction of cytoskeleton image based on A-net deep learning network

    Authors: Qian Chen, Haoxin Bai, Bingchen Che, Tianyun Zhao, Ce Zhang, Kaige Wang, Jintao Bai, Wei Zhao

    Abstract: To date, live-cell imaging at the nanometer scale remains challenging. Even though super-resolution microscopy methods have enabled visualization of subcellular structures below the optical resolution limit, the spatial resolution is still far from enough for the structural reconstruction of biomolecules in vivo (e.g., the ~24 nm thickness of a microtubule fiber). In this study, we proposed an A-net netw…

    Submitted 17 December, 2021; originally announced December 2021.

    Comments: The manuscript has 17 pages, 10 figures and 58 references

  44. arXiv:2112.03756  [pdf, other

    cs.RO cs.AI eess.SY

    Bridging the Model-Reality Gap with Lipschitz Network Adaptation

    Authors: Siqi Zhou, Karime Pereida, Wenda Zhao, Angela P. Schoellig

    Abstract: As robots venture into the real world, they are subject to unmodeled dynamics and disturbances. Traditional model-based control approaches have been proven successful in relatively static and known operating environments. However, when an accurate model of the robot is not available, model-based design can lead to suboptimal and even unsafe behaviour. In this work, we propose a method that bridges…

    Submitted 7 December, 2021; originally announced December 2021.

    Journal ref: IEEE Robotics and Automation Letters (RA-L) 2021

  45. arXiv:2111.00962  [pdf, other

    cs.SD cs.AI eess.AS

    RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses

    Authors: Shengyuan Xu, Wenxiao Zhao, Jing Guo

    Abstract: Most GAN (Generative Adversarial Network)-based approaches towards high-fidelity waveform generation heavily rely on discriminators to improve their performance. However, GAN methods introduce much uncertainty into the generation process and often result in mismatches of pitch and intensity, which is fatal when it comes to sensitive use cases such as singing voice synthesis (SVS). To address this pr…

    Submitted 20 March, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Submitted to INTERSPEECH 2022

  46. arXiv:2110.12103  [pdf

    physics.optics eess.IV

    Single-shot fast 3D imaging through scattering media using structured illumination

    Authors: Aiping Zhai, Yuancheng Li, Wenjing Zhao, Dong Wang

    Abstract: Conventional approaches for 3D imaging in or through scattering media are usually limited to 2D reconstruction of objects at some discontinuous locations, even when time-consuming iterations, guide stars, or complex systems are employed. How to quickly visualize dynamic 3D objects behind scattering media is still an open issue. Here, by using structured light illumination, we propose a single-s…

    Submitted 22 October, 2021; originally announced October 2021.

  47. arXiv:2110.09821  [pdf, other

    physics.ins-det eess.SP

    A SVD-Based Synchrophasor Estimator for P-class PMUs with Improved Immune from Interharmonic Tones

    Authors: Dongfang Zhao, Fuping Wang, Shisong Li, Lei Chen, Wei Zhao, Songling Huang

    Abstract: The increasing use of renewable generation, power electronic devices, and nonlinear loads in power systems brings more severe interharmonic tones to the measurand, which can increase estimation errors of P-class PMUs, cause misoperation of protection relays, and even threaten the stability of the power systems. Therefore, the performance of the P-class synchrophasor estimator under interharmonic i…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: 10 pages, 10 figures, submitted to IEEE Access

  48. arXiv:2110.09121  [pdf, ps, other

    cs.SD eess.AS

    KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

    Authors: Xiaobin Zhuang, Huiran Yu, Weifeng Zhao, Tao Jiang, Peng Hu

    Abstract: An automatic pitch correction system typically includes several stages, such as pitch extraction, deviation estimation, pitch shift processing, and cross-fade smoothing. However, designing these components with hand-crafted strategies often requires domain expertise, and they are likely to fail on corner cases. In this paper, we present KaraTuner, an end-to-end neural architecture that predicts pitch curve and…

    Submitted 26 June, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea

  49. arXiv:2109.13483  [pdf, other

    eess.IV cs.CV cs.LG physics.med-ph

    Metal Artifact Reduction in 2D CT Images with Self-supervised Cross-domain Learning

    Authors: Lequan Yu, Zhicheng Zhang, Xiaomeng Li, Hongyi Ren, Wei Zhao, Lei Xing

    Abstract: The presence of metallic implants often introduces severe metal artifacts in the X-ray CT images, which could adversely influence clinical diagnosis or dose calculation in radiation therapy. In this work, we present a novel deep-learning-based approach for metal artifact reduction (MAR). In order to alleviate the need for anatomically identical CT image pairs (i.e., metal artifact-corrupted CT ima…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: Accepted by PMB

  50. Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation

    Authors: Geng-Xin Xu, Chen Liu, Jun Liu, Zhongxiang Ding, Feng Shi, Man Guo, Wei Zhao, Xiaoming Li, Ying Wei, Yaozong Gao, Chuan-Xian Ren, Dinggang Shen

    Abstract: Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images greatly helps the estimation of intensive care unit events and clinical decisions in treatment planning. To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites. This task face…

    Submitted 8 September, 2021; originally announced September 2021.