Showing 1–50 of 135 results for author: Zhou, W

Searching in archive eess.
  1. arXiv:2406.18108  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Token-Weighted RNN-T for Learning from Flawed Data

    Authors: Gil Keren, Wei Zhou, Ozlem Kalinli

    Abstract: ASR models are commonly trained with the cross-entropy criterion to increase the probability of a target token sequence. While optimizing the probability of all tokens in the target sequence is sensible, one may want to de-emphasize tokens that reflect transcription errors. In this work, we propose a novel token-weighted RNN-T criterion that augments the RNN-T objective with token-specific weights…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.08203  [pdf, other]

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. arXiv:2405.15519  [pdf]

    physics.optics eess.IV

    Confocal structured illumination microscopy

    Authors: Weishuai Zhou, Manhong Yao, Xi Lin, Quan Yu, Junzheng Peng, Jingang Zhong

    Abstract: Confocal microscopy, a critical advancement in optical imaging, is widely applied because of its excellent anti-noise ability. However, it has low imaging efficiency and can cause phototoxicity. Optical-sectioning structured illumination microscopy (OS-SIM) can overcome the limitations of confocal microscopy but still faces challenges in imaging depth and signal-to-noise ratio (SNR). We introduce t…

    Submitted 24 May, 2024; originally announced May 2024.

  4. arXiv:2405.05667  [pdf, other]

    eess.IV cs.CV

    VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis

    Authors: Zhihan Ju, Wanting Zhou

    Abstract: In the realm of smart healthcare, researchers enhance the scale and diversity of medical datasets through medical image synthesis. However, existing methods are limited by CNN local perception and Transformer quadratic complexity, making it difficult to balance structural texture consistency. To this end, we propose the Vision Mamba DDPM (VM-DDPM) based on State Space Model (SSM), fully combining…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.04902  [pdf, other]

    eess.IV cs.CV

    HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis

    Authors: Zhihan Ju, Wanting Zhou, Longteng Kong, Yu Chen, Yi Li, Zhenan Sun, Caifeng Shan

    Abstract: Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Ad…

    Submitted 8 May, 2024; originally announced May 2024.

  6. arXiv:2404.15163  [pdf, other]

    cs.CV eess.IV

    Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

    Authors: Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

    Abstract: With the increasing maturity of the text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, little effort has gone into designing relevant quality assessment models. In this paper, we propose a nov…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Broadcasting (TBC)

  7. arXiv:2404.00362  [pdf, other]

    cs.CV eess.IV

    STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

    Authors: Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

    Abstract: Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built…

    Submitted 30 March, 2024; originally announced April 2024.

  8. arXiv:2403.10481  [pdf, other]

    eess.IV eess.SP

    Tensor Star Decomposition

    Authors: Wuyang Zhou, Yu-Bang Zheng, Qibin Zhao, Danilo Mandic

    Abstract: A novel tensor decomposition framework, termed Tensor Star (TS) decomposition, is proposed which represents a new type of tensor network decomposition based on tensor contractions. This is achieved by connecting the core tensors in a ring shape, whereby the core tensors act as skip connections between the factor tensors and allow for direct correlation characterisation between any two arbitrary di…

    Submitted 15 March, 2024; originally announced March 2024.

  9. arXiv:2403.00671  [pdf, other]

    eess.IV

    Asymmetric Feature Fusion for Image Retrieval

    Authors: Hui Wu, Min Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

    Abstract: In asymmetric retrieval systems, models with different capacities are deployed on platforms with different computational and storage resources. Despite the great progress, existing approaches still suffer from a dilemma between retrieval efficiency and asymmetric accuracy due to the limited capacity of the lightweight query model. In this work, we propose an Asymmetric Feature Fusion (AFF) paradig…

    Submitted 1 March, 2024; originally announced March 2024.

  10. arXiv:2403.00648  [pdf, other]

    eess.IV

    Structure Similarity Preservation Learning for Asymmetric Image Retrieval

    Authors: Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

    Abstract: Asymmetric image retrieval is a task that seeks to balance retrieval accuracy and efficiency by leveraging lightweight and large models for the query and gallery sides, respectively. The key to asymmetric image retrieval is realizing feature compatibility between different models. Despite the great progress, most existing approaches either rely on classifiers inherited from gallery models or simpl…

    Submitted 1 March, 2024; originally announced March 2024.

  11. arXiv:2402.06841  [pdf]

    eess.IV cs.CV

    Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

    Authors: Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hong** Yao, Xin Huang, Rui Gao, Chen Zhao, Weihua Zhou

    Abstract: A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point c…

    Submitted 9 February, 2024; originally announced February 2024.

  12. arXiv:2402.02349  [pdf]

    eess.IV cs.CV

    Vision Transformer-based Multimodal Feature Fusion Network for Lymphoma Segmentation on PET/CT Images

    Authors: Huan Huang, Liheng Qiu, Shenmiao Yang, Longxi Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Chen Zhao, Weihua Zhou

    Abstract: Background: Diffuse large B-cell lymphoma (DLBCL) segmentation is a challenge in medical image analysis. Traditional segmentation methods for lymphoma struggle with the complex patterns and the presence of DLBCL lesions. Objective: We aim to develop an accurate method for lymphoma segmentation with 18F-Fluorodeoxyglucose positron emission tomography (PET) and computed tomography (CT) images. Metho…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 14 pages, 6 figures; reference added

  13. arXiv:2402.01186  [pdf, other]

    eess.IV cs.CV

    Ambient-Pix2PixGAN for Translating Medical Images from Noisy Data

    Authors: Wentao Chen, Xichen Xu, Jie Luo, Weimin Zhou

    Abstract: Image-to-image translation is a common task in computer vision and has had a rapidly increasing impact on the field of medical imaging. Deep learning-based methods that employ conditional generative adversarial networks (cGANs), such as Pix2PixGAN, have been extensively explored to perform image-to-image translation tasks. However, when noisy medical image data are considered, such methods cann…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: SPIE Medical Imaging 2024

  14. arXiv:2402.01171  [pdf, other]

    eess.IV cs.CV

    AmbientCycleGAN for Establishing Interpretable Stochastic Object Models Based on Mathematical Phantoms and Medical Imaging Measurements

    Authors: Xichen Xu, Wentao Chen, Weimin Zhou

    Abstract: Medical imaging systems that are designed for producing diagnostically informative images should be objectively assessed via task-based measures of image quality (IQ). Ideally, computation of task-based measures of IQ needs to account for all sources of randomness in the measurement data, including the variability in the ensemble of objects to be imaged. To address this need, stochastic object mod…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: SPIE Medical Imaging 2024

  15. arXiv:2401.13249  [pdf, other]

    eess.AS cs.MM

    MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

    Authors: Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara

    Abstract: Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of synthetic speech. This study extends the application of predicted MOS to the task of Fake Audio Detection (FAD), as we expect that MOS can be used to assess how close synthesized speech is to the natural human voice. We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection a…

    Submitted 24 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  16. arXiv:2401.08982  [pdf]

    cs.RO eess.SY

    Robot Tape Manipulation for 3D Printing

    Authors: Nahid Tushar, Rencheng Wu, Yu She, Wenchao Zhou, Wan Shou

    Abstract: 3D printing has enabled various applications using different forms of materials, such as filaments, sheets, and inks. Typically, during 3D printing, feedstocks are transformed into discrete building blocks and placed or deposited in a designated location similar to the manipulation and assembly of discrete objects. However, 3D printing of continuous and flexible tape (with the geometry between fil…

    Submitted 17 January, 2024; originally announced January 2024.

  17. arXiv:2312.07258  [pdf, other]

    cs.CV eess.IV

    SSTA: Salient Spatially Transformed Attack

    Authors: Renyang Liu, Wei Zhou, Sixin Wu, Jun Zhao, Kwok-Yan Lam

    Abstract: Extensive studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial attacks, which brings a huge security risk to the further application of DNNs, especially for the AI models developed in the real world. Despite the significant progress that has been made recently, existing attack methods still suffer from the unsatisfactory performance of escaping from being detect…

    Submitted 12 December, 2023; originally announced December 2023.

  18. arXiv:2312.00437  [pdf]

    eess.SP

    Investigation on data fusion of sun-induced chlorophyll fluorescence and reflectance for photosynthetic capacity of rice

    Authors: Yu-an Zhou, Li Zhai, Weijun Zhou, Ji Zhou, Haiyan Cen

    Abstract: Studying crop photosynthesis is crucial for improving yield, but current methods are labor-intensive. This research aims to enhance accuracy by combining leaf reflectance and sun-induced chlorophyll fluorescence (SIF) signals to estimate key photosynthetic traits in rice. The study analyzes 149 leaf samples from two rice cultivars, considering reflectance, SIF, chlorophyll, carotenoids, and CO2 re…

    Submitted 1 December, 2023; originally announced December 2023.

  19. arXiv:2311.10656  [pdf, other]

    eess.AS

    LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

    Authors: Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu

    Abstract: Recently, researchers have shown an increasing interest in automatically predicting the subjective evaluation for speech synthesis systems. This prediction is a challenging task, especially on the out-of-domain test set. In this paper, we proposed a novel fusion model for MOS prediction that combines supervised and unsupervised approaches. In the supervised aspect, we developed an SSL-based predic…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Accepted at IEEE ASRU 2023

  20. arXiv:2311.05282  [pdf, other]

    physics.optics eess.SP

    Empowering high-dimensional optical fiber communications with integrated photonic processors

    Authors: Kaihang Lu, Zengqi Chen, Hao Chen, Wu Zhou, Zunyue Zhang, Hon Ki Tsang, Yeyu Tong

    Abstract: Mode division multiplexing (MDM) in optical fibers enables multichannel capabilities for various applications, including data transmission, quantum networks, imaging, and sensing. However, MDM optical fiber systems usually necessitate bulk-optics approaches for launching different orthogonal fiber modes into the multimode optical fiber, and multiple-input multiple-output digital electronic signal…

    Submitted 9 November, 2023; originally announced November 2023.

  21. arXiv:2311.00567  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph q-bio.QM

    A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images

    Authors: Ni Yao, Hang Hu, Kaicong Chen, Chen Zhao, Yuan Guo, Boya Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Weihua Zhou, Li Tian

    Abstract: Objectives: To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation so as to assist radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images. Methods: Data from 668 consecutive patients with pathologically proven RCC were retrospectively collected from Center 1. By using five-fold cross…

    Submitted 12 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 16 pages, 6 figures

  22. arXiv:2310.07345  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers

    Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers. Both lattice-free and N-best-list approaches are examined. For lattice-free methods with phoneme-level LMs, we propose a method to approximate the context history to employ LMs with full-con…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: accepted at ASRU 2023

  23. arXiv:2310.04369  [pdf, other]

    cs.SD cs.LG eess.AS

    MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

    Authors: Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie

    Abstract: A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and various accompaniment components equally, which may reduce performance compared to the model that only considers vocal enhancement. In this paper, we propose a novel multi-band temporal-freque…

    Submitted 6 October, 2023; originally announced October 2023.

  24. arXiv:2309.14130  [pdf, ps, other]

    cs.SD cs.CL cs.LG eess.AS

    On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

    Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer with external language model (LM) fusion for speech recognition. In this work, we show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view. Theoretically, we derive that the global optimum of maximum mu…

    Submitted 13 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: accepted at ICASSP 2024

  25. arXiv:2309.11714  [pdf, other]

    eess.SP cs.AI cs.LG

    A Dynamic Domain Adaptation Deep Learning Network for EEG-based Motor Imagery Classification

    Authors: Jie Jiao, Meiyan Xu, Qingqing Chen, Hefan Zhou, Wangliang Zhou

    Abstract: There is a correlation between adjacent channels of electroencephalogram (EEG), and how to represent this correlation is an issue that is currently being explored. In addition, because of inter-individual differences in EEG signals, new subjects need to spend time on calibration for an EEG-based motor imagery brain-computer interface. In order to solve the above problems…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 10 pages, 4 figures, journal

    MSC Class: 68T07 (Primary); ACM Class: I.2.4

  26. arXiv:2309.08415  [pdf]

    cs.LG eess.SP physics.med-ph

    A new method of modeling the multi-stage decision-making process of CRT using machine learning with uncertainty quantification

    Authors: Kristoffer Larsen, Chen Zhao, Joyce Keyak, Qiuying Sha, Diana Paez, Xinwei Zhang, Guang-Uei Hung, Jiangang Zou, Amalia Peix, Weihua Zhou

    Abstract: Aims. The purpose of this study is to create a multi-stage machine learning model to predict cardiac resynchronization therapy (CRT) response for heart failure (HF) patients. This model exploits uncertainty quantification to recommend additional collection of single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI) variables if baseline clinical variables and features fr…

    Submitted 28 April, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 30 pages, 6 figures. arXiv admin note: text overlap with arXiv:2305.02475

  27. arXiv:2309.08276   

    eess.SY

    A New Adaptive Phase-locked Loop for Synchronization of a Grid-Connected Voltage Source Converter: Simulation and Experimental Results

    Authors: Wei He, Jiachen Yan, Romeo Ortega, Daniele Zonetti, Wang** Zhou

    Abstract: In [1] a new adaptive phase-locked loop scheme for synchronization of a grid-connected voltage source converter with guaranteed (almost) global stability properties was reported. To guarantee a suitable synchronization with the angle of the three-phase grid voltage we design an adaptive observer for such a signal requiring measurements only at the point of common coupling. An interesting feature o…

    Submitted 30 October, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Something needs to be modified so that this paper is more clear

  28. arXiv:2308.09302  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms

    Authors: Penghui Wen, Kun Hu, Wenxi Yue, Sen Zhang, Wanlei Zhou, Zhiyong Wang

    Abstract: Robust audio anti-spoofing has been increasingly challenging due to the recent advancements in deepfake techniques. While spectrograms have demonstrated their capability for anti-spoofing, complementary information presented in multi-order spectral patterns has not been well explored, which limits their effectiveness for varying spoofing attacks. Therefore, we propose a novel deep learning method…

    Submitted 18 August, 2023; originally announced August 2023.

  29. arXiv:2308.04774  [pdf, other]

    cs.RO cs.AI cs.CV eess.SY

    E$^3$-UAV: An Edge-based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles

    Authors: Jiashun Suo, Xingzhou Zhang, Weisong Shi, Wei Zhou

    Abstract: Motivated by the advances in deep learning techniques, the application of Unmanned Aerial Vehicle (UAV)-based object detection has proliferated across a range of fields, including vehicle counting, fire detection, and city monitoring. While most existing research studies only a subset of the challenges inherent to UAV-based object detection, there are few studies that balance various aspects to de…

    Submitted 2 December, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: 16 pages, 8 figures

    Journal ref: IEEE Internet of Things Journal, Early Access 1-1 (2023)

  30. arXiv:2307.04327  [pdf]

    cs.RO eess.SY

    Legal Decision-making for Highway Automated Driving

    Authors: Xiaohan Ma, Wenhao Yu, Chengxiang Zhao, Changjun Wang, Wenhui Zhou, Guangming Zhao, Mingyue Ma, Weida Wang, Lin Yang, Rui Mu, Hong Wang, Jun Li

    Abstract: Compliance with traffic laws is a fundamental requirement for human drivers on the road, and autonomous vehicles must adhere to traffic laws as well. However, current autonomous vehicles prioritize safety and collision avoidance primarily in their decision-making and planning, which will lead to misunderstandings and distrust from human drivers and may even result in accidents in mixed traffic flo…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 14 pages, 17 figures

  31. arXiv:2307.00020  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation

    Authors: Yuhao Cui, Xiongwei Wang, Zhongzhou Zhao, Wei Zhou, Haiqing Chen

    Abstract: Existing fine-grained intensity regulation methods rely on explicit control through predicted emotion probabilities. However, these high-level semantic probabilities are often inaccurate and unsmooth at the phoneme level, leading to bias in learning. This is especially so when we attempt to mix multiple emotion intensities for specific phonemes, resulting in markedly reduced controllability and naturalness o…

    Submitted 27 June, 2023; originally announced July 2023.

    Comments: Accepted at Interspeech 2023

  32. arXiv:2306.17008  [pdf]

    eess.IV cs.CV

    MLA-BIN: Model-level Attention and Batch-instance Style Normalization for Domain Generalization of Federated Learning on Medical Image Segmentation

    Authors: Fubao Zhu, Yanhui Tian, Chuang Han, Yanting Li, Jiaofen Nan, Ni Yao, Weihua Zhou

    Abstract: The privacy protection mechanism of federated learning (FL) offers an effective solution for cross-center medical collaboration and data sharing. In multi-site medical image segmentation, each medical site serves as a client of FL, and its data naturally forms a domain. FL supplies the possibility to improve the performance of seen domains model. However, there is a problem of domain generalizatio…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 8 figures, 2 tables

  33. arXiv:2306.05704  [pdf, other]

    cs.CV cs.MM eess.IV

    Exploring Effective Mask Sampling Modeling for Neural Image Compression

    Authors: Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

    Abstract: Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address the channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: 10 pages

  34. arXiv:2306.01210  [pdf]

    eess.SP cs.CV

    A new method using deep transfer learning on ECG to predict the response to cardiac resynchronization therapy

    Authors: Zhuo He, Hong** Si, Xinwei Zhang, Qing-Hui Chen, Jiangang Zou, Weihua Zhou

    Abstract: Background: Cardiac resynchronization therapy (CRT) has emerged as an effective treatment for heart failure patients with electrical dyssynchrony. However, accurately predicting which patients will respond to CRT remains a challenge. This study explores the application of deep transfer learning techniques to train a predictive model for CRT response. Methods: In this study, the short-time Fourier…

    Submitted 1 June, 2023; originally announced June 2023.

  35. RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

    Authors: Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney

    Abstract: Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but rather simple support for decoding open-vocabulary scenarios only. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding are usually only for classical ASR, or do not support all S2S models. To eliminate this restriction on research possibilities such as…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: accepted at Interspeech 2023

  36. arXiv:2305.01165  [pdf, other]

    eess.IV cs.CV physics.optics

    Self-similarity-based super-resolution of photoacoustic angiography from hand-drawn doodles

    Authors: Yuanzheng Ma, Wangting Zhou, Rui Ma, Sihua Yang, Yansong Tang, Xun Guan

    Abstract: Deep-learning-based super-resolution photoacoustic angiography (PAA) is a powerful tool that restores blood vessel images from under-sampled images to facilitate disease diagnosis. Nonetheless, due to the scarcity of training samples, PAA super-resolution models often exhibit inadequate generalization capabilities, particularly in the context of continuous monitoring tasks. To address this challen…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 12 pages, 6 figures, journal

  37. arXiv:2304.14302  [pdf]

    physics.app-ph eess.SY physics.optics

    In-memory photonic dot-product engine with electrically programmable weight banks

    Authors: Wen Zhou, Bowei Dong, Nikolaos Farmakidis, Xuan Li, Nathan Youngblood, Kairan Huang, Yuhan He, C. David Wright, Wolfram H. P. Pernice, Harish Bhaskaran

    Abstract: Electronically reprogrammable photonic circuits based on phase-change chalcogenides present an avenue to resolve the von-Neumann bottleneck; however, implementation of such hybrid photonic-electronic processing has not achieved computational success. Here, we achieve this milestone by demonstrating an in-memory photonic-electronic dot-product engine, one that decouples electronic programming of ph…

    Submitted 27 April, 2023; originally announced April 2023.

  38. arXiv:2304.11521  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    An Order-Complexity Model for Aesthetic Quality Assessment of Homophony Music Performance

    Authors: Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Jialin Sun

    Abstract: Although computational aesthetics evaluation has made certain achievements in many fields, its research of music performance remains to be explored. At present, subjective evaluation is still the ultimate method of music aesthetics research, but it will consume a lot of human and material resources. In addition, the music performance generated by AI is still mechanical, monotonous and lacking in bea…

    Submitted 22 April, 2023; originally announced April 2023.

    Journal ref: AIART 2023 ICME Workshop

  39. arXiv:2304.11509  [pdf]

    eess.SP

    Co-GRU Enhanced End-to-End Design for Long-haul Coherent Transmission Systems

    Authors: Jiayu Zheng, Tianhong Zhang, Yu Wen**g, Weiqin Zhou, Chuanchuan Yang, Fan Zhang

    Abstract: In recent years, the end-to-end (E2E) scheme based on deep learning (DL) has been proposed as a potential scheme to jointly optimize the encoder and the decoder parameters of the optical communication system. Compared with conventional deep neural network (DNN) adopted in E2E design, center-oriented Gated Recurrent Unit (Co-GRU) network has the ability to learn and compensate for inter-symbol inte…

    Submitted 27 May, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

  40. arXiv:2304.01218  [pdf, other]

    eess.SY cs.AI cs.LG

    POLAR-Express: Efficient and Precise Formal Reachability Analysis of Neural-Network Controlled Systems

    Authors: Yixuan Wang, Weichao Zhou, Jiameng Fan, Zhilu Wang, Jiajun Li, Xin Chen, Chao Huang, Wenchao Li, Qi Zhu

    Abstract: Neural networks (NNs) playing the role of controllers have demonstrated impressive empirical performances on challenging control problems. However, the potential adoption of NN controllers in real-life applications also gives rise to a growing concern over the safety of these neural-network controlled systems (NNCSs), especially when used in safety-critical applications. In this work, we present P…

    Submitted 5 April, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

  41. arXiv:2304.00433  [pdf, other]

    eess.SP cs.CV cs.LG stat.CO

    Ideal Observer Computation by Use of Markov-Chain Monte Carlo with Generative Adversarial Networks

    Authors: Weimin Zhou, Umberto Villa, Mark A. Anastasio

    Abstract: Medical imaging systems are often evaluated and optimized via objective, or task-specific, measures of image quality (IQ) that quantify the performance of an observer on a specific clinically-relevant task. The performance of the Bayesian Ideal Observer (IO) sets an upper limit among all observers, numerical or human, and has been advocated for use as a figure-of-merit (FOM) for evaluating and opt…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: Submitted to IEEE Transactions on Medical Imaging

  42. arXiv:2301.12340  [pdf]

    eess.IV cs.CV

    Incremental Value and Interpretability of Radiomics Features of Both Lung and Epicardial Adipose Tissue for Detecting the Severity of COVID-19 Infection

    Authors: Ni Yao, Yanhui Tian, Daniel Gama das Neves, Chen Zhao, Claudio Tinoco Mesquita, Wolney de Andrade Martins, Alair Augusto Sarmet Moreira Damas dos Santos, Yanting Li, Chuang Han, Fubao Zhu, Neng Dai, Weihua Zhou

    Abstract: Epicardial adipose tissue (EAT) is known for its pro-inflammatory properties and association with Coronavirus Disease 2019 (COVID-19) severity. However, current EAT segmentation methods do not consider positional information. Additionally, the detection of COVID-19 severity lacks consideration for EAT radiomics features, which limits interpretability. This study investigates the use of radiomics f…

    Submitted 6 December, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: 20 pages, 7 figures

  43. arXiv:2301.05908  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    An Order-Complexity Model for Aesthetic Quality Assessment of Symbolic Homophony Music Scores

    Authors: Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Shuai Cui

    Abstract: Computational aesthetics evaluation has made great achievements in the field of visual arts, but the research work on music still needs to be explored. Although the existing work of music generation is very substantial, the quality of music score generated by AI is relatively poor compared with that created by human composers. The music scores created by AI are usually monotonous and devoid of emo…

    Submitted 14 January, 2023; originally announced January 2023.

  44. arXiv:2212.04325  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

    Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-…

    Submitted 25 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: accepted at ICASSP 2023

  45. arXiv:2211.07472  [pdf]

    physics.med-ph eess.SP

    A new method using machine learning to integrate ECG and gated SPECT MPI for Cardiac Resynchronization Therapy Decision Support on behalf of the VISION-CRT

    Authors: Fernando de A. Fernandes, Kristoffer Larsen, Zhuo He, Erivelton Nascimento, Amalia Peix, Qiuying Sha, Diana Paez, Ernest V. Garcia, Weihua Zhou, Claudio T Mesquita

    Abstract: Cardiac resynchronization therapy (CRT) has been established as an important therapy for heart failure. Mechanical dyssynchrony has the potential to predict responders to CRT. The aim of this study was to report the development and the validation of machine learning (ML) models which integrates ECG, gated SPECT MPI (GMPS) and clinical variables to predict patients' response to CRT. This analysis i…

    Submitted 6 November, 2022; originally announced November 2022.

  46. arXiv:2211.06369  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Enhancing and Adversarial: Improve ASR with Speaker Labels

    Authors: Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient…

    Submitted 24 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: accepted at ICASSP 2023

  47. arXiv:2211.03885  [pdf, other]

    cs.CV eess.IV

    Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li, et al. (13 additional authors not shown)

    Abstract: The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. Th…

    Submitted 7 November, 2022; originally announced November 2022.

  48. arXiv:2211.00899  [pdf, other]

    eess.IV cs.CV

    LightVessel: Exploring Lightweight Coronary Artery Vessel Segmentation via Similarity Knowledge Distillation

    Authors: Hao Dang, Yuekai Zhang, Xingqun Qi, Wanting Zhou, Muyi Sun

    Abstract: In recent years, deep convolution neural networks (DCNNs) have achieved great prospects in coronary artery vessel segmentation. However, it is difficult to deploy complicated models in clinical scenarios since high-performance approaches have excessive parameters and high computation costs. To tackle this problem, we propose LightVessel, a Similarity Knowledge Distillation Framework, for…

    Submitted 25 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: 5 pages, 7 figures, conference

  49. arXiv:2210.15875  [pdf, ps, other]

    eess.SY eess.SP

    Dynamic Event-Triggered Discrete-Time Linear Time-Varying System with Privacy-Preservation

    Authors: Xuefeng Yang, Li Liu, Wenju Zhou, Jing Shi, Yinggang Zhang, Xin Hu, Huiyu Zhou

    Abstract: This paper focuses on discrete-time wireless sensor networks with privacy-preservation. In practical applications, information exchange between sensors is subject to attacks. For the information leakage caused by the attack during the information transmission process, privacy-preservation is introduced for system states. To make communication resources more effectively utilized, a dynamic event-tr…

    Submitted 27 October, 2022; originally announced October 2022.

  50. arXiv:2210.14742  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Monotonic segmental attention for automatic speech recognition

    Authors: Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming. We directly compare global-attention and different segmental-attention modeling variants. We develop and compare two separate time-synchronous decoders, on…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: accepted at SLT: https://slt2022.org/