Showing 1–50 of 111 results for author: Zhou, L

Searching in archive eess.
  1. arXiv:2406.08523  [pdf, other]

    eess.IV

    A Plug-and-Play Untrained Neural Network for Full Waveform Inversion in Reconstructing Sound Speed Images of Ultrasound Computed Tomography

    Authors: Weicheng Yan, Qiude Zhang, Yun Wu, Zhaohui Liu, Liang Zhou, Mingyue Ding, Ming Yuchi, Wu Qiu

    Abstract: Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2406.07855  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    Authors: Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

    Abstract: With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  3. arXiv:2406.05370  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

    Authors: Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei

    Abstract: This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in…

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Demo posted

  4. arXiv:2405.17809  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

    Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

    Abstract: There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress

  5. arXiv:2405.14770  [pdf, other]

    eess.IV

    Physics-informed Score-based Diffusion Model for Limited-angle Reconstruction of Cardiac Computed Tomography

    Authors: Shuo Han, Yongshun Xu, Dayang Wang, Bahareh Morovati, Li Zhou, Jonathan S. Maltz, Ge Wang, Hengyong Yu

    Abstract: Cardiac computed tomography (CT) has emerged as a major imaging modality for the diagnosis and monitoring of cardiovascular diseases. High temporal resolution is essential to ensure diagnostic accuracy. Limited-angle data acquisition can reduce scan time and improve temporal resolution, but typically leads to severe image degradation and motivates for improved reconstruction techniques. In this pa…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages

  6. arXiv:2405.10550  [pdf, other]

    eess.IV cs.CV

    LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

    Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

    Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad…

    Submitted 17 May, 2024; originally announced May 2024.

  7. arXiv:2405.09554  [pdf, ps, other]

    eess.SP cs.IT

    Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

    Authors: Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

    Abstract: In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to mod…

    Submitted 17 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.

  8. arXiv:2405.04274  [pdf, other]

    eess.IV cs.CV

    Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression

    Authors: Zhenghao Chen, Luping Zhou, Zhihao Hu, Dong Xu

    Abstract: Content-adaptive compression is crucial for enhancing the adaptability of the pre-trained neural codec for various contents. Although these methods have been very practical in neural image compression (NIC), their application in neural video compression (NVC) is still limited due to two main aspects: 1), video compression relies heavily on temporal redundancy, therefore updating just one or a few…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2405.01161  [pdf, ps, other]

    eess.SP

    Exponentially Consistent Outlier Hypothesis Testing for Continuous Sequences

    Authors: Lina Zhu, Lin Zhou

    Abstract: In outlier hypothesis testing, one aims to detect outlying sequences among a given set of sequences, where most sequences are generated i.i.d. from a nominal distribution while outlying sequences (outliers) are generated i.i.d. from a different anomalous distribution. Most existing studies focus on discrete-valued sequences, where each data sample takes values in a finite set. To account for pract…

    Submitted 2 May, 2024; originally announced May 2024.

  10. arXiv:2405.01113  [pdf, other]

    cs.CV cs.AI eess.IV

    Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

    Authors: Seungyeop Lee, Knut Peterson, Solmaz Arezoomandan, Bill Cai, Peihan Li, Lifeng Zhou, David Han

    Abstract: A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data g…

    Submitted 2 May, 2024; originally announced May 2024.

  11. arXiv:2404.10026  [pdf]

    eess.IV cs.CR cs.LG

    Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection

    Authors: Lisang Zhou, Meng Wang, Ning Zhou

    Abstract: Distributed training can facilitate the processing of large medical image datasets, and improve the accuracy and efficiency of disease diagnosis while protecting patient privacy, which is crucial for achieving efficient medical image analysis and accelerating medical research progress. This paper presents an innovative approach to medical image classification, leveraging Federated Learning (FL) to…

    Submitted 15 April, 2024; originally announced April 2024.

    Journal ref: Journal of Information, Technology and Policy (2023): 1-12

  12. arXiv:2404.06690  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou…

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  13. arXiv:2404.00656  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    WavLLM: Towards Robust and Adaptive Speech Large Language Model

    Authors: Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

    Abstract: The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In th…

    Submitted 31 March, 2024; originally announced April 2024.

  14. arXiv:2403.00453  [pdf, ps, other]

    eess.SP

    Exploring Fairness for FAS-assisted Communication Systems: from NOMA to OMA

    Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

    Abstract: This paper addresses the fairness issue within fluid antenna system (FAS)-assisted non-orthogonal multiple access (NOMA) and orthogonal multiple access (OMA) systems, where a single fixed-antenna base station (BS) transmits superposition-coded signals to two users, each with a single fluid antenna. We define fairness through the minimization of the maximum outage probability for the two users, und…

    Submitted 1 March, 2024; originally announced March 2024.

  15. arXiv:2401.15803  [pdf, other]

    cs.RO cs.AI cs.CV eess.SY

    GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow

    Authors: Liguo Zhou, Yinglei Song, Yichao Gao, Zhou Yu, Michael Sodamin, Hongshen Liu, Liang Ma, Lian Liu, Hao Liu, Yang Liu, Haichuan Li, Guang Chen, Alois Knoll

    Abstract: Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and inte…

    Submitted 30 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  16. arXiv:2401.00246  [pdf, other]

    cs.CL cs.SD eess.AS

    Boosting Large Language Model for Speech Synthesis: An Empirical Study

    Authors: Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei

    Abstract: Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prompting LLMs with perception abilities like auditory comprehension, and the effective approach for augmenting LLMs with speech synthesis capabilities re…

    Submitted 30 December, 2023; originally announced January 2024.

  17. arXiv:2311.15215  [pdf, other]

    eess.SP eess.SY

    From OTFS to DD-ISAC: Integrating Sensing and Communications in the Delay Doppler Domain

    Authors: Weijie Yuan, Lin Zhou, Saeid K. Dehkordi, Shuangyang Li, Pingzhi Fan, Giuseppe Caire, H. Vincent Poor

    Abstract: Next-generation vehicular networks are expected to provide the capability of robust environmental sensing in addition to reliable communications to meet intelligence requirements. A promising solution is the integrated sensing and communication (ISAC) technology, which performs both functionalities using the same spectrum and hardware resources. Most existing works on ISAC consider the Orthogonal…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: Magazine paper submitted to IEEE

  18. arXiv:2311.10416  [pdf, other]

    eess.SP

    Meta-DSP: A Meta-Learning Approach for Data-Driven Nonlinear Compensation in High-Speed Optical Fiber Systems

    Authors: Xinyu Xiao, Zhennan Zhou, Bin Dong, Dingjiong Ma, Li Zhou, Jie Sun

    Abstract: Non-linear effects in long-haul, high-speed optical fiber systems significantly hinder channel capacity. While the Digital Backward Propagation algorithm (DBP) with adaptive filter (ADF) can mitigate these effects, it suffers from an overwhelming computational complexity. Recent solutions have incorporated deep neural networks in a data-driven strategy to alleviate this complexity in the DBP model…

    Submitted 17 November, 2023; originally announced November 2023.

  19. arXiv:2311.06491  [pdf, other]

    eess.SY

    Nonsmooth-Optimization-Based Bandwidth Optimal Control for Precision Motion Systems

    Authors: Jingjie Wu, Lei Zhou

    Abstract: Precision motion systems are at the core of various manufacturing equipment. The rapidly increasing demand for higher productivity necessitates higher control bandwidth in the motion systems to effectively reject disturbances while maintaining excellent positioning accuracy. However, most existing optimal control methods do not explicitly optimize for control bandwidth, and the classic loop-shapin…

    Submitted 11 November, 2023; originally announced November 2023.

  20. arXiv:2310.17116  [pdf, other]

    eess.AS cs.SD

    Real-time Neonatal Chest Sound Separation using Deep Learning

    Authors: Yang Yi Poh, Ethan Grooby, Kenneth Tan, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Mehrtash Harandi, Faezeh Marzbanrad

    Abstract: Auscultation for neonates is a simple and non-invasive method of providing diagnosis for cardiovascular and respiratory disease. Such diagnosis often requires high-quality heart and lung sounds to be captured during auscultation. However, in most cases, obtaining such high-quality sounds is non-trivial due to the chest sounds containing a mixture of heart, lung, and noise sounds. As such, addition…

    Submitted 25 October, 2023; originally announced October 2023.

  21. arXiv:2310.12405  [pdf, other]

    eess.IV cs.CV

    LoMAE: Low-level Vision Masked Autoencoders for Low-dose CT Denoising

    Authors: Dayang Wang, Yongshun Xu, Shuo Han, Zhan Wu, Li Zhou, Bahareh Morovati, Hengyong Yu

    Abstract: Low-dose computed tomography (LDCT) offers reduced X-ray radiation exposure but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models emerged as a promising avenue to enhance LDCT image quality. However, the success of such models relies on a large amount of paired noisy and clean images, which are often scarce in clinical settings.…

    Submitted 18 October, 2023; originally announced October 2023.

  22. arXiv:2310.07729  [pdf, other]

    cs.RO eess.SY

    Energy-Aware Routing Algorithm for Mobile Ground-to-Air Charging

    Authors: Bill Cai, Fei Lu, Lifeng Zhou

    Abstract: We investigate the problem of energy-constrained planning for a cooperative system of an Unmanned Ground Vehicles (UGV) and an Unmanned Aerial Vehicle (UAV). In scenarios where the UGV serves as a mobile base to ferry the UAV and as a charging station to recharge the UAV, we formulate a novel energy-constrained routing problem. To tackle this problem, we design an energy-aware routing algorithm, a…

    Submitted 29 September, 2023; originally announced October 2023.

  23. arXiv:2310.00141  [pdf, other]

    cs.CL eess.AS

    The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning

    Authors: Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews

    Abstract: Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continu…

    Submitted 30 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: Accepted to IEEE ASRU 2023

  24. arXiv:2309.14248  [pdf, other]

    eess.SY

    Transcending the Acceleration-Bandwidth Trade-off: Lightweight Precision Stages with Active Control of Flexible Dynamics

    Authors: Jingjie Wu, Lei Zhou

    Abstract: Micro/Nano-positioning stages are of great importance in a wide range of manufacturing machines and instruments. In recent years, the drastically growing demand for higher throughput and reduced power consumption in various IC manufacturing equipment calls for the development of next-generation precision positioning systems with unprecedented acceleration capability while maintaining exceptional p…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2301.04208; text overlap with arXiv:2309.11735

  25. arXiv:2309.13874  [pdf, other]

    eess.AS cs.LG cs.SD

    Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

    Authors: Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

    Abstract: Target Speech Extraction (TSE) is a crucial task in speech processing that focuses on isolating the clean speech of a specific speaker from complex mixtures. While discriminative methods are commonly used for TSE, they can introduce distortion in terms of speech perception quality. On the other hand, generative approaches, particularly diffusion-based methods, can enhance speech quality perceptual…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  26. arXiv:2309.11735  [pdf, other]

    eess.SY

    FleXstage: Lightweight Magnetically Levitated Precision Stage with Over-Actuation towards High-Throughput IC Manufacturing

    Authors: Jingjie Wu, Lei Zhou

    Abstract: Precision motion stages play a critical role in various manufacturing and inspection equipment, for example, the wafer/reticle scanning in photolithography scanners and positioning stages in wafer inspection systems. To meet the growing demand for higher throughput in chip manufacturing and inspection, it is critical to create new precision motion stages with higher acceleration capability with hi…

    Submitted 20 September, 2023; originally announced September 2023.

  27. arXiv:2309.09953  [pdf, other]

    eess.SY math.AP

    PINN-based viscosity solution of HJB equation

    Authors: Tianyu Liu, Steven Ding, Jiarui Zhang, Liutao Zhou

    Abstract: This paper proposed a novel PINN-based viscosity solution for HJB equations. Although there exists work using PINN to solve HJB, but none of them gives the solution in viscosity sense. This paper reveals the fact that using the convex neural network, one can guarantee the viscosity solution and thus the neural network can easily converge to the true solution of HJB despite of the starting point.

    Submitted 18 September, 2023; originally announced September 2023.

  28. arXiv:2308.10157  [pdf, ps, other]

    eess.IV cs.CV

    Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction

    Authors: Zeyu Han, Yuhan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen

    Abstract: To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images. One widely adopted technique is the generative adversarial networks (GANs), yet recently, diffusion probabilistic models (DPMs) have emerged as a compelling alternat…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted and presented in MICCAI 2023. To be published in Proceedings

  29. arXiv:2308.04805  [pdf, other]

    cs.IR cs.SD eess.AS

    DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music

    Authors: Hongru Liang, Jingyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei

    Abstract: Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve it as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be presented in user comments, we propose to study the automated music labeling in an…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, published to ACM MM 2023

  30. arXiv:2307.04015  [pdf, other]

    cs.SD cs.MM eess.AS

    Emotion-Guided Music Accompaniment Generation Based on Variational Autoencoder

    Authors: Qi Wang, Shubing Zhang, Li Zhou

    Abstract: Music accompaniment generation is a crucial aspect in the composition process. Deep neural networks have made significant strides in this field, but it remains a challenge for AI to effectively incorporate human emotions to create beautiful accompaniments. Existing models struggle to effectively characterize human emotions within neural network models while composing music. To address this issue,…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted By International Joint Conference on Neural Networks 2023(IJCNN2023)

  31. arXiv:2307.03917  [pdf, other]

    eess.AS cs.CL cs.SD

    On decoder-only architecture for speech-to-text and large language model integration

    Authors: Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

    Abstract: Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA,…

    Submitted 2 October, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

  32. arXiv:2305.17778  [pdf]

    physics.med-ph eess.IV

    PND-Net: Physics based Non-local Dual-domain Network for Metal Artifact Reduction

    Authors: Jinqiu Xia, Yiwen Zhou, Hailong Wang, Wenxin Deng, Jing Kang, Wangjiang Wu, Mengke Qi, Linghong Zhou, Jianhui Ma, Yuan Xu

    Abstract: Metal artifacts caused by the presence of metallic implants tremendously degrade the reconstructed computed tomography (CT) image quality, affecting clinical diagnosis or reducing the accuracy of organ delineation and dose calculation in radiotherapy. Recently, deep learning methods in sinogram and image domains have been rapidly applied on metal artifact reduction (MAR) task. The supervised dual-…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: 19 pages, 8 figures

  33. arXiv:2305.16107  [pdf, other]

    cs.CL cs.SD eess.AS

    VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

    Authors: Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei

    Abstract: Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities. In this paper, we propose VioLA, a single auto-regressive Transformer decoder-only network that unifies various cross-modal tasks involving speech and text, such as speech-to-text, text-to-text, text-to-speech, and speech-to-speech tasks, as a con…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Working in progress

  34. arXiv:2305.14838  [pdf, other]

    cs.CL cs.SD eess.AS

    ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

    Authors: Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng, Xuedong Huang

    Abstract: Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized data-efficiently for spoken language tasks. Particularly, we propose to incorporate…

    Submitted 14 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023, Poster

  35. arXiv:2304.10691  [pdf, other]

    eess.IV cs.CV cs.LG

    SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model

    Authors: Juexiao Zhou, Xiaonan He, Liyuan Sun, Jiannan Xu, Xiuying Chen, Yuetan Chu, Longxi Zhou, Xingyu Liao, Bin Zhang, Xin Gao

    Abstract: Skin and subcutaneous diseases rank high among the leading contributors to the global burden of nonfatal diseases, impacting a considerable portion of the population. Nonetheless, the field of dermatology diagnosis faces three significant hurdles. Firstly, there is a shortage of dermatologists accessible to diagnose patients, particularly in rural regions. Secondly, accurately interpreting skin di…

    Submitted 8 June, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  36. arXiv:2303.12441  [pdf, ps, other]

    cs.NI eess.SP

    AMPLE: An Adaptive Multiple Path Loss Exponent Radio Propagation Model Considering Environmental Factors

    Authors: Lingyou Zhou, Jie Zhang, Jiliang Zhang, Oktay Cetinkaya, Steve Jubb

    Abstract: We present AMPLE -- a novel multiple path loss exponent (PLE) radio propagation model that can adapt to different environmental factors. The proposed model aims at accurately predicting path loss with low computational complexity considering environmental factors. In the proposed model, the scenario under consideration is classified into regions from a raster map, and each type of region is assign…

    Submitted 20 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: This paper has been submitted to IEEE Transactions for possible publications

  37. arXiv:2303.03926  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

    Authors: Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

    Abstract: We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec language model to predict the acoustic token sequences of the target language speech by using both the source language speech and the target language text as prompts. VALL-E X inherits strong in-context learning capabilitie…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: We encourage readers to listen to the audio samples on our demo page: https://aka.ms/vallex

  38. arXiv:2303.00786  [pdf]

    cs.CL eess.AS

    Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

    Authors: Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

    Abstract: We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss, enabling transformer experts to learn language-specific information. By combining gated transformer experts with shared transformer layers, we const…

    Submitted 7 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  39. arXiv:2302.12052  [pdf, other]

    cs.CV cs.LG eess.IV

    Attention Mechanism for Contrastive Learning in GAN-based Image-to-Image Translation

    Authors: Hanzhen Zhang, Liguo Zhou, Ruining Wang, Alois Knoll

    Abstract: Using real road testing to optimize autonomous driving algorithms is time-consuming and capital-intensive. To solve this problem, we propose a GAN-based model that is capable of generating high-quality images across different domains. We further leverage Contrastive Learning to train the model in a self-supervised way using image data acquired in the real world using real sensors and simulated ima…

    Submitted 23 February, 2023; originally announced February 2023.

  40. arXiv:2302.11795  [pdf, other]

    eess.IV cs.CV cs.LG

    Bridging Synthetic and Real Images: a Transferable and Multiple Consistency aided Fundus Image Enhancement Framework

    Authors: Erjian Guo, Huazhu Fu, Luping Zhou, Dong Xu

    Abstract: Deep learning based image enhancement models have largely improved the readability of fundus images in order to decrease the uncertainty of clinical observations and the risk of misdiagnosis. However, due to the difficulty of acquiring paired real fundus images at different qualities, most existing methods have to adopt synthetic image pairs as training data. The domain shift between the synthetic…

    Submitted 23 February, 2023; originally announced February 2023.

  41. Personalized and privacy-preserving federated heterogeneous medical image analysis with PPPML-HMI

    Authors: Juexiao Zhou, Longxi Zhou, Di Wang, Xiaopeng Xu, Haoyang Li, Yuetan Chu, Wenkai Han, Xin Gao

    Abstract: Heterogeneous data is endemic due to the use of diverse models and settings of devices by hospitals in the field of medical imaging. However, there are few open-source frameworks for federated heterogeneous medical image analysis with personalization and privacy protection simultaneously without the demand to modify the existing model structures or to share any private data. In this paper, we prop…

    Submitted 20 February, 2023; originally announced February 2023.

  42. arXiv:2301.04208  [pdf, other]

    eess.SY

    Sequential Structure and Control Co-design of Lightweight Precision Stages with Active control of flexible modes

    Authors: Jingjie Wu, Lei Zhou

    Abstract: Precision motion stages are playing a prominent role in various manufacturing equipment. The drastically increasing demand for higher throughput in integrated circuit (IC) manufacturing and inspection calls for the next-generation precision stages that have light weight and high control bandwidth simultaneously. In today's design techniques, the stage's first flexible mode is limiting its achievab…

    Submitted 10 January, 2023; originally announced January 2023.

  43. arXiv:2301.02111  [pdf, other]

    cs.CL cs.SD eess.AS

    Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

    Authors: Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

    Abstract: We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called Vall-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: Working in progress

  44. arXiv:2212.05557   

    eess.SY

    Provably High-Quality Solutions for the Liquid Medical Oxygen Allocation Problem

    Authors: Lejun Zhou, Lavanya Marla, Varun Gupta, Ankur Mani

    Abstract: Oxygen is an essential life-saving medicine used in several indications at all levels of healthcare. During the COVID-19 pandemic, the demand for liquid medical oxygen (LMO) has increased significantly due to the occurrence of lung infections in many patients. However, many countries and regions are not prepared for the emergence of this phenomenon, and the limited supply of LMO has resulted in un…

    Submitted 9 May, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: Have some mistakes

  45. arXiv:2212.05240  [pdf, other]

    eess.SY

    Protocol selection for second-order consensus against disturbance

    Authors: Jiamin Wang, Liqi Zhou, Dong Zhang, Jian Liu, Yuanshi Zheng

    Abstract: Noticing that both the absolute and relative velocity protocols can solve the second-order consensus of multi-agent systems, this paper aims to investigate which of the above two protocols has better anti-disturbance capability, in which the anti-disturbance capability is measured by the L2 gain from the disturbance to the consensus error. More specifically, by the orthogonal transformation techni…

    Submitted 10 December, 2022; originally announced December 2022.

  46. arXiv:2212.03505   

    eess.SY

    A Four-stage Heuristic Algorithm for Solving On-demand Meal Delivery Routing Problem

    Authors: Lejun Zhou, Anke Ye, Simon Hu

    Abstract: Meal delivery services provided by platforms with integrated delivery systems are becoming increasingly popular. This paper adopts a rolling horizon approach to solve the meal delivery routing problem (MDRP). To improve delivery efficiency in scenarios with high delivery demand, multiple orders are allowed to be combined into one bundle with orders from different restaurants. Following this strate…

    Submitted 9 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: Need to be updated

  47. arXiv:2211.11275  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.SD

    VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

    Authors: Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei

    Abstract: Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text. How to design a unified framework to integrate different modal information and leverage different resources (e.g., visual-audio pairs, audio-text pairs, unlabeled speech, and unlabeled text) to facilitate speech rep…

    Submitted 19 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 11 pages, Accepted by IEEE Transactions on Multimedia

  48. arXiv:2211.08284  [pdf, other]

    eess.IV

    Spatially Exclusive Pasting: A General Data Augmentation for the Polyp Segmentation

    Authors: Lei Zhou

    Abstract: Automated polyp segmentation technology plays an important role in diagnosing intestinal diseases, such as tumors and precancerous lesions. Previous works have typically trained convolution-based U-Net or Transformer-based neural network architectures with labeled data. However, the available public polyp segmentation datasets are too small to train the network sufficiently, suppressing each netwo…

    Submitted 17 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 pages

  49. arXiv:2211.05983  [pdf, other]

    cs.SD eess.AS

    Acoustic Pornography Recognition Using Convolutional Neural Networks and Bag of Refinements

    Authors: Lifeng Zhou, Kaifeng Wei, Yuke Li, Yiya Hao, Weiqiang Yang, Haoqi Zhu

    Abstract: A large number of pornographic audios publicly available on the Internet seriously threaten the mental and physical health of children, but these audios are rarely detected and filtered. In this paper, we firstly propose a convolutional neural networks (CNN) based model for acoustic pornography recognition. Then, we research a collection of refinements and verify their effectiveness through ablati…

    Submitted 10 November, 2022; originally announced November 2022.

  50. arXiv:2211.02915  [pdf]

    eess.IV cs.CV

    ESKNet-An enhanced adaptive selection kernel convolution for breast tumors segmentation

    Authors: Gongping Chen, Lu Zhou, Jianxun Zhang, Xiaotao Yin, Liang Cui, Yu Dai

    Abstract: Breast cancer is one of the common cancers that endanger the health of women globally. Accurate target lesion segmentation is essential for early clinical intervention and postoperative follow-up. Recently, many convolutional neural networks (CNNs) have been proposed to segment breast tumors from ultrasound images. However, the complex ultrasound pattern and the variable tumor shape and size bring…

    Submitted 20 January, 2024; v1 submitted 5 November, 2022; originally announced November 2022.

    Comments: 12 pages, 8 figures