Showing 1–50 of 65 results for author: Lei, Y

  1. arXiv:2404.03210  [pdf, other]

    cs.CV eess.IV

    HDR Imaging for Dynamic Scenes with Events

    Authors: Li Xiaopeng, Zeng Zhaoyuan, Fan Cien, Zhao Chen, Deng Lei, Yu Lei

    Abstract: High dynamic range imaging (HDRI) for real-world dynamic scenes is challenging because moving objects may lead to hybrid degradation of low dynamic range and motion blur. Existing event-based approaches only focus on a separate task, while cascading HDRI and motion deblurring would lead to sub-optimal solutions, and unavailable ground-truth sharp HDR images aggravate the predicament. To address th…

    Submitted 4 April, 2024; originally announced April 2024.

  2. arXiv:2404.02663  [pdf]

    eess.SP cs.IT

    Ground-to-UAV sub-Terahertz channel measurement and modeling

    Authors: Da Li, Peian Li, Jiabiao Zhao, Jianjian Liang, Jiacheng Liu, Guohao Liu, Yuanshuai Lei, Wenbo Liu, Jianqin Deng, Fuyong Liu, Jianjun Ma

    Abstract: Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications have been expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless…

    Submitted 28 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Submitted to Optics Express

  3. arXiv:2402.06073  [pdf]

    cs.CL cs.SD eess.AS

    LightCAM: A Fast and Light Implementation of Context-Aware Masking based D-TDNN for Speaker Verification

    Authors: Di Cao, Xianchen Wang, Junfeng Zhou, Jiakai Zhang, Yanjing Lei, Wenpeng Chen

    Abstract: Traditional Time Delay Neural Networks (TDNN) have achieved state-of-the-art performance at the cost of high computational complexity and slower inference speed, making them difficult to implement in an industrial environment. The Densely Connected Time Delay Neural Network (D-TDNN) with Context Aware Masking (CAM) module has proven to be an efficient structure to reduce complexity while maintaini…

    Submitted 12 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  4. HOPE: Hybrid-granularity Ordinal Prototype Learning for Progression Prediction of Mild Cognitive Impairment

    Authors: Chenhui Wang, Yiming Lei, Tao Chen, Junping Zhang, Yuxin Li, Hongming Shan

    Abstract: Mild cognitive impairment (MCI) is often at high risk of progression to Alzheimer's disease (AD). Existing works to identify the progressive MCI (pMCI) typically require MCI subtype labels, pMCI vs. stable MCI (sMCI), determined by whether or not an MCI patient will progress to AD after a long follow-up. However, prospectively acquiring MCI subtype data is time-consuming and resource-intensive; th…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: IEEE Journal of Biomedical and Health Informatics, 2024

    Journal ref: IEEE Journal of Biomedical and Health Informatics, 2024

  5. arXiv:2312.16850  [pdf, other]

    cs.SD eess.AS

    Accent-VITS:accent transfer for end-to-end TTS

    Authors: Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie

    Abstract: Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent which are entangled in speech. This paper presents a VITS-based end-to-end accent transfer model named Accent-VITS. Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by NCMMSC2023

  6. arXiv:2312.15380  [pdf, other]

    cs.NI eess.SP

    Battery-Care Resource Allocation and Task Offloading in Multi-Agent Post-Disaster MEC Environment

    Authors: Yiwei Tang, Hualong Huang, Wenhan Zhan, Geyong Min, Zhekai Duan, Yuchuan Lei

    Abstract: Being an up-and-coming application scenario of mobile edge computing (MEC), the post-disaster rescue suffers multitudinous computing-intensive tasks but unstably guaranteed network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of significant meaning. This paper studies a multi-user post-disaste…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted by WCNC 2024

  7. arXiv:2311.16531  [pdf]

    physics.app-ph eess.SY physics.ao-ph

    Channel Modeling for Terahertz Communications in Rain

    Authors: Peian Li, Wenbo Liu, Jiacheng Liu, Da Li, Guohao Liu, Yuanshuai Lei, Jiabiao Zhao, Xiaopeng Wang, Houjun Sun, Jianjun Ma, John F. Federici

    Abstract: Terahertz (THz) communication channels, integral to outdoor applications, are critically influenced by natural factors like rainfall. Our research focused on the nuanced effects of rain on these channels, employing an advanced rainfall emulation system. By analyzing key parameters such as rain rate, altitude based variations in rainfall, and diverse raindrop sizes, we identified the paramount sign…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: submitted to IEEE Transactions on Antennas and Propagation

  8. arXiv:2311.06712  [pdf, other]

    eess.IV

    PuzzleTuning: Explicitly Bridge Pathological and Natural Image with Puzzles

    Authors: Tianyi Zhang, Shangqing Lyu, Yanli Lei, Sicheng Chen, Nan Ying, Yufang He, Yu Zhao, Yunlu Feng, Hwee Kuan Lee, Guanglei Zhang

    Abstract: Pathological image analysis is a crucial field in computer vision. Due to the annotation scarcity in the pathological field, pre-training with self-supervised learning (SSL) is widely applied to learn on unlabeled images. However, the current SSL-based pathological pre-training: (1) does not explicitly explore the essential focuses of the pathological field, and (2) does not effectively bridge wit…

    Submitted 22 April, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: 13 pages, 9 figures, 8 tables

  9. arXiv:2310.17902  [pdf]

    eess.IV

    CPIA Dataset: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training

    Authors: Nan Ying, Yanli Lei, Tianyi Zhang, Shangqing Lyu, Chunhui Li, Sicheng Chen, Zeyu Liu, Yu Zhao, Guanglei Zhang

    Abstract: Pathological image analysis is a crucial field in computer-aided diagnosis, where deep learning is widely applied. Transfer learning using pre-trained models initialized on natural images has effectively improved the downstream pathological performance. However, the lack of sophisticated domain-specific pathological initialization hinders their potential. Self-supervised learning (SSL) enables pre…

    Submitted 27 October, 2023; originally announced October 2023.

  10. arXiv:2310.17101  [pdf, other]

    eess.AS cs.SD

    Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning

    Authors: Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie

    Abstract: This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions. To this end, we propose a novel contrastive learning-based TTS approach to transfer style and emotion across speakers. Specifically, contrastive learning from different levels, i.e. utterance and category level, is leveraged to extract the disentangled style, em…

    Submitted 25 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures; Accepted by ICME 2024

  11. arXiv:2310.15011  [pdf, ps, other]

    cs.IT eess.SP

    Interference Management by Harnessing Multi-Domain Resources in Spectrum-Sharing Aided Satellite-Ground Integrated Networks

    Authors: Xiaojin Ding, Yue Lei, Yulong Zou, Gengxin Zhang, Lajos Hanzo

    Abstract: A spectrum-sharing satellite-ground integrated network is conceived, consisting of a pair of non-geostationary orbit (NGSO) constellations and multiple terrestrial base stations, which impose the co-frequency interference (CFI) on each other. The CFI may increase upon increasing the number of satellites. To manage the potentially severe interference, we propose to rely on joint multi-domain resour…

    Submitted 29 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Submitted to IEEE Transactions on Vehicular Technology

  12. arXiv:2310.07246  [pdf, other]

    cs.SD eess.AS

    Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

    Authors: Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding speech quality and task generalization. This paper presents Vec-Tok Speech, an extensible framework that resembles multiple speech generation tasks, generating e…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 15 pages, 2 figures

  13. arXiv:2310.05118  [pdf, other]

    cs.SD eess.AS

    VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023

    Authors: Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao

    Abstract: This paper presents the T02 team's system for the Singing Voice Conversion Challenge 2023 (SVCC2023). Our system entails a VITS-based SVC model, incorporating three modules: a feature extractor, a voice converter, and a post-processor. Specifically, the feature extractor provides F0 contours and extracts speaker-independent linguistic content from the input singing voice by leveraging a HuBERT mod…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  14. arXiv:2310.05001  [pdf, other]

    cs.SD eess.AS

    PromptSpeaker: Speaker Generation Based on Text Descriptions

    Authors: Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li

    Abstract: Recently, text-guided content generation has received extensive attention. In this work, we explore the possibility of text description-based speaker generation, i.e., using text prompts to control the speaker generation process. Specifically, we propose PromptSpeaker, a text-guided speaker generation system. PromptSpeaker consists of a prompt encoder, a zero-shot VITS, and a Glow model, where the…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to ASRU 2023

  15. arXiv:2310.03963  [pdf, other]

    cs.SD eess.AS

    Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis

    Authors: Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Zero-shot emotion transfer in cross-lingual speech synthesis aims to transfer emotion from an arbitrary speech reference in the source language to the synthetic speech in the target language. Building such a system faces challenges of unnatural foreign accents and difficulty in modeling the shared emotional expressions of different languages. Building on the DelightfulTTS neural architecture, this…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  16. arXiv:2310.00593  [pdf, other]

    eess.SP

    Nonlinear Multi-Carrier System with Signal Clipping: Measurement, Analysis, and Optimization

    Authors: Yuyang Du, Liang Hao, Yiming Lei, Qun Yang, Shiqi Xu

    Abstract: Signal clipping is a classic technique for reducing peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It has been widely applied in consumer electronic devices owing to its low complexity and high efficiency. Although clipping reduces the nonlinear distortion caused by power amplifiers (PAs), it induces additional clipping distortion. Optimizing the j…

    Submitted 16 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  17. arXiv:2309.09262  [pdf, other]

    eess.AS cs.SD

    PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

    Authors: Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to control the conversion process, which leads to limitations in style diversity or falls short in terms of the intuitive and interpretability of style representation…

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  18. arXiv:2308.08968  [pdf, other]

    eess.SP cs.IT

    On the Performance of Multidimensional Constellation Shaping for Linear and Nonlinear Optical Fiber Channel

    Authors: Bin Chen, Zhiwei Liang, Shen Li, Yi Lei, Gabriele Liga, Alex Alvarado

    Abstract: Multidimensional constellation shaping of up to 32 dimensions with different spectral efficiencies are compared through AWGN and fiber-optic simulations. The results show that no constellation is universal and the balance of required and effective SNRs should be jointly considered for the specific optical transmission scenario.

    Submitted 18 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: The paper has been accepted by the ECOC 2023

  19. arXiv:2308.06776  [pdf, other]

    eess.IV cs.CV

    Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches

    Authors: Xin Lin, Chao Ren, Xiao Liu, Jie Huang, Yinjie Lei

    Abstract: Deep learning methods have shown remarkable performance in image denoising, particularly when trained on large-scale paired datasets. However, acquiring such paired datasets for real-world scenarios poses a significant challenge. Although unsupervised approaches based on generative adversarial networks offer a promising solution for denoising without paired datasets, they are difficult in surpassi…

    Submitted 13 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  20. arXiv:2307.15951  [pdf, other]

    eess.AS

    METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

    Authors: Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Previous multilingual text-to-speech (TTS) approaches have considered leveraging monolingual speaker data to enable cross-lingual speech synthesis. However, such data-efficient approaches have ignored synthesizing emotional aspects of speech due to the challenges of cross-speaker cross-lingual emotion transfer - the heavy entanglement of speaker timbre, emotion, and language factors in the speech…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: 10 pages, 3 figures

  21. arXiv:2307.04630  [pdf, other]

    cs.SD eess.AS

    The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

    Authors: Kun Song, Yi Lei, Peikun Chen, Yiqing Cao, Kun Wei, Yongmao Zhang, Lei Xie, Ning Jiang, Guoqing Zhao

    Abstract: This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. The system is built in a cascaded manner consisting of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). We make tremendous efforts to handle the challenging multi-source input. Spec…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: IWSLT@ACL 2023 system paper. Our submitted system ranks 1st in the S2ST task of the IWSLT 2023 evaluation campaign

  22. arXiv:2305.19522  [pdf, other]

    cs.SD eess.AS

    PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions

    Authors: Guanghou Liu, Yongmao Zhang, Yi Lei, Yunlin Chen, Rui Wang, Zhifei Li, Lei Xie

    Abstract: Style transfer TTS has shown impressive performance in recent years. However, style control is often restricted to systems built on expressive speech recordings with discrete style categories. In practical situations, users may be interested in transferring style by typing text descriptions of desired styles, without the reference speech in the target style. The text-guided content generation tech…

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

  23. arXiv:2305.17732  [pdf, other]

    cs.SD eess.AS

    StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

    Authors: Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma

    Abstract: Direct speech-to-speech translation (S2ST) has gradually become popular as it has many advantages compared with cascade S2ST. However, current research mainly focuses on the accuracy of semantic translation and ignores the speech style transfer from a source language to a target language. The lack of high-fidelity expressive parallel data makes such style transfer challenging, especially in more p…

    Submitted 25 July, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  24. FAN-Net: Fourier-Based Adaptive Normalization For Cross-Domain Stroke Lesion Segmentation

    Authors: Weiyi Yu, Yiming Lei, Hongming Shan

    Abstract: Since stroke is the main cause of various cerebrovascular diseases, deep learning-based stroke lesion segmentation on magnetic resonance (MR) images has attracted considerable attention. However, the existing methods often neglect the domain shift among MR images collected from different sites, which has limited performance improvement. To address this problem, we intend to change style informatio…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE ICASSP 2023

    Journal ref: IEEE ICASSP 2023

  25. arXiv:2302.06831  [pdf, other]

    eess.SP cs.IT

    Analytical Model of Nonlinear Fiber Propagation for General Dual-Polarization Four-Dimensional Modulation Format

    Authors: Zhiwei Liang, Bin Chen, Yi Lei, Gabriele Liga, Alex Alvarado

    Abstract: Coherent dual-polarization (DP) optical transmission systems encode information on the four available degrees of freedom of an optical field: the two polarization states, each with two quadrature components. Such systems naturally operate based on a four-dimensional (4D) signal space. Having a general analytical model to accurately estimate nonlinear interference (NLI) is key to analyze such trans…

    Submitted 9 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: 12 pages, 8 figures

  26. arXiv:2212.01546  [pdf, other]

    cs.SD eess.AS

    UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

    Authors: Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie, Dan Su

    Abstract: Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating high-quality speaking and singing voice according to textual input and music scores, respectively. Unifying TTS and SVS into a single system is crucial to the applications requiring both of them. Existing methods usually suffer from some limitations, which rely on either both singing and speaking data from the same person or…

    Submitted 6 December, 2022; v1 submitted 3 December, 2022; originally announced December 2022.

  27. arXiv:2211.10568  [pdf, other]

    eess.AS cs.SD

    Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling

    Authors: Xinfa Zhu, Yi Lei, Kun Song, Yongmao Zhang, Tao Li, Lei Xie

    Abstract: This paper aims to synthesize the target speaker's speech with desired speaking style and emotion by transferring the style and emotion from reference speech recorded by other speakers. We address this challenging problem with a two-stage framework composed of a text-to-style-and-emotion (Text2SE) module and a style-and-emotion-to-wave (SE2Wave) module, bridging by neural bottleneck (BN) features.…

    Submitted 14 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP2023

  28. arXiv:2211.03038  [pdf, other]

    eess.AS cs.CR cs.SD

    Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling

    Authors: Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu

    Abstract: Speech data on the Internet are proliferating exponentially because of the emergence of social media, and the sharing of such personal data raises obvious security and privacy concerns. One solution to mitigate these concerns involves concealing speaker identities before sharing speech data, also referred to as speaker anonymization. In our previous work, we have developed an automatic speaker ver…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  29. arXiv:2211.03036  [pdf, other]

    eess.AS cs.SD

    Preserving background sound in noise-robust voice conversion via multi-task learning

    Authors: Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie

    Abstract: Background sound is an informative form of art that is helpful in providing a more immersive experience in real-application voice conversion (VC) scenarios. However, prior research about VC, mainly focusing on clean voices, pays little attention to VC with background sound. The critical problem for preserving background sound in VC is inevitable speech distortion by the neural separation model and th…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  30. arXiv:2211.01087  [pdf, other]

    cs.SD eess.AS

    DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

    Authors: Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, Jinfeng Bai

    Abstract: Recent development of neural vocoders based on the generative adversarial neural network (GAN) has shown obvious advantages of generating raw waveform conditioned on mel-spectrogram with fast inference speed and lightweight networks. Whereas, it is still challenging to train a universal neural vocoder that can synthesize high-fidelity speech from various scenarios with unseen speakers, languages,…

    Submitted 28 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  31. arXiv:2209.04854  [pdf, other]

    eess.SY cs.LG

    Performance-Driven Controller Tuning via Derivative-Free Reinforcement Learning

    Authors: Yuheng Lei, Jianyu Chen, Shengbo Eben Li, Sifa Zheng

    Abstract: Choosing an appropriate parameter set for the designed controller is critical for the final performance but usually requires a tedious and careful tuning process, which implies a strong need for automatic tuning methods. However, among existing methods, derivative-free ones suffer from poor scalability or low efficiency, while gradient-based ones are often unavailable due to possibly non-different…

    Submitted 11 September, 2022; originally announced September 2022.

    Comments: Accepted by the 61st IEEE Conference on Decision and Control (CDC), 2022. Copyright @IEEE

  32. arXiv:2208.13686  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Deformable Image Registration using Unsupervised Deep Learning for CBCT-guided Abdominal Radiotherapy

    Authors: Huiqiao Xie, Yang Lei, Yabo Fu, Tonghe Wang, Justin Roper, Jeffrey D. Bradley, Pretesh Patel, Tian Liu, Xiaofeng Yang

    Abstract: CBCTs in image-guided radiotherapy provide crucial anatomy information for patient setup and plan evaluation. Longitudinal CBCT image registration could quantify the inter-fractional anatomic changes. The purpose of this study is to propose an unsupervised deep learning based CBCT-CBCT deformable image registration. The proposed deformable registration workflow consists of training and inference s…

    Submitted 29 August, 2022; originally announced August 2022.

  33. arXiv:2208.06833  [pdf]

    eess.IV cs.CV

    Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification

    Authors: Tianyi Zhang, Youdan Feng, Yunlu Feng, Yu Zhao, Yanli Lei, Nan Ying, Zhiling Yan, Yufang He, Guanglei Zhang

    Abstract: The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnosis of pancreatic cancer by immediately analyzing the fast-stained cytopathological images. Computer-aided diagnosis (CAD) can potentially address the shortage of pathologists in ROSE. However, the cancerous patterns vary significantly between different samples, making the CAD task extremely challenging. Besides…

    Submitted 14 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.03080

  34. arXiv:2207.10910  [pdf, ps, other]

    cs.IT eess.SP

    Delay-Doppler Reversal for OTFS System in Doubly-selective Fading Channels

    Authors: Xiangxiang Li, Haiyan Wang, Yao Ge, Xiaohong Shen, Yuanyuan Lei

    Abstract: The recently proposed orthogonal time frequency space (OTFS) modulation shows significant advantages over conventional orthogonal frequency division multiplexing (OFDM) for high mobility wireless communications. However, a challenging problem is the development of efficient receivers for practical OTFS systems with low complexity. In this paper, we propose a novel delay-Doppler reversal (DDR) technolo…

    Submitted 22 July, 2022; originally announced July 2022.

  35. arXiv:2207.01832  [pdf, other]

    cs.SD eess.AS

    Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

    Authors: Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su

    Abstract: The zero-shot scenario for speech generation aims at synthesizing a novel unseen voice with only one utterance of the target speaker. Although the challenges of adapting new voices in zero-shot scenario exist in both stages -- acoustic modeling and vocoder, previous works usually consider the problem from only one stage. In this paper, we extend our previous Glow-WaveGAN to Glow-WaveGAN 2, aiming…

    Submitted 5 July, 2022; originally announced July 2022.

  36. Geometrically-Shaped Multi-Dimensional Modulation Formats in Coherent Optical Transmission Systems

    Authors: Bin Chen, Yi Lei, Gabriele Liga, Zhiwei Liang, Wei Ling, Xuwei Xue, Alex Alvarado

    Abstract: Shaping modulation formats in multi-dimensional (MD) space is an effective approach to harvest spectral efficiency gains in both the additive white Gaussian noise (AWGN) channel and the optical fiber channel. In the first part of this paper, existing MD geometrically-shaped modulations for fiber optical communications are reviewed. It is shown that large gains can be obtained by exploiting correla…

    Submitted 31 August, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

    Comments: 14 pages, 10 figures, accepted by JLT

  37. arXiv:2206.07569  [pdf, other]

    eess.AS cs.SD

    End-to-End Voice Conversion with Information Perturbation

    Authors: Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

    Abstract: The ideal goal of voice conversion is to convert the source speaker's speech to sound naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech. However, current approaches are insufficient to achieve comprehensive source prosody transfer and target speaker timbre preservation in the converted speech, and the quality of the converted speech is…

    Submitted 15 June, 2022; originally announced June 2022.

  38. arXiv:2206.00866  [pdf, other]

    eess.SP cs.IT

    Analytical SNR Prediction in Long-Haul Optical Transmission using General Dual-Polarization 4D Formats

    Authors: Zhiwei Liang, Bin Chen, Yi Lei, Gabriele Liga, Alex Alvarado

    Abstract: Nonlinear interference models for dual-polarization 4D (DP-4D) modulation have only been used so far to predict signal-signal nonlinear interference. We show that including the signal-noise term in the prediction of the effective signal-to-noise ratio in long distance DP-4D transmission improves the accuracy by up to 0.2 dB.

    Submitted 15 July, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: 4 pages

  39. arXiv:2201.12518  [pdf, other]

    cs.LG cs.AI eess.SY

    Zeroth-Order Actor-Critic

    Authors: Yuheng Lei, Jianyu Chen, Shengbo Eben Li, Sifa Zheng

    Abstract: The recent advanced evolution-based zeroth-order optimization methods and the policy gradient-based first-order methods are two promising alternatives to solve reinforcement learning (RL) problems with complementary advantages. The former methods work with arbitrary policies, drive state-dependent and temporally-extended exploration, possess robustness-seeking property, but suffer from high sample…

    Submitted 11 June, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

  40. arXiv:2201.06460  [pdf, other]

    cs.SD eess.AS

    MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis

    Authors: Yi Lei, Shan Yang, Xinsheng Wang, Lei Xie

    Abstract: Expressive synthetic speech is essential for many human-computer interaction and audio broadcast scenarios, and thus synthesizing expressive speech has attracted much attention in recent years. Previous methods performed the expressive speech synthesis either with explicit labels or with a fixed-length style embedding extracted from reference audio, both of which can only learn an average style an…

    Submitted 17 January, 2022; originally announced January 2022.

  41. arXiv:2112.12377  [pdf, other]

    eess.SP

    Shaped Four-Dimensional Modulation Formats for Optical Fiber Communication Systems

    Authors: Bin Chen, Gabriele Liga, Yi Lei, Wei Ling, Zhengyan Huan, Xuwei Xue, Alex Alvarado

    Abstract: We review the design of multidimensional modulations by maximizing generalized mutual information and compare the maximum transmission reach of recently introduced 4D formats. A model-based optimization for nonlinear-tolerant 4D modulations is also discussed.

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: OFC2022 invited paper

  42. Low-Complexity Geometrical Shaping for 4D Modulation Formats via Amplitude Coding

    Authors: Bin Chen, Wei Ling, Yunus Can Gültekin, Yi Lei, Chigo Okonkwo, Alex Alvarado

    Abstract: Signal shaping is vital to approach Shannon's capacity, yet it is challenging to implement at very high speeds. For example, probabilistic shaping often requires arithmetic coding to realize the target distribution. Geometric shaping requires look-up tables to store the constellation points. In this paper, we propose a four-dimensional amplitude coding (4D-AC) geometrical shaper architecture. The…

    Submitted 29 October, 2021; originally announced October 2021.

    Comments: 4 pages, 5 figures, Accepted by IEEE Photonics Technology Letters

  43. arXiv:2105.05419  [pdf, other]

    cs.IT eess.SP

    On Parameter Optimization and Reach Enhancement for the Improved Soft-Aided Staircase Decoder

    Authors: Yi Lei, Bin Chen, Gabriele Liga, Alex Alvarado

    Abstract: The so-called improved soft-aided bit-marking algorithm was recently proposed for staircase codes (SCCs) in the context of fiber optical communications. This algorithm is known as iSABM-SCC. With the help of channel soft information, the iSABM-SCC decoder marks bits via thresholds to deal with both miscorrections and failures of hard-decision (HD) decoding. In this paper, we study iSABM-SCC focusi…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: 5 pages, 5 figures

  44. arXiv:2103.13588  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Artificial Intelligence in Tumor Subregion Analysis Based on Medical Imaging: A Review

    Authors: Mingquan Lin, Jacob Wynne, Yang Lei, Tonghe Wang, Walter J. Curran, Tian Liu, Xiaofeng Yang

    Abstract: Medical imaging is widely used in cancer diagnosis and treatment, and artificial intelligence (AI) has achieved tremendous success in various tasks of medical image analysis. This paper reviews AI-based tumor subregion analysis in medical imaging. We summarize the latest AI-based methods for tumor subregion analysis and their applications. Specifically, we categorize the AI-based methods by traini…

    Submitted 24 March, 2021; originally announced March 2021.

  45. A Soft-Aided Staircase Decoder Using Three-Level Channel Reliabilities

    Authors: Yi Lei, Bin Chen, Gabriele Liga, Alexios Balatsoukas-Stimming, Kaixuan Sun, Alex Alvarado

    Abstract: The soft-aided bit-marking (SABM) algorithm is based on the idea of marking bits as highly reliable bits (HRBs), highly unreliable bits (HUBs), and uncertain bits to improve the performance of hard-decision (HD) decoders. The HRBs and HUBs are used to assist the HD decoders to prevent miscorrections and to decode those originally uncorrectable cases via bit flipping (BF), respectively. In this pap…

    Submitted 17 March, 2021; originally announced March 2021.

  46. arXiv:2012.15446  [pdf, other]

    physics.med-ph eess.IV

    Generative Adversarial Network for Image Synthesis

    Authors: Yang Lei, Richard L. J. Qiu, Tonghe Wang, Walter J. Curran, Tian Liu, Xiaofeng Yang

    Abstract: This chapter reviews recent developments of generative adversarial networks (GAN)-based methods for medical and biomedical image synthesis tasks. These methods are classified into conditional GAN and Cycle-GAN according to the network architecture designs. For each category, a literature survey is given, which covers discussions of the network architecture designs, highlights important contributio…

    Submitted 30 December, 2020; originally announced December 2020.

  47. arXiv:2011.08477  [pdf, other]

    cs.SD eess.AS

    Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis

    Authors: Yi Lei, Shan Yang, Lei Xie

    Abstract: This paper proposes a unified model to conduct emotion transfer, control and prediction for sequence-to-sequence based fine-grained emotional speech synthesis. Conventional emotional speech synthesis often needs manual labels or reference audio to determine the emotional expressions of synthesized speech. Such coarse labels cannot control the details of speech emotion, often resulting in an averag…

    Submitted 17 November, 2020; originally announced November 2020.

  48. arXiv:2011.08467  [pdf, other]

    cs.SD eess.AS

    Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher

    Authors: Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li

    Abstract: Singing voice synthesis has been paid rising attention with the rapid development of speech synthesis area. In general, a studio-level singing corpus is usually necessary to produce a natural singing voice from lyrics and music-related transcription. However, such a corpus is difficult to collect since it's hard for many of us to sing like a professional singer. In this paper, we propose an approa…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: 8 pages, 3 figures

  49. arXiv:2010.04275  [pdf, other]

    physics.med-ph eess.IV

    Synthetic MRI-aided Head-and-Neck Organs-at-Risk Auto-Delineation for CBCT-guided Adaptive Radiotherapy

    Authors: Xianjin Dai, Yang Lei, Tonghe Wang, Anees H. Dhabaan, Mark McDonald, Jonathan J. Beitler, Walter J. Curran, Jun Zhou, Tian Liu, Xiaofeng Yang

    Abstract: Purpose: Organ-at-risk (OAR) delineation is a key step for cone-beam CT (CBCT) based adaptive radiotherapy planning that can be a time-consuming, labor-intensive, and subject-to-variability process. We aim to develop a fully automated approach aided by synthetic MRI for rapid and accurate CBCT multi-organ contouring in head-and-neck (HN) cancer patients. MRI has superb soft-tissue contrasts, while…

    Submitted 8 October, 2020; originally announced October 2020.

  50. arXiv:2005.12908  [pdf, other]

    physics.med-ph eess.IV

    Learning-Based Synthetic Dual Energy CT Imaging from Single Energy CT for Stopping Power Ratio Calculation in Proton Radiation Therapy

    Authors: Serdar Charyyev, Tonghe Wang, Yang Lei, Beth Ghavidel, Jonathan J. Beitler, Mark McDonald, Walter J. Curran, Tian Liu, Jun Zhou, Xiaofeng Yang

    Abstract: Purpose: Dual-energy CT (DECT) has been shown to derive stopping power ratio (SPR) map with higher accuracy than conventional single energy CT (SECT) by obtaining the energy dependence of photon interactions. However, DECT is not as widely implemented as SECT in proton radiation therapy simulation. This work presents a learning-based method to synthetize DECT images from SECT for proton radiation…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: text overlap with arXiv:2003.09058