
Showing 1–30 of 30 results for author: Lei, S

Searching in archive eess.
  1. arXiv:2404.16619

    cs.SD eess.AS

    The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

    Authors: Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu

    Abstract: This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by THU-HCSI team for LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. For further improving speaker similarity and speech quality, we introduce speaker-aware text encoder…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted in Grand Challenge of ICASSP 2024

  2. arXiv:2401.09639

    eess.IV cs.CV

    Uncertainty Modeling in Ultrasound Image Segmentation for Precise Fetal Biometric Measurements

    Authors: Shuge Lei

    Abstract: Medical image segmentation, particularly in the context of ultrasound data, is a crucial aspect of computer vision and medical imaging. This paper delves into the complexities of uncertainty in the segmentation process, focusing on fetal head and femur ultrasound images. The proposed methodology involves extracting target contours and exploring techniques for precise parameter measurement. Uncerta…

    Submitted 17 January, 2024; originally announced January 2024.

  3. arXiv:2401.03664

    eess.IV cs.CV cs.LG

    Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

    Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang, Yan Tong

    Abstract: This paper focuses on the classification task of breast ultrasound images and investigates the reliability measurement of classification results. We proposed a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature a…

    Submitted 7 January, 2024; originally announced January 2024.

  4. Sample Robust Scheduling of Electricity-Gas Systems Under Wind Power Uncertainty

    Authors: Rong-Peng Liu, Yunhe Hou, Yujia Li, Shunbo Lei, Wei Wei, Xiaozhe Wang

    Abstract: This paper adopts a two-stage sample robust optimization (SRO) model to address the wind power penetrated unit commitment optimal energy flow (UC-OEF) problem for IEGSs. The two-stage SRO model can be approximately transformed into a computationally efficient form. Specifically, we employ linear decision rules to simplify the proposed UC-OEF model. Moreover, we further enhance the tractability of…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: 10 pages

    Journal ref: IEEE Trans. Power Syst., vol. 36, no. 6, pp. 5889-5900, Nov. 2021

  5. arXiv:2309.11977

    cs.SD eess.AS

    Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng

    Abstract: Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation parameters. By quantizing speech waveform into discrete acoustic tokens and modeling these tokens with the language model, recent language model-based TTS models show zero-shot speaker adaptation capabilities with only a 3-second acoustic prompt of an unseen speaker. However, they are limited by th…

    Submitted 9 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  6. arXiv:2309.09799

    cs.CL cs.SD eess.AS

    Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement

    Authors: Shanglin Lei, ** Wang, Guanting Dong, Jiang Li, Yingjian Liu

    Abstract: Emotion Recognition in Conversation (ERC) has attracted widespread attention in the natural language processing field due to its enormous potential for practical applications. Existing ERC methods face challenges in achieving generalization to diverse scenarios due to insufficient modeling of context, ambiguous capture of dialogue relationships and overfitting in speaker modeling. In this work, we…

    Submitted 19 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

  7. arXiv:2309.02780

    cs.CL cs.SD eess.AS

    GRASS: Unified Generation Model for Speech-to-Semantic Tasks

    Authors: Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai

    Abstract: This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our pro…

    Submitted 11 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  8. arXiv:2308.16836

    cs.SD cs.AI eess.AS

    Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

    Authors: Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice. Based on the main architecture of recently proposed VISinger, we put forward several specific designs for expressive singing voice synthesis. First, dif…

    Submitted 31 August, 2023; originally announced August 2023.

  9. arXiv:2308.16593

    cs.SD cs.CL cs.LG eess.AS

    Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

    Authors: Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style speech. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech an…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by INTERSPEECH 2023

  10. arXiv:2307.16012

    cs.SD eess.AS

    MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng

    Abstract: Expressive speech synthesis is crucial for many human-computer interaction scenarios, such as audiobooks, podcasts, and voice assistants. Previous works focus on predicting the style embeddings at one single scale from the information within the current sentence. However, the context information in neighboring sentences and the multi-scale nature of style in human speech are neglected, making it challengi…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  11. arXiv:2304.12704

    cs.SD cs.MM eess.AS

    GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

    Authors: Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Music-driven 3D dance generation has become an intensive research topic in recent years with great potential for real-world applications. Most existing methods lack the consideration of genre, which results in genre inconsistency in the generated dance movements. In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genr…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by ICASSP 2023. Demo page: https://im1eon.github.io/ICASSP23-GTNB-DG/

  12. arXiv:2304.06359

    cs.SD eess.AS

    Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthesized speech. However, it is still challenging to generate speech with contextually appropriate and coherent speaking style for multi-sentence text in audiobooks. In this paper, we propose a context-aware coherent speaking style prediction method for audiobook speech synthesis. To predict the style embedding…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted by ICASSP 2023

  13. A Distributionally Robust Resilience Enhancement Strategy for Distribution Networks Considering Decision-Dependent Contingencies

    Authors: Yujia Li, Shunbo Lei, Wei Sun, Chenxi Hu, Yunhe Hou

    Abstract: When performing the resilience enhancement for distribution networks, there are two obstacles to reliably modeling the uncertain contingencies: 1) decision-dependent uncertainty (DDU) due to various line hardening decisions, and 2) distributional ambiguity due to limited outage information during extreme weather events (EWEs). To address these two challenges, this paper develops scenario-wise decisio…

    Submitted 23 August, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

  14. arXiv:2204.02743

    cs.SD eess.AS

    Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Previous works on expressive speech synthesis focus on modelling the mono-scale style embedding from the current sentence or context, but the multi-scale nature of speaking style in human speech is neglected. In this paper, we propose a multi-scale speaking style modelling method to capture and predict multi-scale speaking style for improving the naturalness and expressiveness of synthetic speech.…

    Submitted 5 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2022

  15. arXiv:2203.16746

    eess.SY

    Resilient Distribution System Restoration with Communication Recovery by Drone Small Cells

    Authors: Haochen Zhang, Chen Chen, Shunbo Lei, Zhaohong Bie

    Abstract: Distribution system (DS) restoration after natural disasters often faces the challenge of communication failures to feeder automation (FA) facilities, resulting in prolonged load pick-up process. This letter discusses the utilization of drone small cells for wireless communication recovery of FA, and proposes an integrated DS restoration strategy with communication recovery. Demonstrative case stu…

    Submitted 30 March, 2022; originally announced March 2022.

  16. arXiv:2203.14000

    eess.SY math.NA

    On Time Stepping Schemes Considering Switching Behaviors for Power System Electromagnetic Transient Simulation

    Authors: Sheng Lei

    Abstract: Several difficulties will appear when typical electromagnetic transient simulation, using the implicit trapezoidal method and fixed step sizes, is applied to power systems with switching behaviors. These difficulties are addressed by different aspects of time stepping schemes in the literature. This paper first details the different aspects and reviews corresponding methods. Some misunderstanding…

    Submitted 26 March, 2022; originally announced March 2022.

    Comments: Copyright may be transferred without notice, after which this version may no longer be accessible

  17. arXiv:2203.12201

    cs.SD cs.CL eess.AS

    Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

    Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Previous works on expressive speech synthesis mainly focus on the current sentence. The context in adjacent sentences is neglected, resulting in inflexible speaking style for the same text, which lacks speech variations. In this paper, we propose a hierarchical framework to model speaking style from context. A hierarchical context encoder is proposed to explore a wider range of contextual information…

    Submitted 6 April, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP 2022

  18. arXiv:2106.03329

    eess.SY math.NA

    Improved Method for Dealing with Discontinuities in Power System Transient Simulation Based on Frequency Response Optimized Integrators Considering Second Order Derivative

    Authors: Sheng Lei, Alexander Flueck

    Abstract: Potential disagreement in the result induced by discontinuities is revealed in this paper between a novel power system transient simulation scheme using numerical integrators considering second order derivative and conventional ones using numerical integrators considering first order derivative. The disagreement is due to the formula of the different numerical integrators. An improved method for d…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted by the 2021 IEEE Midwest Symposium on Circuits and Systems

  19. arXiv:2104.10385

    eess.SP

    Wide-Beam Array Antenna Power Gain Maximization via ADMM Framework

    Authors: Shiwen Lei, **g Tian, Zhipeng Lin, Haoquan Hu, Bo Chen, Wei Yang, Pu Tang, Xiangdong Qiu

    Abstract: This paper proposes two algorithms to maximize the minimum array power gain in a wide-beam mainlobe by solving the power gain pattern synthesis (PGPS) problem with and without sidelobe constraints. Firstly, the nonconvex PGPS problem is transformed into a nonconvex linear inequality optimization problem and then converted to an augmented Lagrangian problem by introducing auxiliary variables via th…

    Submitted 21 April, 2021; originally announced April 2021.

  20. arXiv:2101.03266

    math.NA eess.SY

    Studies on Frequency Response Optimized Integrators Considering Second Order Derivative

    Authors: Sheng Lei, Alexander Flueck

    Abstract: This paper presents comprehensive studies on frequency response optimized integrators considering second order derivative regarding their numerical error, numerical stability and transient performance. Frequency domain error analysis is conducted on these numerical integrators to reveal their accuracy. Numerical stability of the numerical integrators is investigated. Interesting new types of numer…

    Submitted 8 January, 2021; originally announced January 2021.

    Comments: Copyright may be transferred without notice, after which this version may no longer be accessible

  21. arXiv:2101.03063

    eess.IV cs.CV

    Knowledge AI: New Medical AI Solution for Medical image Diagnosis

    Authors: Yingni Wang, Shuge Lei, Jian Dai, Kehong Yuan

    Abstract: The implementation of medical AI has always been a problem. The effect of traditional perceptual AI algorithms in medical image processing needs to be improved. Here we propose a method of knowledge AI, which is a combination of perceptual AI and clinical knowledge and experience. Based on this method, the geometric information mining of medical images can represent the experience and information a…

    Submitted 8 January, 2021; originally announced January 2021.

    Comments: 9 pages, 8 figures. arXiv admin note: text overlap with arXiv:2101.02639

  22. arXiv:2012.01375

    math.NA eess.SY

    Proper Selection of Obreshkov-Like Numerical Integrators Used as Numerical Differentiators for Power System Transient Simulation

    Authors: Sheng Lei, Alexander Flueck

    Abstract: Obreshkov-like numerical integrators have been widely applied to power system transient simulation. Misuse of the numerical integrators as numerical differentiators may lead to numerical oscillation or bias. Criteria for Obreshkov-like numerical integrators to be used as numerical differentiators are proposed in this paper to avoid these misleading phenomena. The coefficients of a numerical integr…

    Submitted 15 February, 2022; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted by the 2022 IEEE PES General Meeting

  23. arXiv:2011.05439

    eess.SY

    Transient Simulation of Grid-Feeding Converter System for Stability Studies Using Frequency Response Optimized Integrators

    Authors: Sheng Lei, Alexander Flueck

    Abstract: A grid-feeding converter system is added to a novel power system transient simulation scheme based on frequency response optimized integrators considering second order derivative. The converter system and its implementation in the simulation scheme are detailed. Case studies verify the accuracy and efficiency of the simulation scheme. Furthermore, this paper proposes and justifies extending the si…

    Submitted 20 February, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

    Comments: Accepted by the 2021 IEEE PES General Meeting

  24. arXiv:2011.00711

    eess.SY math.NA

    Multistep Frequency Response Optimized Integrators and Their Application to Accelerating a Power System Transient Simulation Scheme

    Authors: Sheng Lei, Alexander Flueck

    Abstract: This paper proposes several explicit and implicit multistep frequency response optimized integrators considering first or second order derivative. A prediction-based method aiming at accelerating a novel power system transient simulation scheme without impacting its accuracy is further put forward utilizing the proposed numerical integrators and some others available in the literature. Case studie…

    Submitted 15 February, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: Accepted by the 2021 IEEE PES General Meeting

  25. arXiv:2008.13059

    eess.SY

    Initialization Process of a Power System Transient Simulation Scheme for Stability Studies

    Authors: Sheng Lei, Alexander Flueck

    Abstract: The initialization process of a novel power system transient simulation scheme for stability studies is put forward, by further developing a "time-domain harmonic power-flow algorithm". The initialization process is formulated as an algebraic problem to ensure that the power system under study is in steady state and operated at a specified operating point, at the beginning of a transient simulatio…

    Submitted 29 August, 2020; originally announced August 2020.

    Comments: Accepted by the 52nd North American Power Symposium

  26. arXiv:2007.01496

    cs.CV cs.LG eess.IV

    Few-Shot Semantic Segmentation Augmented with Image-Level Weak Annotations

    Authors: Shuo Lei, Xuchao Zhang, Jianfeng He, Fanglan Chen, Chang-Tien Lu

    Abstract: Despite the great progress made by deep neural networks in the semantic segmentation task, traditional neural-network-based methods typically suffer from a shortage of large amounts of pixel-level annotations. Recent progress in few-shot semantic segmentation tackles the issue with only a few pixel-level annotated examples. However, these few-shot approaches cannot easily be applied to multi-way or we…

    Submitted 18 June, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: Accepted to ICME 2021

  27. arXiv:2005.00964

    eess.SY math.NA

    Efficient Power System Transient Simulation Based on Frequency Response Optimized Integrators Considering Second Order Derivative

    Authors: Sheng Lei, Alexander Flueck

    Abstract: Frequency response optimized integrators considering second order derivative are proposed in this paper. Based on the proposed numerical integrators, and others which also consider second order derivative, this paper puts forward a novel power system transient simulation scheme. Instead of using a unique numerical integrator, the proposed simulation scheme chooses proper ones according to the domi…

    Submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted by the 2020 IEEE PES General Meeting

  28. arXiv:2004.13557

    eess.SP cs.LG stat.ML

    Baseline Estimation of Commercial Building HVAC Fan Power Using Tensor Completion

    Authors: Shunbo Lei, David Hong, Johanna L. Mathieu, Ian A. Hiskens

    Abstract: Commercial building heating, ventilation, and air conditioning (HVAC) systems have been studied for providing ancillary services to power grids via demand response (DR). One critical issue is to estimate the counterfactual baseline power consumption that would have prevailed without DR. Baseline methods have been developed based on whole building electric load profiles. New methods are necessary t…

    Submitted 24 April, 2020; originally announced April 2020.

  29. arXiv:1912.06936

    eess.SP physics.optics

    Compressed Sensing for Reconstructing Coherent Multidimensional Spectra

    Authors: Zhengjun Wang, Shiwen Lei, Khadga Jung Karki, Andreas Jakobsson, Tönu Pullerits

    Abstract: We apply two sparse reconstruction techniques, the least absolute shrinkage and selection operator (LASSO) and the sparse exponential mode analysis (SEMA), to two-dimensional (2D) spectroscopy. The algorithms are first tested on model data, showing that both are able to reconstruct the spectra using only a fraction of the data required by the traditional Fourier-based estimator. Through the analys…

    Submitted 14 December, 2019; originally announced December 2019.

  30. arXiv:1911.09987

    math.OC eess.SY

    Transmission System Resilience Enhancement with Extended Steady-state Security Region in Consideration of Uncertain Topology Changes

    Authors: Chong Wang, Feng Wu, ** Ju, Shunbo Lei, Tianguang Lu, Yunhe Hou

    Abstract: The increasing extreme weather events pose unprecedented challenges to power system operation because of their uncertain and sequential impacts on power systems. This paper proposes the concept of an extended steady-state security region (ESSR), and resilience enhancement for transmission systems based on ESSR in consideration of uncertain varying topology changes caused by the extreme weather ev…

    Submitted 22 November, 2019; originally announced November 2019.