Search | arXiv e-print repository

The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

Authors: Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu

Abstract: This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by the THU-HCSI team for the LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. To further improve speaker similarity and speech quality, we introduce a speaker-aware text encoder and a flow-based decoder with Transformer blocks. In addition, we denoise the few-shot data, mix them with pre-training data, and adopt a speaker-balanced sampling strategy to guarantee effective fine-tuning for target speakers. The official evaluations in track 1 show that our system achieves the best speaker similarity MOS of 4.25 and a considerable naturalness MOS of 3.97.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted in Grand Challenge of ICASSP 2024
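The abstract names a speaker-balanced sampling strategy for fine-tuning on few-shot data mixed with pre-training data, but does not give its details. Below is a minimal Python sketch, assuming "balanced" means every speaker is drawn with equal probability regardless of how many utterances it contributes. The function names and the data layout (parallel lists of utterances and speaker IDs) are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def speaker_balanced_weights(speakers):
    """Per-utterance sampling weights that give every speaker equal total probability.

    `speakers[i]` is the (hypothetical) speaker ID of utterance i.
    """
    counts = Counter(speakers)
    num_speakers = len(counts)
    # Each speaker's utterances share 1 / num_speakers of the probability mass.
    return [1.0 / (num_speakers * counts[s]) for s in speakers]


def sample_batch(utterances, speakers, batch_size, rng=random):
    """Draw a batch with replacement using speaker-balanced weights."""
    weights = speaker_balanced_weights(speakers)
    return rng.choices(utterances, weights=weights, k=batch_size)


if __name__ == "__main__":
    # Toy mix: 95 utterances from one pre-training speaker, 5 from a few-shot target.
    utts = [f"pre_{i}" for i in range(95)] + [f"target_{i}" for i in range(5)]
    spks = ["pretrain_spk"] * 95 + ["target_spk"] * 5
    batch = sample_batch(utts, spks, batch_size=1000, rng=random.Random(0))
    share = sum(u.startswith("target_") for u in batch) / len(batch)
    print(f"target speaker share: {share:.2f}")  # close to 0.5 rather than 0.05
```

Without reweighting, the target speaker in this toy mix would fill only about 5% of each batch.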

arXiv:2401.09639

Uncertainty Modeling in Ultrasound Image Segmentation for Precise Fetal Biometric Measurements

Authors: Shuge Lei

Abstract: Medical image segmentation, particularly in the context of ultrasound data, is a crucial aspect of computer vision and medical imaging. This paper delves into the complexities of uncertainty in the segmentation process, focusing on fetal head and femur ultrasound images. The proposed methodology involves extracting target contours and exploring techniques for precise parameter measurement. Uncertainty modeling methods are employed to enhance the training and testing processes of the segmentation network. The study reveals that the average absolute error in fetal head circumference measurement is 8.0833mm, with a relative error of 4.7347%. Similarly, the average absolute error in fetal femur measurement is 2.6163mm, with a relative error of 6.3336%. Uncertainty modeling experiments employing Test-Time Augmentation (TTA) demonstrate effective interpretability of data uncertainty on both datasets. This suggests that incorporating data uncertainty based on the TTA method can support clinical practitioners in making informed decisions and obtaining more reliable measurement results in practical clinical applications. The paper contributes to the advancement of ultrasound image segmentation, addressing critical challenges and improving the reliability of biometric measurements.

Submitted 17 January, 2024; originally announced January 2024.
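The abstract reports average absolute errors in mm alongside relative errors in %, plus a Test-Time Augmentation (TTA) estimate of data uncertainty. A minimal Python sketch of how such numbers are commonly computed, assuming the relative error is averaged per case as |pred - ref| / ref (the paper may define it differently). `measure` stands in for a hypothetical segment-then-measure pipeline, and the toy intensity-shift augmentations are illustrative only.

```python
from statistics import mean, pstdev


def measurement_errors(predicted_mm, reference_mm):
    """Mean absolute error (mm) and mean relative error (%) of biometric measurements."""
    abs_errors = [abs(p - r) for p, r in zip(predicted_mm, reference_mm)]
    rel_errors = [100.0 * e / r for e, r in zip(abs_errors, reference_mm)]
    return mean(abs_errors), mean(rel_errors)


def tta_uncertainty(measure, image, augmentations):
    """Test-time augmentation: measure each augmented copy of the input and report
    the mean measurement and its spread (population std) as a data-uncertainty estimate."""
    values = [measure(augment(image)) for augment in augmentations]
    return mean(values), pstdev(values)


def area_above_threshold(img, threshold=0.5):
    """Toy measurement: number of pixels brighter than the threshold."""
    return sum(v > threshold for v in img)


if __name__ == "__main__":
    mae, rel = measurement_errors([170.0, 182.0], [175.0, 180.0])
    print(f"MAE = {mae:.2f} mm, relative error = {rel:.2f} %")

    # Toy "image" of pixel intensities; pixels near 0.5 flip under small shifts.
    image = [0.1, 0.45, 0.55, 0.9]
    augs = [lambda x: x,
            lambda x: [v + 0.1 for v in x],   # brighten
            lambda x: [v - 0.1 for v in x]]   # darken
    print(tta_uncertainty(area_above_threshold, image, augs))
```

A large spread across augmented copies flags inputs whose measurement depends on ambiguous boundary pixels.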

arXiv:2401.03664

Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang, Yan Tong

Abstract: This paper focuses on the classification of breast ultrasound images and investigates how to measure the reliability of classification results. We propose a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature attribution algorithm SP-RISA are gracefully applied. Uncertainty quantification is used to evaluate the predictive reliability via Test Time Enhancement. The effectiveness of this reliability evaluation framework has been verified on our breast ultrasound clinical dataset YBUS, and its robustness is verified on the public dataset BUSI. The expected calibration errors on both datasets are significantly lower than those of traditional evaluation methods, which demonstrates the effectiveness of our proposed reliability measurement.

Submitted 7 January, 2024; originally announced January 2024.
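The abstract uses expected calibration error (ECE) as its evidence of reliability. A minimal Python sketch of the standard equal-width-bin ECE, ECE = sum over bins of (|B_m| / n) * |accuracy(B_m) - mean confidence(B_m)|; the choice of 10 bins is an assumption, since the abstract does not state the paper's binning.

```python
def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """ECE = sum over bins of (|B_m| / n) * |accuracy(B_m) - mean confidence(B_m)|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        # Equal-width bins [k/M, (k+1)/M); a confidence of exactly 1.0 joins the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, pred == label))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(correct for _, correct in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Overconfident toy classifier: claims 0.95 confidence but is right only 6 times in 10.
    confs = [0.95] * 10
    preds = [1] * 10
    labels = [1] * 6 + [0] * 4
    print(round(expected_calibration_error(confs, preds, labels), 3))  # 0.35
```

A perfectly calibrated model scores 0; lower ECE means stated confidence tracks observed accuracy more closely.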

arXiv:2401.00269

doi 10.1109/TPWRS.2021.3081557

Sample Robust Scheduling of Electricity-Gas Systems Under Wind Power Uncertainty

Authors: Rong-Peng Liu, Yunhe Hou, Yujia Li, Shunbo Lei, Wei Wei, Xiaozhe Wang

Abstract: This paper adopts a two-stage sample robust optimization (SRO) model to address the wind-power-penetrated unit commitment optimal energy flow (UC-OEF) problem for integrated electricity-gas systems (IEGSs). The two-stage SRO model can be approximately transformed into a computationally efficient form. Specifically, we employ linear decision rules to simplify the proposed UC-OEF model. Moreover, we further enhance the tractability of the simplified model by exploring its structural features and, accordingly, develop a solution method.

Submitted 30 December, 2023; originally announced January 2024.

Comments: 10 pages

Journal ref: IEEE Trans. Power Syst., vol. 36, no. 6, pp. 5889-5900, Nov. 2021

arXiv:2309.11977

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng

Abstract: Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation parameters. By quantizing speech waveforms into discrete acoustic tokens and modeling these tokens with a language model, recent language model-based TTS models show zero-shot speaker adaptation capabilities with only a 3-second acoustic prompt of an unseen speaker. However, they are limited by the length of the acoustic prompt, which makes it difficult to clone personal speaking style. In this paper, we propose a novel zero-shot TTS model with multi-scale acoustic prompts based on the neural codec language model VALL-E. A speaker-aware text encoder is proposed to learn the personal speaking style at the phoneme level from the style prompt consisting of multiple sentences. Following that, a VALL-E based acoustic decoder is utilized to model the timbre from the timbre prompt at the frame level and generate speech. The experimental results show that our proposed method outperforms baselines in terms of naturalness and speaker similarity, and can achieve better performance by scaling out to a longer style prompt.

Submitted 9 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Accepted by ICASSP 2024

arXiv:2309.09799

Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement

Authors: Shanglin Lei, ** Wang, Guanting Dong, Jiang Li, Yingjian Liu

Abstract: Emotion Recognition in Conversation (ERC) has attracted widespread attention in the natural language processing field due to its enormous potential for practical applications. Existing ERC methods face challenges in generalizing to diverse scenarios due to insufficient modeling of context, ambiguous capture of dialogue relationships and overfitting in speaker modeling. In this work, we present a Hybrid Continuous Attributive Network (HCAN) to address these issues from the perspective of emotional continuation and emotional attribution. Specifically, HCAN adopts a hybrid recurrent and attention-based module to model global emotion continuity. Then a novel Emotional Attribution Encoding (EAE) is proposed to model intra- and inter-emotional attribution for each utterance. Moreover, to enhance the robustness of the model in speaker modeling and improve its performance in different scenarios, a comprehensive loss function, the emotional cognitive loss $\mathcal{L}_{\rm EC}$, is proposed to alleviate emotional drift and overcome the overfitting of the model to speaker modeling. Our model achieves state-of-the-art performance on three datasets, demonstrating the superiority of our work. Extensive comparative experiments and ablation studies on three benchmarks are also conducted to provide evidence supporting the efficacy of each module. Further experiments exploring generalization ability show the plug-and-play nature of the EAE module in our method.

Submitted 19 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.02780

GRASS: Unified Generation Model for Speech-to-Semantic Tasks

Authors: Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai

Abstract: This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more, after fine-tuning. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.

Submitted 11 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.16836

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

Authors: Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses semantic embeddings derived from Bidirectional Encoder Representations from Transformers (BERT) to improve the expressiveness of the synthesized singing voice. Based on the main architecture of the recently proposed VISinger, we put forward several specific designs for expressive singing voice synthesis. First, unlike previous SVS models, we use the text representation of lyrics extracted from pre-trained BERT as additional input to the model. The representation contains information about the semantics of the lyrics, which could help the SVS system produce a more expressive and natural voice. Second, we further introduce an energy predictor to stabilize the synthesized voice and model the wider range of energy variations that also contribute to the expressiveness of singing voice. Last but not least, to attenuate off-key issues, the pitch predictor is re-designed to predict the ratio of the real pitch to the note pitch. Both objective and subjective experimental results indicate that the proposed SVS system can produce singing voice of higher quality, outperforming VISinger.

Submitted 31 August, 2023; originally announced August 2023.
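The abstract says the pitch predictor is re-designed to predict the ratio of the real pitch to the note pitch. A minimal Python sketch of that target and its inverse, under stated assumptions not given in the abstract: a linear per-frame ratio, note pitch from MIDI numbers in 12-tone equal temperament with A4 = 440 Hz, and unvoiced frames (F0 = 0) masked out. All function names are hypothetical.

```python
import math


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def pitch_ratio_targets(f0_hz, notes):
    """Per-frame training targets: real F0 divided by the score note's F0.

    Unvoiced frames (F0 == 0) get None and are assumed to be masked from the loss.
    """
    return [f / midi_to_hz(m) if f > 0 else None for f, m in zip(f0_hz, notes)]


def reconstruct_f0(ratios, notes):
    """Inference: turn predicted ratios back into F0 using the note pitch."""
    return [0.0 if r is None else r * midi_to_hz(m) for r, m in zip(ratios, notes)]


def ratio_to_cents(ratio):
    """Deviation from the note in cents (100 cents = one semitone)."""
    return 1200.0 * math.log2(ratio)


if __name__ == "__main__":
    notes = [69, 69, 72]          # A4, A4, C5
    f0 = [452.0, 0.0, 523.25]     # sung F0 in Hz; 0.0 marks an unvoiced frame
    ratios = pitch_ratio_targets(f0, notes)
    print(ratios)
    print(reconstruct_f0(ratios, notes))
    print(round(ratio_to_cents(ratios[0]), 1))  # A4 sung about 47 cents sharp
```

One plausible motivation for this parameterization: the ratio stays near 1.0 in every register, so the predictor only has to model deviations such as vibrato or drift, while the score fixes the absolute pitch.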

arXiv:2308.16593

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

Authors: Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style speech. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavior labels. In the process of semi-supervised learning, both text and speech information are considered for detecting spontaneous behavior labels in speech. Moreover, a linguistic-aware encoder is used to model the relationships between sentences in the conversation. Experimental results indicate that our proposed method achieves superior expressive speech synthesis performance, with the ability to model spontaneous behavior in spontaneous-style speech and predict reasonable spontaneous behavior from text.

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by INTERSPEECH 2023

arXiv:2307.16012

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng

Abstract: Expressive speech synthesis is crucial for many human-computer interaction scenarios, such as audiobooks, podcasts, and voice assistants. Previous works focus on predicting style embeddings at a single scale from the information within the current sentence. However, context information in neighboring sentences and the multi-scale nature of style in human speech are neglected, making it challenging to convert multi-sentence text into natural and expressive speech. In this paper, we propose MSStyleTTS, a style modeling method for expressive speech synthesis, to capture and predict styles at different levels from a wider range of context rather than a single sentence. Two sub-modules, a multi-scale style extractor and a multi-scale style predictor, are trained together with a FastSpeech 2 based acoustic model. The predictor is designed to explore the hierarchical context information by considering structural relationships in context and to predict style embeddings at the global, sentence, and subword levels. The extractor extracts multi-scale style embeddings from the ground-truth speech and explicitly guides the style prediction. Evaluations on both in-domain and out-of-domain audiobook datasets demonstrate that the proposed method significantly outperforms the three baselines. In addition, we analyze the context information and multi-scale style representations, which have never been discussed before.

Submitted 29 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

arXiv:2304.12704

GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

Authors: Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: Music-driven 3D dance generation has become an intensive research topic in recent years with great potential for real-world applications. Most existing methods do not consider genre, which results in genre inconsistency in the generated dance movements. In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genre-consistent dance generation framework, GTN-Bailando. First, we propose the Genre Token Network (GTN), which infers the genre from music to enhance the genre consistency of long-term dance generation. Second, to improve the generalization capability of the model, a pre-training and fine-tuning strategy is adopted. Experimental results on the AIST++ dataset show that the proposed dance generation framework outperforms state-of-the-art methods in terms of motion quality and genre consistency.

Submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted by ICASSP 2023. Demo page: https://im1eon.github.io/ICASSP23-GTNB-DG/

arXiv:2304.06359

Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthesized speech. However, it is still challenging to generate speech with contextually appropriate and coherent speaking style for multi-sentence text in audiobooks. In this paper, we propose a context-aware coherent speaking style prediction method for audiobook speech synthesis. To predict the style embedding of the current utterance, a hierarchical transformer-based context-aware style predictor with a mixture attention mask is designed, considering both text-side context information and speech-side style information of previous utterances. Based on this, we can generate long-form speech with coherent style and prosody sentence by sentence. Objective and subjective evaluations on a Mandarin audiobook dataset demonstrate that our proposed model can generate speech with more expressive and coherent speaking style than baselines, for both single-sentence and multi-sentence tests.

Submitted 13 April, 2023; originally announced April 2023.

Comments: Accepted by ICASSP 2023

arXiv:2207.00741

doi 10.1109/TSG.2023.3310979

A Distributionally Robust Resilience Enhancement Strategy for Distribution Networks Considering Decision-Dependent Contingencies

Authors: Yujia Li, Shunbo Lei, Wei Sun, Chenxi Hu, Yunhe Hou

Abstract: When performing resilience enhancement for distribution networks, there are two obstacles to reliably modeling the uncertain contingencies: 1) decision-dependent uncertainty (DDU) due to various line hardening decisions, and 2) distributional ambiguity due to limited outage information during extreme weather events (EWEs). To address these two challenges, this paper develops scenario-wise decision-dependent ambiguity sets (SWDD-ASs), where the DDU and distributional ambiguity inherent in EWE-induced contingencies are simultaneously captured for each possible EWE scenario. Then, a two-stage trilevel decision-dependent distributionally robust resilient enhancement (DD-DRRE) model is formulated, whose outputs include the optimal line hardening, distributed generation (DG) allocation, and proactive network reconfiguration strategy under the worst-case distributions in SWDD-ASs. Subsequently, the DD-DRRE model is equivalently recast as a mixed-integer linear programming (MILP)-based master problem and multiple scenario-wise subproblems, facilitating the adoption of a customized column-and-constraint generation (C&CG) algorithm. Finally, case studies demonstrate a remarkable improvement in the out-of-sample performance of our model, compared to its prevailing stochastic and robust counterparts. Moreover, the potential values of incorporating the ambiguity and distributional information are quantitatively estimated, providing a useful reference for planners with different budgets and risk-aversion levels.

Submitted 23 August, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

arXiv:2204.02743

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: Previous works on expressive speech synthesis focus on modelling the mono-scale style embedding from the current sentence or context, but the multi-scale nature of speaking style in human speech is neglected. In this paper, we propose a multi-scale speaking style modelling method to capture and predict multi-scale speaking style for improving the naturalness and expressiveness of synthetic speech. A multi-scale extractor is proposed to extract speaking style embeddings at three different levels from the ground-truth speech, and explicitly guide the training of a multi-scale style predictor based on hierarchical context information. Both objective and subjective evaluations on a Mandarin audiobook dataset demonstrate that our proposed method can significantly improve the naturalness and expressiveness of the synthesized speech.

Submitted 5 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Accepted by INTERSPEECH 2022

arXiv:2203.16746

Resilient Distribution System Restoration with Communication Recovery by Drone Small Cells

Authors: Haochen Zhang, Chen Chen, Shunbo Lei, Zhaohong Bie

Abstract: Distribution system (DS) restoration after natural disasters often faces the challenge of communication failures to feeder automation (FA) facilities, resulting in a prolonged load pick-up process. This letter discusses the utilization of drone small cells for wireless communication recovery of FA, and proposes an integrated DS restoration strategy with communication recovery. Demonstrative case studies are conducted to validate the proposed model, and its advantages are illustrated by comparison with benchmark strategies.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.14000

On Time Stepping Schemes Considering Switching Behaviors for Power System Electromagnetic Transient Simulation

Authors: Sheng Lei

Abstract: Several difficulties arise when typical electromagnetic transient simulation, using the implicit trapezoidal method and fixed step sizes, is applied to power systems with switching behaviors. These difficulties are addressed by different aspects of time stepping schemes in the literature. This paper first details the different aspects and reviews corresponding methods. Some misunderstandings in the literature are clarified. Issues that may be encountered by the existing methods are concurrently revealed. Based on the detailed review, the paper then puts forward a novel time stepping scheme which fully addresses the difficulties. The effectiveness of the proposed scheme is demonstrated via numerical case studies.

Submitted 26 March, 2022; originally announced March 2022.

Comments: Copyright may be transferred without notice, after which this version may no longer be accessible
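The abstract attributes the difficulties to the implicit trapezoidal method with fixed step sizes when switches operate. One widely used remedy in electromagnetic transient programs is to interpolate back to the switching instant instead of acting at the next grid point. The Python sketch below shows only that generic idea; it is not the novel scheme proposed in the paper. The series RL circuit, the sinusoidal source, and all parameter values are illustrative assumptions, and the detected event is the first current zero (where, e.g., a diode or breaker would open).

```python
import math


def trapezoidal_rl_step(i_n, v_n, v_next, h, r, l):
    """One implicit-trapezoidal step of L di/dt + R i = v(t) for a series RL branch."""
    a = h * r / (2.0 * l)
    return ((1.0 - a) * i_n + h / (2.0 * l) * (v_n + v_next)) / (1.0 + a)


def first_current_zero(r, l, v_peak, freq, h, t_end):
    """Fixed-step simulation from i(0) = 0 with v(t) = v_peak * sin(2*pi*freq*t).

    Returns the first time the current falls from positive to non-positive,
    located by linear interpolation between the two grid points around it.
    """
    def v(t):
        return v_peak * math.sin(2.0 * math.pi * freq * t)

    t, i = 0.0, 0.0
    while t < t_end:
        i_next = trapezoidal_rl_step(i, v(t), v(t + h), h, r, l)
        if i > 0.0 and i_next <= 0.0:
            # The zero lies inside (t, t + h]; a fixed-step solver would only
            # see it at t + h, so interpolate to estimate the switching instant.
            return t + h * i / (i - i_next)
        t, i = t + h, i_next
    return None


if __name__ == "__main__":
    t_zero = first_current_zero(r=1.0, l=0.01, v_peak=100.0, freq=60.0, h=50e-6, t_end=0.05)
    print(f"first current zero at about {t_zero * 1e3:.3f} ms")
```

A complete interpolation scheme would then apply the switch at the interpolated instant, re-initialize the network there, and resume stepping; this sketch stops at locating the event.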

arXiv:2203.12201

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: Previous works on expressive speech synthesis mainly focus on the current sentence. The context in adjacent sentences is neglected, resulting in an inflexible speaking style for the same text, which lacks speech variations. In this paper, we propose a hierarchical framework to model speaking style from context. A hierarchical context encoder is proposed to explore a wider range of contextual information considering structural relationships in context, including inter-phrase and inter-sentence relations. Moreover, to encourage this encoder to learn style representation better, we introduce a novel training strategy with knowledge distillation, which provides the target for encoder training. Both objective and subjective evaluations on a Mandarin lecture dataset demonstrate that the proposed method can significantly improve the naturalness and expressiveness of the synthesized speech.

Submitted 6 April, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: Accepted by ICASSP 2022

arXiv:2106.03329

Improved Method for Dealing with Discontinuities in Power System Transient Simulation Based on Frequency Response Optimized Integrators Considering Second Order Derivative

Authors: Sheng Lei, Alexander Flueck

Abstract: This paper reveals a potential disagreement, induced by discontinuities, between the results of a novel power system transient simulation scheme using numerical integrators considering the second order derivative and those of conventional schemes using numerical integrators considering the first order derivative. The disagreement stems from the different formulas of the numerical integrators. An improved method for dealing with discontinuities in the novel transient simulation scheme is proposed to resolve the disagreement. The effectiveness of the improved method is demonstrated and verified via numerical case studies. Although the disagreement is studied, and the improved method proposed, for a particular transient simulation scheme, similar conclusions also apply to other schemes using numerical integrators considering higher order derivatives.

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted by the 2021 IEEE Midwest Symposium on Circuits and Systems

arXiv:2104.10385

Wide-Beam Array Antenna Power Gain Maximization via ADMM Framework

Authors: Shiwen Lei, **g Tian, Zhipeng Lin, Haoquan Hu, Bo Chen, Wei Yang, Pu Tang, Xiangdong Qiu

Abstract: This paper proposes two algorithms to maximize the minimum array power gain in a wide-beam mainlobe by solving the power gain pattern synthesis (PGPS) problem with and without sidelobe constraints. First, the nonconvex PGPS problem is transformed into a nonconvex linear inequality optimization problem and then converted to an augmented Lagrangian problem by introducing auxiliary variables via the Alternating Direction Method of Multipliers (ADMM) framework. Next, the original intractable problem is converted into a series of nonconvex and convex subproblems. The nonconvex subproblems are solved by dividing their solution space into a finite set of smaller ones, in which the solution can be obtained pseudoanalytically. In this way, the proposed algorithms are superior to existing PGPS-based ones, as their convergence can be theoretically guaranteed with a lower computational burden. Numerical examples with both isotropic element pattern (IEP) and active element pattern (AEP) arrays are simulated to show the effectiveness and superiority of the proposed algorithms by comparison with related existing algorithms.

Submitted 21 April, 2021; originally announced April 2021.
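
The ADMM idea in the PGPS abstract above (introduce an auxiliary variable, then alternate between easier subproblems and a dual update) can be seen on a toy problem. The sketch below is a generic scaled-form ADMM for minimizing (x - a)^2 / 2 subject to lo <= x <= hi, split as f(x) + g(z) with x = z. It is not the paper's PGPS algorithm; the helper name, the penalty `rho` and the iteration count are illustrative assumptions.

```python
def admm_box_projection(a, lo, hi, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimize 0.5*(x - a)**2 + I_[lo,hi](z) s.t. x = z.

    Hypothetical toy helper, not the PGPS formulation from the paper.
    """
    x = z = u = 0.0  # primal variable, auxiliary copy, scaled dual variable
    for _ in range(iters):
        # x-update: argmin_x 0.5*(x - a)**2 + (rho/2)*(x - z + u)**2, closed form
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: project x + u onto the box (the "easy" constrained subproblem)
        z = min(max(x + u, lo), hi)
        # dual update: accumulate the constraint residual x - z
        u = u + x - z
    return z

print(admm_box_projection(3.0, 0.0, 1.0))  # 1.0: the unconstrained optimum 3.0 is clipped
print(admm_box_projection(0.4, 0.0, 1.0))  # converges to the interior optimum 0.4
```

Each subproblem here has a closed form, which mirrors (in a much simpler setting) why splitting with auxiliary variables makes an otherwise awkward constrained problem tractable.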

arXiv:2101.03266 [pdf]

Studies on Frequency Response Optimized Integrators Considering Second Order Derivative

Authors: Sheng Lei, Alexander Flueck

Abstract: This paper presents comprehensive studies of frequency response optimized integrators considering second order derivative with regard to their numerical error, numerical stability and transient performance. Frequency domain error analysis is conducted on these numerical integrators to reveal their accuracy. The numerical stability of the integrators is investigated, and interesting new types of numerical stability are recognized. Transient performance is defined to qualitatively characterize the ability of a numerical integrator to track fast decaying transients. This property is related to unsatisfactory phenomena, such as numerical oscillation, that frequently appear in time domain simulation of circuits and systems. Transient performance analysis of the numerical integrators is provided. Theoretical observations from the analysis are verified via time domain case studies.

Submitted 8 January, 2021; originally announced January 2021.

Comments: Copyright may be transferred without notice, after which this version may no longer be accessible
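
The stability and transient-performance notions in the abstract above are commonly illustrated with the test equation x' = λx. A one-step integrator with step h then gives x_{n+1} = R(z) x_n with z = hλ. The sketch compares backward Euler and the trapezoidal rule (first order derivative only) with the Hermite-Obreshkov rule x_{n+1} = x_n + (h/2)(x'_n + x'_{n+1}) + (h²/12)(x''_n - x''_{n+1}), which also uses the second order derivative. These are textbook integrators chosen for illustration. They are not the frequency response optimized integrators studied in the paper, and the function name is an assumption.

```python
import math

def amplification(method, z):
    """Amplification factor R(z), z = h*lambda, for the test equation x' = lambda*x."""
    if method == "backward_euler":
        return 1.0 / (1.0 - z)
    if method == "trapezoidal":
        return (1.0 + z / 2.0) / (1.0 - z / 2.0)
    if method == "hermite_obreshkov":
        # substitutes x'' = lambda**2 * x, i.e. uses the second order derivative
        return (1.0 + z / 2.0 + z * z / 12.0) / (1.0 - z / 2.0 + z * z / 12.0)
    raise ValueError(f"unknown method: {method}")

methods = ("backward_euler", "trapezoidal", "hermite_obreshkov")
for z in (-0.1, -1000.0):  # a slow mode, then a fast decaying transient
    print(z, math.exp(z), {m: amplification(m, z) for m in methods})
```

For the fast mode (z = -1000) the exact factor exp(z) is essentially 0. Backward Euler damps it, the trapezoidal rule gives R close to -1 (a sign-alternating, barely decaying numerical oscillation), and the Hermite-Obreshkov rule gives R close to +1 (the transient lingers without oscillating). That is the kind of behavior a transient-performance criterion is meant to capture.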

arXiv:2101.03063 [pdf]

Knowledge AI: New Medical AI Solution for Medical image Diagnosis

Authors: Yingni Wang, Shuge Lei, Jian Dai, Kehong Yuan

Abstract: The implementation of medical AI has long been a problem, and the performance of traditional perceptual AI algorithms in medical image processing needs to be improved. Here we propose a method of knowledge AI, which combines perceptual AI with clinical knowledge and experience. Based on this method, mining the geometric information of medical images can represent experience and information and evaluate the quality of medical images.

Submitted 8 January, 2021; originally announced January 2021.

Comments: 9 pages, 8 figures. arXiv admin note: text overlap with arXiv:2101.02639

arXiv:2012.01375 [pdf]

Proper Selection of Obreshkov-Like Numerical Integrators Used as Numerical Differentiators for Power System Transient Simulation

Authors: Sheng Lei, Alexander Flueck

Abstract: Obreshkov-like numerical integrators have been widely applied to power system transient simulation. Misuse of the numerical integrators as numerical differentiators may lead to numerical oscillation or bias. Criteria for Obreshkov-like numerical integrators to be used as numerical differentiators are proposed in this paper to avoid these misleading phenomena. The coefficients of a numerical integrator for the highest order derivative turn out to determine its suitability. Some existing Obreshkov-like numerical integrators are examined under the proposed criteria. It is revealed that the notorious numerical oscillations induced by the implicit trapezoidal method cannot always be eliminated by using the backward Euler method for a few time steps. Guided by the proposed criteria, a frequency response optimized integrator considering second order derivative is put forward which is suitable to be used as a numerical differentiator. Theoretical observations are demonstrated in time domain via case studies. The paper points out how to properly select the numerical integrators for power system transient simulation and helps to prevent their misuse.

Submitted 15 February, 2022; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: Accepted by the 2022 IEEE PES General Meeting
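
The misuse described in the abstract above can be reproduced with the simplest Obreshkov-like integrator, the implicit trapezoidal rule. Solving the rule for the derivative turns it into a differentiator: x'_{n+1} = 2(x_{n+1} - x_n)/h - x'_n. The coefficient -1 on x'_n means that an error introduced at a discontinuity is never damped. The sketch below feeds a step in x to this differentiator and to the backward Euler one. The function names and the step signal are illustrative assumptions, and the paper's criteria are not implemented here.

```python
def trapezoidal_differentiator(x, h, dx0=0.0):
    """Derivative samples from state samples by inverting the implicit trapezoidal
    rule x[n+1] = x[n] + h/2 * (dx[n] + dx[n+1])."""
    dx = [dx0]
    for n in range(len(x) - 1):
        dx.append(2.0 * (x[n + 1] - x[n]) / h - dx[n])
    return dx

def backward_euler_differentiator(x, h, dx0=0.0):
    """Derivative samples by inverting x[n+1] = x[n] + h * dx[n+1]."""
    dx = [dx0]
    for n in range(len(x) - 1):
        dx.append((x[n + 1] - x[n]) / h)
    return dx

h = 0.5
x = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # step discontinuity at n = 2
print(trapezoidal_differentiator(x, h))     # [0.0, 0.0, 4.0, -4.0, 4.0, -4.0]
print(backward_euler_differentiator(x, h))  # [0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
```

The trapezoidal output keeps alternating between +4 and -4 after the step, which is the numerical oscillation referred to above. Backward Euler produces a single spike and then settles to zero.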

arXiv:2011.05439 [pdf]

Transient Simulation of Grid-Feeding Converter System for Stability Studies Using Frequency Response Optimized Integrators

Authors: Sheng Lei, Alexander Flueck

Abstract: A grid-feeding converter system is added to a novel power system transient simulation scheme based on frequency response optimized integrators considering second order derivative. The converter system and its implementation in the simulation scheme are detailed. Case studies verify the accuracy and efficiency of the simulation scheme. Furthermore, this paper proposes and justifies extending the simulation scheme by integrating commonly used numerical integrators considering first order derivative for part of the studied system. The proposed extension has an insignificant impact on the accuracy of the simulation scheme while significantly enhancing its efficiency. It also reduces the development burden of adding new devices.

Submitted 20 February, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: Accepted by the 2021 IEEE PES General Meeting

arXiv:2011.00711 [pdf]

Multistep Frequency Response Optimized Integrators and Their Application to Accelerating a Power System Transient Simulation Scheme

Authors: Sheng Lei, Alexander Flueck

Abstract: This paper proposes several explicit and implicit multistep frequency response optimized integrators considering first or second order derivative. Utilizing the proposed numerical integrators and some others available in the literature, a prediction-based method is further put forward to accelerate a novel power system transient simulation scheme without impacting its accuracy. Case studies verify the effectiveness of the proposed prediction method. Although they are utilized to accelerate the simulation scheme in this paper, the proposed numerical integrators are general-purpose and can be applied to other areas.

Submitted 15 February, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: Accepted by the 2021 IEEE PES General Meeting

arXiv:2008.13059 [pdf]

Initialization Process of a Power System Transient Simulation Scheme for Stability Studies

Authors: Sheng Lei, Alexander Flueck

Abstract: The initialization process of a novel power system transient simulation scheme for stability studies is put forward by further developing a "time-domain harmonic power-flow algorithm". The initialization process is formulated as an algebraic problem to ensure that, at the beginning of a transient simulation run, the power system under study is in steady state and operated at a specified operating point. The algebraic problem is then solved efficiently by a preconditioned finite difference Newton-GMRES method. Case studies verify the validity and efficiency of the initialization process. The proposed initialization process is general-purpose and can be applied to other power system transient simulation schemes.

Submitted 29 August, 2020; originally announced August 2020.

Comments: Accepted by the 52nd North American Power Symposium
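
A finite difference Newton-GMRES method never forms the Jacobian explicitly. GMRES only needs products J(x)v, and these can be approximated by a forward difference (F(x + εv) - F(x))/ε. The sketch below shows that single ingredient on a hypothetical two-equation residual F and checks it against the analytic product. The residual, the choice of ε and the helper names are assumptions for illustration; the preconditioner and the GMRES iteration themselves are not shown.

```python
import math

def residual(x):
    """Hypothetical nonlinear residual F(x) standing in for the initialization equations."""
    return [x[0] ** 2 + x[1] - 3.0, math.sin(x[1]) - 0.5 * x[0]]

def fd_jacvec(F, x, v, eps=1e-7):
    """Forward-difference approximation of J(x) @ v using one extra evaluation of F."""
    vnorm = math.sqrt(sum(vi * vi for vi in v))
    if vnorm == 0.0:
        return [0.0] * len(x)
    h = eps / vnorm  # perturbation of size eps along the direction of v
    f0 = F(x)
    f1 = F([xi + h * vi for xi, vi in zip(x, v)])
    return [(b - a) / h for a, b in zip(f0, f1)]

x, v = [1.0, 0.5], [0.3, -0.2]
# analytic Jacobian of residual: [[2*x0, 1], [-0.5, cos(x1)]]
exact = [2 * x[0] * v[0] + v[1], -0.5 * v[0] + math.cos(x[1]) * v[1]]
print(fd_jacvec(residual, x, v), exact)
```

Because each product costs only one extra residual evaluation, a Krylov solver built on it avoids assembling and factoring the Jacobian, which is the usual motivation for this matrix-free approach.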

arXiv:2007.01496 [pdf, other]

Few-Shot Semantic Segmentation Augmented with Image-Level Weak Annotations

Authors: Shuo Lei, Xuchao Zhang, Jianfeng He, Fanglan Chen, Chang-Tien Lu

Abstract: Despite the great progress made by deep neural networks in the semantic segmentation task, traditional neural-network-based methods typically suffer from a shortage of large amounts of pixel-level annotations. Recent progress in few-shot semantic segmentation tackles the issue by using only a few pixel-level annotated examples. However, these few-shot approaches cannot easily be applied to multi-way or weak annotation settings. In this paper, we advance the few-shot segmentation paradigm towards a scenario where image-level annotations are available to assist a training process that has only a few pixel-level annotations. Our key idea is to learn a better prototype representation of the class by fusing the knowledge from the image-level labeled data. Specifically, we propose a new framework, called PAIA, to learn the class prototype representation in a metric space by integrating image-level annotations. Furthermore, by considering the uncertainty of pseudo-masks, a distilled soft masked average pooling strategy is designed to handle distractions in image-level annotations. Extensive empirical results on two datasets show the superior performance of PAIA.

Submitted 18 June, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

Comments: Accepted to ICME 2021
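
Masked average pooling, which the PAIA abstract above builds on, computes a class prototype as the mask-weighted mean of per-pixel feature vectors. With a soft mask, each pixel contributes in proportion to its pseudo-mask probability, so uncertain pixels count less. The sketch below shows plain soft masked average pooling followed by a cosine-similarity comparison against the prototype. It does not include PAIA's distillation step, and the function names and toy features are assumptions.

```python
import math

def soft_masked_average_pooling(features, mask, eps=1e-8):
    """Prototype = sum_i m_i * f_i / sum_i m_i, with soft mask values m_i in [0, 1].

    features: per-pixel feature vectors (flattened spatial grid)
    mask: per-pixel foreground probabilities, same length as features
    """
    dim = len(features[0])
    total = sum(mask)
    proto = [0.0] * dim
    for f, m in zip(features, mask):
        for d in range(dim):
            proto[d] += m * f[d]
    return [p / (total + eps) for p in proto]  # eps guards against an all-zero mask

def cosine(a, b):
    """Cosine similarity used to compare a pixel feature with the prototype."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
soft = [0.9, 0.8, 0.1, 0.0]  # pseudo-mask: confident foreground on the first two pixels
proto = soft_masked_average_pooling(feats, soft)
print(proto, [round(cosine(f, proto), 3) for f in feats])
```

With a binary mask this reduces to ordinary masked average pooling. The soft version lets a noisy pseudo-mask pixel (here the third one, at 0.1) pull the prototype only slightly.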

arXiv:2005.00964 [pdf]

Efficient Power System Transient Simulation Based on Frequency Response Optimized Integrators Considering Second Order Derivative

Authors: Sheng Lei, Alexander Flueck

Abstract: Frequency response optimized integrators considering second order derivative are proposed in this paper. Based on the proposed numerical integrators, and on others which also consider second order derivative, this paper puts forward a novel power system transient simulation scheme. Instead of using a single numerical integrator, the proposed simulation scheme chooses appropriate ones according to the dominant frequency component of the differential state variables. With the proposed simulation scheme, computational efficiency is improved by using large step sizes without sacrificing accuracy. Numerical case studies demonstrate the validity and efficiency of the simulation scheme.

Submitted 2 May, 2020; originally announced May 2020.

Comments: Accepted by the 2020 IEEE PES General Meeting

arXiv:2004.13557 [pdf, other]

Baseline Estimation of Commercial Building HVAC Fan Power Using Tensor Completion

Authors: Shunbo Lei, David Hong, Johanna L. Mathieu, Ian A. Hiskens

Abstract: Commercial building heating, ventilation, and air conditioning (HVAC) systems have been studied for providing ancillary services to power grids via demand response (DR). One critical issue is to estimate the counterfactual baseline power consumption that would have prevailed without DR. Baseline methods have been developed based on whole building electric load profiles. New methods are necessary to estimate the baseline power consumption of HVAC sub-components (e.g., supply and return fans), which have different characteristics compared to that of the whole building. Tensor completion can estimate the unobserved entries of multi-dimensional tensors describing complex data sets. It exploits high-dimensional data to capture granular insights into the problem. This paper proposes to use it for baselining HVAC fan power, by utilizing its capability of capturing dominant fan power patterns. The tensor completion method is evaluated using HVAC fan power data from several buildings at the University of Michigan, and compared with several existing methods. The tensor completion method generally outperforms the benchmarks.

Submitted 24 April, 2020; originally announced April 2020.

arXiv:1912.06936 [pdf]

doi 10.1021/acs.jpca.9b11681

Compressed Sensing for Reconstructing Coherent Multidimensional Spectra

Authors: Zhengjun Wang, Shiwen Lei, Khadga Jung Karki, Andreas Jakobsson, Tönu Pullerits

Abstract: We apply two sparse reconstruction techniques, the least absolute shrinkage and selection operator (LASSO) and the sparse exponential mode analysis (SEMA), to two-dimensional (2D) spectroscopy. The algorithms are first tested on model data, showing that both are able to reconstruct the spectra using only a fraction of the data required by the traditional Fourier-based estimator. Through the analysis of sparsely sampled experimental fluorescence-detected 2D spectra of LH2 complexes, we conclude that both SEMA and LASSO can be used to significantly reduce the required data while still allowing the multidimensional spectra to be reconstructed. Of the two techniques, SEMA is shown to offer preferable performance, providing more accurate estimation of the spectral line widths and their positions. Furthermore, SEMA allows for off-grid components, enabling the use of a much smaller dictionary than the LASSO, thereby both improving the performance and lowering the computational complexity of reconstructing coherent multidimensional spectra.

Submitted 14 December, 2019; originally announced December 2019.
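
LASSO, one of the two techniques in the abstract above, estimates a sparse coefficient vector by minimizing ½‖y - Ax‖² + λ‖x‖₁. A common way to solve it is the iterative shrinkage-thresholding algorithm (ISTA), sketched below in plain Python. The abstract does not state which solver the authors used, so ISTA, the step-size rule and the toy data are assumptions. SEMA is not shown.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink each entry toward zero by t."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def lasso_ista(A, y, lam, iters=500):
    """Minimize 0.5*||y - A x||^2 + lam*||x||_1 with ISTA.

    A: list of rows (m x n dictionary), y: length-m measurement list.
    """
    m, n = len(A), len(A[0])
    # 1/||A||_F^2 is a safe step: ||A||_F^2 bounds the Lipschitz constant ||A||_2^2
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]  # A x - y
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]      # A^T (A x - y)
        x = soft_threshold([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(lasso_ista(identity, [3.0, 0.2, -2.0], lam=0.5))  # approximately [2.5, 0.0, -1.5]
```

With an identity dictionary the LASSO solution is simply the soft-thresholded data, which makes it a convenient correctness check. Small entries are set exactly to zero, which is the sparsity that allows reconstruction from a fraction of the samples.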

arXiv:1911.09987 [pdf, other]

Transmission System Resilience Enhancement with Extended Steady-state Security Region in Consideration of Uncertain Topology Changes

Authors: Chong Wang, Feng Wu, ** Ju, Shunbo Lei, Tianguang Lu, Yunhe Hou

Abstract: The increasing number of extreme weather events poses unprecedented challenges to power system operation because of their uncertain and sequential impacts on power systems. This paper proposes the concept of an extended steady-state security region (ESSR) and, based on ESSR, implements resilience enhancement for transmission systems in consideration of uncertain, varying topology changes caused by extreme weather events. ESSR is a polytope describing a region in which the operating points are within the operating constraints. In consideration of uncertain varying topology changes with ESSR, the resilience enhancement problem is built as a bilevel programming optimization model, in which the system operators deploy the optimal strategy against the most threatening scenario caused by the extreme weather events. To avoid the curse of dimensionality with regard to system topologies for a large-scale system, the Monte Carlo method is used to generate uncertain system topologies, and a recursive McCormick envelope-based approach is proposed to connect the generated system topologies to optimization variables. Karush-Kuhn-Tucker (KKT) conditions are used to transform the suboptimization model in the second level into a group of equivalent constraints in the first level. A simple test system and the IEEE 118-bus system are used to validate the proposed approach.

Submitted 22 November, 2019; originally announced November 2019.
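
A McCormick envelope replaces a bilinear term w = x·y, with x in [xL, xU] and y in [yL, yU], by four linear inequalities. This is what lets products of topology (status) variables and continuous variables enter a linear or mixed-integer model. The sketch below evaluates the four inequalities at given x and y and returns the bounds they imply on w. The recursive construction proposed in the paper is not reproduced, and the helper name and example numbers are hypothetical.

```python
def mccormick_bounds(x, y, xl, xu, yl, yu):
    """Bounds on w = x*y implied by the McCormick envelope over [xl, xu] x [yl, yu]."""
    lower = max(xl * y + x * yl - xl * yl,   # w >= xL*y + x*yL - xL*yL
                xu * y + x * yu - xu * yu)   # w >= xU*y + x*yU - xU*yU
    upper = min(xu * y + x * yl - xu * yl,   # w <= xU*y + x*yL - xU*yL
                xl * y + x * yu - xl * yu)   # w <= xL*y + x*yU - xL*yU
    return lower, upper

# hypothetical example: line status u in {0, 1} times a flow f in [-100, 100]
for u in (0.0, 1.0):
    print(u, mccormick_bounds(u, 40.0, 0.0, 1.0, -100.0, 100.0))
```

When one factor is binary the envelope is exact: it forces w = 0 when u = 0 and w = f when u = 1. For two continuous factors it is only a relaxation, so the true product lies between the returned bounds.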

Showing 1–30 of 30 results for author: Lei, S