Search | arXiv e-print repository

DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser

Authors: Peng Chen, Xiaobao Wei, Ming Lu, Yitong Zhu, Naiming Yao, Xingyu Xiao, Hui Chen

Abstract: Speech-driven 3D facial animation has been an attractive task in both academia and industry. Traditional methods mostly focus on learning a deterministic mapping from speech to animation. Recent approaches have started to consider the non-deterministic nature of speech-driven 3D face animation and employ diffusion models for the task. However, personalizing facial animation and accelerating animation generation are still two major limitations of existing diffusion-based methods. To address these limitations, we propose DiffusionTalker, a diffusion-based method that utilizes contrastive learning to personalize 3D facial animation and knowledge distillation to accelerate 3D animation generation. Specifically, to enable personalization, we introduce a learnable talking identity to aggregate knowledge in audio sequences. The proposed identity embeddings extract customized facial cues across different people in a contrastive learning manner. During inference, users can obtain personalized facial animation based on input audio, reflecting a specific talking style. For acceleration, we distill a trained diffusion model with hundreds of steps into a lightweight model with 8 steps. Extensive experiments are conducted to demonstrate that our method outperforms state-of-the-art methods. The code will be released.

Submitted 2 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.00567 [pdf]

A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images

Authors: Ni Yao, Hang Hu, Kaicong Chen, Chen Zhao, Yuan Guo, Boya Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Weihua Zhou, Li Tian

Abstract: Objectives: To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation to assist radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images. Methods: Data from 668 consecutive patients with pathologically proven RCC were retrospectively collected from Center 1. Using five-fold cross-validation, a deep learning model incorporating uncertainty estimation was developed to classify RCC subtypes into clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC). An external validation set of 78 patients from Center 2 further evaluated the model's performance. Results: In the five-fold cross-validation, the model's area under the receiver operating characteristic curve (AUC) for the classification of ccRCC, pRCC, and chRCC was 0.868 (95% CI: 0.826-0.923), 0.846 (95% CI: 0.812-0.886), and 0.839 (95% CI: 0.802-0.88), respectively. In the external validation set, the AUCs were 0.856 (95% CI: 0.838-0.882), 0.787 (95% CI: 0.757-0.818), and 0.793 (95% CI: 0.758-0.831) for ccRCC, pRCC, and chRCC, respectively. Conclusions: The developed deep learning model demonstrated robust performance in predicting the pathological subtypes of RCC, while the incorporated uncertainty emphasized the importance of understanding model confidence, which is crucial for assisting clinical decision-making for patients with renal tumors.
Clinical relevance statement: Our deep learning approach, integrated with uncertainty estimation, offers clinicians a dual advantage: accurate RCC subtype predictions complemented by diagnostic confidence references, promoting informed decision-making for patients with RCC.

Submitted 12 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: 16 pages, 6 figures

arXiv:2306.17008 [pdf]

MLA-BIN: Model-level Attention and Batch-instance Style Normalization for Domain Generalization of Federated Learning on Medical Image Segmentation

Authors: Fubao Zhu, Yanhui Tian, Chuang Han, Yanting Li, Jiaofen Nan, Ni Yao, Weihua Zhou

Abstract: The privacy protection mechanism of federated learning (FL) offers an effective solution for cross-center medical collaboration and data sharing. In multi-site medical image segmentation, each medical site serves as a client of FL, and its data naturally forms a domain. FL makes it possible to improve model performance on seen domains. However, there is a problem of domain generalization (DG) in actual deployment: the performance of the model trained by FL decreases in unseen domains. Hence, MLA-BIN is proposed to solve the DG of FL in this study. Specifically, the model-level attention module (MLA) and batch-instance style normalization (BIN) block were designed. The MLA represents the unseen domain as a linear combination of seen domain models. The attention mechanism is introduced for the weighting coefficient to obtain the optimal coefficient according to the similarity of inter-domain data features. MLA enables the global model to generalize to unseen domains. In the BIN block, batch normalization (BN) and instance normalization (IN) are combined in the shallow layers of the segmentation network to perform style normalization, mitigating the influence of inter-domain image style differences on DG. Extensive experimental results on two medical image segmentation tasks demonstrate that the proposed MLA-BIN outperforms state-of-the-art methods.

Submitted 29 June, 2023; originally announced June 2023.

Comments: 9 pages, 8 figures, 2 tables

arXiv:2301.12340 [pdf]

Incremental Value and Interpretability of Radiomics Features of Both Lung and Epicardial Adipose Tissue for Detecting the Severity of COVID-19 Infection

Authors: Ni Yao, Yanhui Tian, Daniel Gama das Neves, Chen Zhao, Claudio Tinoco Mesquita, Wolney de Andrade Martins, Alair Augusto Sarmet Moreira Damas dos Santos, Yanting Li, Chuang Han, Fubao Zhu, Neng Dai, Weihua Zhou

Abstract: Epicardial adipose tissue (EAT) is known for its pro-inflammatory properties and association with Coronavirus Disease 2019 (COVID-19) severity. However, current EAT segmentation methods do not consider positional information. Additionally, the detection of COVID-19 severity lacks consideration for EAT radiomics features, which limits interpretability. This study investigates the use of radiomics features from EAT and lungs to detect the severity of COVID-19 infections. A retrospective analysis of 515 patients with COVID-19 (Cohort1: 415, Cohort2: 100) was conducted using a proposed three-stage deep learning approach for EAT extraction. Lung segmentation was achieved using a published method. A hybrid model for detecting the severity of COVID-19 was built in a derivation cohort, and its performance and uncertainty were evaluated in internal (125, Cohort1) and external (100, Cohort2) validation cohorts. For EAT extraction, the Dice similarity coefficients (DSC) of the two centers were 0.972 (±0.011) and 0.968 (±0.005), respectively. For severity detection, the hybrid model with radiomics features of both lungs and EAT showed improvements in AUC, net reclassification improvement (NRI), and integrated discrimination improvement (IDI) compared to the model with only lung radiomics features. The hybrid model exhibited increases of 0.1 (p<0.001), 19.3%, and 18.0%, respectively, in the internal validation cohort and increases of 0.09 (p<0.001), 18.0%, and 18.0%, respectively, in the external validation cohort, while outperforming existing detection methods.
Uncertainty quantification and radiomics features analysis confirmed the interpretability of case prediction after inclusion of EAT features.

Submitted 6 December, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: 20 pages, 7 figures

arXiv:2109.05375 [pdf, other]

From Instantaneous Schedulability to Worst Case Schedulability: A Significant Moment Approach

Authors: Ningshi Yao, Fumin Zhang

Abstract: The method of significant moment analysis has been employed to derive instantaneous schedulability tests for real-time systems. However, instantaneous schedulability can only be checked within a finite time window. On the other hand, worst-case schedulability guarantees schedulability of systems for infinite time. This paper derives the classical worst-case schedulability conditions for preemptive periodic systems starting from instantaneous schedulability, hence unifying the two notions of schedulability. The results provide a rigorous justification for the critical time instants being the worst case for scheduling of preemptive periodic systems. The paper also shows that the critical time instant is not the only worst-case moment.

Submitted 11 September, 2021; originally announced September 2021.

Comments: 24 pages, 5 figures

arXiv:2002.05534 [pdf, other]

Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner

Authors: Yunlu Wang, Menghan Hu, Qingli Li, Xiao-Ping Zhang, Guangtao Zhai, Nan Yao

Abstract: Research significance: The extended version of this paper has been accepted by IEEE Internet of Things Journal (DOI: 10.1109/JIOT.2020.2991456); please cite the journal version. During the epidemic prevention and control period, our study can be helpful in prognosis, diagnosis and screening for patients infected with COVID-19 (the novel coronavirus) based on breathing characteristics. According to the latest clinical research, the respiratory pattern of COVID-19 is different from the respiratory patterns of flu and the common cold. One significant symptom of COVID-19 is tachypnea: people infected with COVID-19 have more rapid respiration. Our study can be utilized to distinguish various respiratory patterns, and our device can be preliminarily put to practical use. Demo videos of this method working in situations with one subject and two subjects can be downloaded online. Research details: Accurate detection of unexpected abnormal respiratory patterns of people in a remote and unobtrusive manner has great significance. In this work, we innovatively capitalize on a depth camera and deep learning to achieve this goal. The challenges in this task are twofold: the amount of real-world data is not enough to train the deep model; and the intra-class variation of different types of respiratory patterns is large and the outer-class variation is small.
In this paper, considering the characteristics of actual respiratory signals, a novel and efficient Respiratory Simulation Model (RSM) is first proposed to fill the gap between the large amount of training data and scarce real-world data. The proposed deep model and the modeling ideas have great potential to be extended to large-scale applications such as public places, sleep scenarios, and office environments.

Submitted 20 December, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: 6 pages, 3 figures

arXiv:1610.01455 [pdf]

Scheduling Feasibility of Energy Management in Micro-grids Based on Significant Moment Analysis

Authors: Zhenwu Shi, Ningshi Yao, Fumin Zhang

Abstract: This paper studies the operation and scheduling of electric loads in a micro-grid, a highly automated and distributed cyber-physical energy system (CPES). We establish rigorous mathematical expressions for electric loads and battery banks in the micro-grid by considering their characteristics and constraints. Based on these mathematical models, we propose a novel real-time scheduling analysis method for priority-based energy management in micro-grids, named Significant Moments Analysis (SMA). SMA pinpoints all the crucial moments when electrical operations are requested within the micro-grid and establishes a dynamic model to describe the scheduling behavior of electric loads. Using SMA, we can check the scheduling feasibility and predict whether the micro-grid can generate enough power to support the execution of electric loads. In the case where the power is insufficient to supply load demands, SMA can provide accurate information about the amount of insufficient power and the time when the insufficiency happens. Simulated results are presented to show the effectiveness of the proposed analysis method.

Submitted 2 October, 2016; originally announced October 2016.

Showing 1–7 of 7 results for author: Yao, N