-
Solving Parametric PDEs with Radial Basis Functions and Deep Neural Networks
Authors:
Guanhang Lei,
Zhen Lei,
Lei Shi,
Chenyu Zeng
Abstract:
We propose the POD-DNN, a novel algorithm leveraging deep neural networks (DNNs) along with radial basis functions (RBFs) in the context of the proper orthogonal decomposition (POD) reduced basis method (RBM), aimed at approximating the parametric mapping of parametric partial differential equations on irregular domains. The POD-DNN algorithm capitalizes on the low-dimensional characteristics of the solution manifold for parametric equations, alongside the inherent offline-online computational strategy of RBM and DNNs. In numerical experiments, POD-DNN demonstrates significantly accelerated computation speeds during the online phase. Compared to other algorithms that utilize RBF without integrating DNNs, POD-DNN substantially improves the computational speed in the online inference process. Furthermore, under reasonable assumptions, we have rigorously derived upper bounds on the complexity of approximating parametric mappings with POD-DNN, thereby providing a theoretical analysis of the algorithm's empirical performance.
Submitted 12 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Materials science in the era of large language models: a perspective
Authors:
Ge Lei,
Ronan Docherty,
Samuel J. Cooper
Abstract:
Large Language Models (LLMs) have garnered considerable interest due to their impressive natural language capabilities, which in conjunction with various emergent properties make them versatile tools in workflows ranging from complex code generation to heuristic finding for combinatorial problems. In this paper we offer a perspective on their applicability to materials science research, arguing their ability to handle ambiguous requirements across a range of tasks and disciplines mean they could be a powerful tool to aid researchers. We qualitatively examine basic LLM theory, connecting it to relevant properties and techniques in the literature before providing two case studies that demonstrate their use in task automation and knowledge extraction at-scale. At their current stage of development, we argue LLMs should be viewed less as oracles of novel insight, and more as tireless workers that can accelerate and unify exploration across domains. It is our hope that this paper can familiarise material science researchers with the concepts needed to leverage these tools in their own research.
Submitted 11 March, 2024;
originally announced March 2024.
-
Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery
Authors:
Wei Zhang,
Miaoxin Cai,
Tong Zhang,
Guoqiang Lei,
Yin Zhuang,
Xuerui Mao
Abstract:
Ship detection needs to identify ship locations from remote sensing (RS) scenes. Due to different imaging payloads, various appearances of ships, and complicated background interference from the bird's eye view, it is difficult to set up a unified paradigm for achieving multi-source ship detection. To address this challenge, in this article, leveraging the large language models (LLMs)'s powerful generalization ability, a unified visual-language model called Popeye is proposed for multi-source ship detection from RS imagery. Specifically, to bridge the interpretation gap between the multi-source images for ship detection, a novel unified labeling paradigm is designed to integrate different visual modalities and the various ship detection ways, i.e., horizontal bounding box (HBB) and oriented bounding box (OBB). Subsequently, the hybrid experts encoder is designed to refine multi-scale visual features, thereby enhancing visual perception. Then, a visual-language alignment method is developed for Popeye to enhance interactive comprehension ability between visual and language content. Furthermore, an instruction adaption mechanism is proposed for transferring the pre-trained visual-language knowledge from the nature scene into the RS domain for multi-source ship detection. In addition, the segment anything model (SAM) is also seamlessly integrated into the proposed Popeye to achieve pixel-level ship segmentation without additional training costs. Finally, extensive experiments are conducted on the newly constructed ship instruction dataset named MMShip, and the results indicate that the proposed Popeye outperforms current specialist, open-vocabulary, and other visual-language models for zero-shot multi-source ship detection.
Submitted 13 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis
Authors:
Yu Gu,
Yianrao Bian,
Guangzhi Lei,
Chao Weng,
Dan Su
Abstract:
This paper introduces an improved duration informed attention neural network (DurIAN-E) for expressive and high-fidelity text-to-speech (TTS) synthesis. Inherited from the original DurIAN model, an auto-regressive model structure in which the alignments between the input linguistic information and the output acoustic features are inferred from a duration model is adopted. Meanwhile the proposed DurIAN-E utilizes multiple stacked SwishRNN-based Transformer blocks as linguistic encoders. Style-Adaptive Instance Normalization (SAIN) layers are exploited into frame-level encoders to improve the modeling ability of expressiveness. A denoiser incorporating both denoising diffusion probabilistic model (DDPM) for mel-spectrograms and SAIN modules is conducted to further improve the synthetic speech quality and expressiveness. Experimental results prove that the proposed expressive TTS model in this paper can achieve better performance than the state-of-the-art approaches in both subjective mean opinion score (MOS) and preference tests.
Submitted 22 September, 2023;
originally announced September 2023.
-
Solving PDEs on Spheres with Physics-Informed Convolutional Neural Networks
Authors:
Guanhang Lei,
Zhen Lei,
Lei Shi,
Chenyu Zeng,
Ding-Xuan Zhou
Abstract:
Physics-informed neural networks (PINNs) have been demonstrated to be efficient in solving partial differential equations (PDEs) from a variety of experimental perspectives. Some recent studies have also proposed PINN algorithms for PDEs on surfaces, including spheres. However, theoretical understanding of the numerical performance of PINNs, especially PINNs on surfaces or manifolds, is still lacking. In this paper, we establish rigorous analysis of the physics-informed convolutional neural network (PICNN) for solving PDEs on the sphere. By using and improving the latest approximation results of deep convolutional neural networks and spherical harmonic analysis, we prove an upper bound for the approximation error with respect to the Sobolev norm. Subsequently, we integrate this with innovative localization complexity analysis to establish fast convergence rates for PICNN. Our theoretical results are also confirmed and supplemented by our experiments. In light of these findings, we explore potential strategies for circumventing the curse of dimensionality that arises when solving high-dimensional PDEs.
Submitted 18 August, 2023;
originally announced August 2023.
-
Pairwise Ranking with Gaussian Kernels
Authors:
Guanhang Lei,
Lei Shi
Abstract:
Regularized pairwise ranking with Gaussian kernels is one of the cutting-edge learning algorithms. Despite a wide range of applications, a rigorous theoretical demonstration still lacks to support the performance of such ranking estimators. This work aims to fill this gap by developing novel oracle inequalities for regularized pairwise ranking. With the help of these oracle inequalities, we derive fast learning rates of Gaussian ranking estimators under a general box-counting dimension assumption on the input domain combined with the noise conditions or the standard smoothness condition. Our theoretical analysis improves the existing estimates and shows that a low intrinsic dimension of input space can help the rates circumvent the curse of dimensionality.
Submitted 6 April, 2023;
originally announced April 2023.
-
Adjacent-Level Feature Cross-Fusion With 3-D CNN for Remote Sensing Image Change Detection
Authors:
Yuanxin Ye,
Mengmeng Wang,
Liang Zhou,
Guangyang Lei,
Jianwei Fan,
Yao Qin
Abstract:
Deep learning-based change detection (CD) using remote sensing images has received increasing attention in recent years. However, how to effectively extract and fuse the deep features of bi-temporal images for improving the accuracy of CD is still a challenge. To address that, a novel adjacent-level feature fusion network with 3D convolution (named AFCF3D-Net) is proposed in this article. First, through the inner fusion property of 3D convolution, we design a new feature fusion way that can simultaneously extract and fuse the feature information from bi-temporal images. Then, to alleviate the semantic gap between low-level features and high-level features, we propose an adjacent-level feature cross-fusion (AFCF) module to aggregate complementary feature information between the adjacent levels. Furthermore, the full-scale skip connection strategy is introduced to improve the capability of pixel-wise prediction and the compactness of changed objects in the results. Finally, the proposed AFCF3D-Net has been validated on the three challenging remote sensing CD datasets: the Wuhan building dataset (WHU-CD), the LEVIR building dataset (LEVIR-CD), and the Sun Yat-Sen University dataset (SYSU-CD). The results of quantitative analysis and qualitative comparison demonstrate that the proposed AFCF3D-Net achieves better performance compared to other state-of-the-art methods. The code for this work is available at https://github.com/wm-Githuber/AFCF3D-Net.
Submitted 17 January, 2024; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Construction of Two Statistical Anomaly Features for Small-Sample APT Attack Traffic Classification
Authors:
Ru Zhang,
Wenxin Sun,
Jianyi Liu,
**gwen Li,
Guan Lei,
Han Guo
Abstract:
Advanced Persistent Threat (APT) attack, also known as directed threat attack, refers to the continuous and effective attack activities carried out by an organization on a specific object. They are covert, persistent and targeted, which are difficult to capture by traditional intrusion detection system (IDS). The traffic generated by the APT organization, which is the organization that launches the APT attack, has a high similarity, especially in the Command and Control (C2) stage. The addition of features for APT organizations can effectively improve the accuracy of traffic detection for APT attacks. This paper analyzes the DNS and TCP traffic of the APT attack, and constructs two new features, C2Load_fluct (response packet load fluctuation) and Bad_rate (bad packet rate). The analysis showed APT attacks have obvious statistical laws in these two features. This article combines two new features with common features to classify APT attack traffic. Aiming at the problem of data loss and boundary samples, we improve the Adaptive Synthetic (ADASYN) Sampling Approach and propose the PADASYN algorithm to achieve data balance. A traffic classification scheme is designed based on the AdaBoost algorithm. Experiments show that the classification accuracy of APT attack traffic is improved after adding new features to the two datasets so that 10 DNS features, 11 TCP and HTTP/HTTPS features are used to construct a Features set. On the two datasets, F1-score can reach above 0.98 and 0.94 respectively, which proves that the two new features in this paper are effective for APT traffic detection.
Submitted 26 October, 2020;
originally announced October 2020.
-
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams
Authors:
Huirong Huang,
Zhiyong Wu,
Shiyin Kang,
Dongyang Dai,
Jia Jia,
Tianxiao Fu,
Deyi Tuo,
Guangzhi Lei,
Peng Liu,
Dan Su,
Dong Yu,
Helen Meng
Abstract:
Generating 3D speech-driven talking head has received more and more attention in recent years. Recent approaches mainly have following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input. In this work, we propose a novel approach using phonetic posteriorgrams (PPG). In this way, our method doesn't need hand-crafted features and is more robust to noise compared to recent approaches. Furthermore, our method can support multilingual speech as input by building a universal phoneme space. As far as we know, our model is the first to support multilingual/mixlingual speech as input with convincing results. Objective and subjective experiments have shown that our model can generate high quality animations given speech from unseen languages or speakers and be robust to noise.
Submitted 20 June, 2020;
originally announced June 2020.
-
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Authors:
Chengzhu Yu,
Heng Lu,
Na Hu,
Meng Yu,
Chao Weng,
Kun Xu,
Peng Liu,
Deyi Tuo,
Shiyin Kang,
Guangzhi Lei,
Dan Su,
Dong Yu
Abstract:
In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously. The key component of this system is the Duration Informed Attention Network (DurIAN), an autoregressive model in which the alignments between the input text and the output acoustic features are inferred from a duration model. This is different from the end-to-end attention mechanism used, and accounts for various unavoidable artifacts, in existing end-to-end speech synthesis systems such as Tacotron. Furthermore, DurIAN can be used to generate high quality facial expression which can be synchronized with generated speech with/without parallel speech and face data. To improve the efficiency of speech generation, we also propose a multi-band parallel generation strategy on top of the WaveRNN model. The proposed Multi-band WaveRNN effectively reduces the total computational complexity from 9.8 to 5.5 GFLOPS, and is able to generate audio that is 6 times faster than real time on a single CPU core. We show that DurIAN could generate highly natural speech that is on par with current state of the art end-to-end systems, while at the same time avoid word skipping/repeating errors in those systems. Finally, a simple yet effective approach for fine-grained control of expressiveness of speech and facial expression is introduced.
Submitted 5 September, 2019; v1 submitted 4 September, 2019;
originally announced September 2019.