Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems

Amanda A. Howard
Pacific Northwest National Laboratory
Richland, WA 99354 USA
[email protected]

Bruno Jacob
Pacific Northwest National Laboratory
Richland, WA 99354 USA

Sarah H. Murphy
University of North Carolina, Charlotte
Charlotte, NC USA
Pacific Northwest National Laboratory
Richland, WA 99354 USA

Alexander Heinlein
Delft University of Technology
Delft Institute of Applied Mathematics
2628 CD Delft, The Netherlands
[email protected]

Panos Stinis
Pacific Northwest National Laboratory
Richland, WA 99354 USA
University of Washington, Applied Mathematics
Seattle, WA USA
Brown University, Applied Mathematics
Providence, RI, 02912 USA
[email protected]

Abstract

Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to be trained in parallel to give accurate solutions for multiscale problems. We show that finite basis KANs (FBKANs) can provide accurate results with noisy data and for physics-informed training.

Keywords: Kolmogorov-Arnold networks · Physics-informed neural networks · Domain decomposition

Figure 1: Graphical abstract.

1 Introduction

Scientific machine learning, predominantly developed using the theory of multilayer perceptrons (MLPs), has recently garnered a great deal of attention [1, 2, 3]. Kolmogorov-Arnold networks (KANs) [4] use the Kolmogorov-Arnold theorem as inspiration for an alternative to MLPs. KANs offer advantages over MLPs in some cases, such as for continual learning and noisy data. While MLPs use trainable weights and biases on edges with fixed activation functions, KANs use trainable activation functions represented by splines. This switch may be beneficial because splines are easy to adjust locally and can switch between different resolutions. KANs have also been shown to provide increased interpretability over an MLP. Consequently, it is essential to provide a well-rounded analysis of the performance and applicability of KANs. An active area of research involves exploring how to utilize a KAN effectively and discerning in which cases a KAN might be preferred over an MLP. In this paper, we develop an architecture for KANs based on overlapping domain decompositions, which allows for more accurate solutions for complex problems.

Since the publication of [4], numerous variations of KANs have been released, including fractional KANs [5], deep operator KANs [6], physics-informed KANs (PI-KANs) [7], KAN-informed neural networks (KINNs) [8], temporal KANs [9], graph KANs [10, 11, 12], Chebyshev KANs (cKANs) [13], convolutional KANs [14], and ReLU-KANs [15]. KANs have been applied to satellite image classification [16], abnormality detection [17], and computer vision [18]. While these results are promising, there remains much to learn about how KANs can best be utilized and optimized. Many of the recent variations employ alternative functions to the B-spline parametrization, while others extend KANs through well-developed methods from the field of machine learning, such as physics-informed neural networks (PINNs) [19], deep operator networks (DeepONets) [20], and recurrent neural networks. Because KANs still follow the general network architecture of an MLP, save for the switch between the edges and neurons, it is often straightforward to apply such techniques to KANs.

PINNs [19] have demonstrated success for a wide range of problems, including fluid dynamics [21, 22, 23, 24, 25], energy systems [26, 27, 28, 29, 30], and heat transfer [21]. PINNs allow for efficient and accurate solving of physical problems where robust data is lacking. This is accomplished by incorporating the governing differential equations of a system into the loss function of a neural network, which ensures that the resulting solution will satisfy the physical laws of the system. Although PINNs can successfully train with minimal data in many cases, there are notable cases in which challenges arise. For instance, PINNs can get stuck at the fixed points of a dynamical system while training [31]. PINNs are also difficult to train for multiscale problems, due to so-called spectral bias [32], or when the problem domain is large. Domain decomposition techniques can improve training in such cases [33, 34, 35, 36, 37]; for a general overview of domain decomposition approaches in scientific machine learning, see also [38, 39]. Rather than training the network over the entire domain, there are many approaches for breaking up the domain into subdomains and training networks on each subdomain. A popular choice for domain decomposition applied to PINNs is extended PINNs (XPINNs) introduced in [33], with a parallel version presented in [40], and extended in [34, 41]. XPINNs have different subnetworks for each domain, which are stitched together across boundaries and are trained in parallel. Another class of domain decomposition techniques for PINNs consists of time marching methods, where, in general, a network is initially trained for a small subdomain, and then the subdomain is slowly expanded until it contains the entire domain. Methods in this class include backward-compatible PINNs, which consist of a single network shared by all subdomains [35]. Also similar is the time marching approach in [36]. 
Finally, one particularly successful technique is finite basis PINNs (FBPINNs) [37, 42, 43, 44], which use partition of unity functions to weight the output of neural networks in each domain. In comparison to other techniques, FBPINNs offer simplicity because no boundary conditions are needed between adjacent domains. This work explores whether a similar technique can be used to enhance the training of a KAN for challenging cases.

In addition to domain decomposition, many methods have been developed to improve the training of PINNs, including adaptive weighting schemes [45, 32], residual-based attention [46], adaptive residual point selection [47, 48, 49, 50, 51, 52], causality techniques [53], multifidelity PINNs [54, 55], and hierarchical methods to learn more accurate solutions progressively [56, 57, 58, 59, 60, 61, 62]. A substantial advantage of the methods presented in this work is that these additional techniques could be applied in combination with FBKANs to improve the training further.

In this paper, we first introduce finite basis KANs (FBKANs) in section 2. In sections 3 and 4 we highlight some of the features of FBKANs applied to data-driven problems and physics-informed problems, respectively. We show that FBKANs can increase accuracy over KANs. One important feature of FBKANs is that the finite basis architecture serves as a wrapper around a KAN architecture. While we have chosen to focus on KANs as described in [4], most available extensions of KANs could be considered instead to increase accuracy or robustness further.

2 Methods

2.1 KANs

In [4], the authors proposed approximating a multivariate function $f(\mathbf{x})$ by a model of the form

\[
f(\mathbf{x}) \approx \sum_{i_{n_l-1}=1}^{m_{n_l-1}} \varphi_{n_l-1,i_{n_l},i_{n_l-1}} \left( \sum_{i_{n_l-2}=1}^{m_{n_l-2}} \ldots \left( \sum_{i_2=1}^{m_2} \varphi_{2,i_3,i_2} \left( \sum_{i_1=1}^{m_1} \varphi_{1,i_2,i_1} \left( \sum_{i_0=1}^{m_0} \varphi_{0,i_1,i_0}(x_{i_0}) \right) \right) \right) \ldots \right), \tag{1}
\]

denoted as a Kolmogorov-Arnold network. Here, $n_l$ is the number of layers in the KAN, $\{m_j\}_{j=0}^{n_l}$ are the numbers of nodes per layer, and $\phi_{i,j,k}$ are the univariate activation functions. We denote the right-hand side of eq. 1 as $\mathcal{K}(x)$. The activation functions are polynomials of degree $k$ on a grid with $g$ grid points. They are represented by a weighted combination of a basis function $b(x)$ and a B-spline,

\[
\phi(x) = w_b\, b(x) + w_s\, \text{spline}(x), \tag{2}
\]

where

\[
b(x) = \frac{x}{1 + e^{-x}}
\]

and

\[
\text{spline}(x) = \sum_i c_i B_i(x).
\]

Here, $B_i(x)$ is a polynomial of degree $k$, and $c_i$, $w_b$, and $w_s$ are trainable parameters.
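As a minimal sketch of eq. 2, the following NumPy code evaluates a single activation $\phi(x) = w_b\, b(x) + w_s\, \text{spline}(x)$, building the B-spline basis with the Cox-de Boor recursion. The function names, the uniform knot vector extended by $k$ knots on each side of $[a, b]$, and all parameter values are assumptions made for illustration; they are not taken from [4] or from the authors' implementation.

```python
import numpy as np

def bspline_basis(x, knots, k):
    """Evaluate all degree-k B-spline basis functions at the points x.

    Uses the Cox-de Boor recursion and assumes distinct (e.g. uniform) knots.
    Returns an array of shape (len(x), len(knots) - k - 1).
    """
    x = np.asarray(x, dtype=float)[:, None]
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for d in range(1, k + 1):
        left = (x - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * B[:, :-1]
        right = (t[d + 1:] - x) / (t[d + 1:] - t[1:-d]) * B[:, 1:]
        B = left + right
    return B

def silu(x):
    """Basis function b(x) = x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

def kan_activation(x, coeffs, w_b, w_s, knots, k):
    """phi(x) = w_b * b(x) + w_s * sum_i c_i B_i(x), as in eq. 2."""
    x = np.asarray(x, dtype=float)
    return w_b * silu(x) + w_s * (bspline_basis(x, knots, k) @ coeffs)

# Hypothetical setup: g = 5 intervals on [a, b] = [-1, 1], cubic splines (k = 3).
a, b, g, k = -1.0, 1.0, 5, 3
h = (b - a) / g
knots = a + h * np.arange(-k, g + k + 1)  # uniform grid extended by k knots per side
rng = np.random.default_rng(0)
coeffs = rng.normal(size=g + k)           # stand-ins for the trainable c_i
x = np.linspace(a, b, 50)
phi = kan_activation(x, coeffs, w_b=1.0, w_s=1.0, knots=knots, k=k)
```

In this sketch the extended knot vector yields $g + k$ basis functions, so an activation carries $g + k$ spline coefficients plus the two weights $w_b$ and $w_s$.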

KANs evaluate the B-splines on a precomputed grid. In one dimension, for a domain $[a,b]$, a grid with $g_1$ intervals has grid points $\{t_0 = a, t_1, t_2, \ldots, t_{g_1} = b\}$; cf. [4]. Grid extension [4] allows for fitting a new, fine-grained spline to a coarse-grained spline, increasing the expressivity of the KAN. The coarse splines are transferred to the fine splines following the procedure in [4].

In this work, we do not consider the method for enforcing sparsity as outlined in [4] and instead only consider the mean squared error in the loss function. Many variations of KANs have been proposed recently. However, in this work, we consider only the formulation as outlined in [4], although we note that the domain decomposition method presented here would work with many variants of KANs.

2.2 Physics-informed neural networks

A PINN is a neural network (NN) that is trained to approximate the solution of an initial-boundary value problem (IBVP) by optimizing loss terms accounting for initial conditions, boundary conditions, and a residual term using backpropagation to calculate derivatives. In particular, the residual term represents how well the output satisfies the governing differential equation [19]. More specifically, we aim at approximating the solution $f$ of a generic IBVP

\begin{align}
f_t + \mathcal{N}_x[f] &= 0, & x \in \Omega,\ t \in [0,T], \tag{3} \\
f(x,t) &= g(x,t), & x \in \partial\Omega,\ t \in [0,T], \notag \\
f(x,0) &= u(x), & x \in \Omega, \notag
\end{align}

over the open domain $\Omega \subset \mathbb{R}^d$ with boundary $\partial\Omega$. Here, $x$ and $t$ represent the spatial and temporal coordinates, respectively, $\mathcal{N}_x$ is the differential operator with respect to $x$, and $g$ and $u$ are given functions representing the boundary and initial conditions, respectively. The PINN model is then trained to minimize the mean squared error for the initial conditions, boundary conditions, and residual (physics) sampled at $N_{ic}$, $N_{bc}$, $N_r$ data points. We denote the sampling points for the initial conditions, boundary conditions, and residual as $\{x_{ic}^i, u(x_{ic}^i)\}_{i=1}^{N_{ic}}$, $\{(x_{bc}^i, t_{bc}^i), g(x_{bc}^i, t_{bc}^i)\}_{i=1}^{N_{bc}}$, and $\{(x_r^i, t_r^i)\}_{i=1}^{N_r}$, respectively. We then optimize the following weighted loss function with respect to the trainable parameters $\theta$ of the NN model $f_\theta$:

\begin{align}
\mathcal{L}(\theta) &= \lambda_{ic}\mathcal{L}_{ic}(\theta) + \lambda_{bc}\mathcal{L}_{bc}(\theta) + \lambda_r\mathcal{L}_r(\theta), \tag{4} \\
\mathcal{L}_{ic}(\theta) &= \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} \left( f_\theta(x_{ic}^i, 0) - u(x_{ic}^i) \right)^2, \tag{5} \\
\mathcal{L}_{bc}(\theta) &= \frac{1}{N_{bc}} \sum_{i=1}^{N_{bc}} \left( f_\theta(x_{bc}^i, t_{bc}^i) - g(x_{bc}^i, t_{bc}^i) \right)^2, \tag{6} \\
\mathcal{L}_r(\theta) &= \frac{1}{N_r} \sum_{i=1}^{N_r} \left( \frac{\partial}{\partial t} f_\theta(x_r^i, t_r^i) + \mathcal{N}_x[f_\theta(x_r^i, t_r^i)] \right)^2, \tag{7}
\end{align}

where $\lambda_{ic}$, $\lambda_{bc}$, $\lambda_r$ are weights for the different loss terms. The choice of these weights can have a significant impact on the training process and model performance. While adaptive schemes are available for choosing the weights [45, 32, 63, 64], in this work, we set the weights manually.
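To illustrate how the loss terms in eqs. 4 to 7 fit together, the sketch below evaluates them for a one-dimensional heat equation, $f_t - \nu f_{xx} = 0$, so that $\mathcal{N}_x[f] = -\nu f_{xx}$. Several things here are assumptions for illustration only: the choice of PDE, the function name `pinn_loss`, the sample sizes, and above all the use of central finite differences for the derivatives, which stands in for the automatic differentiation a PINN would use. The "model" is any callable `f(x, t)`, not a trained network.

```python
import numpy as np

def pinn_loss(f, x_ic, u_ic, xt_bc, g_bc, xt_r, nu=0.1,
              lam_ic=1.0, lam_bc=1.0, lam_r=1.0, h=1e-3):
    """Weighted loss of eq. 4 for f_t - nu * f_xx = 0 (illustrative PDE)."""
    # Initial-condition term, eq. 5.
    loss_ic = np.mean((f(x_ic, np.zeros_like(x_ic)) - u_ic) ** 2)
    # Boundary-condition term, eq. 6.
    x_b, t_b = xt_bc
    loss_bc = np.mean((f(x_b, t_b) - g_bc) ** 2)
    # Residual term, eq. 7, with finite differences replacing autodiff.
    x, t = xt_r
    f_t = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
    f_xx = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / h**2
    loss_r = np.mean((f_t - nu * f_xx) ** 2)
    return lam_ic * loss_ic + lam_bc * loss_bc + lam_r * loss_r

# Hypothetical test problem on Omega = (0, 1), t in [0, 1], with a known solution.
nu = 0.1
exact = lambda x, t: np.exp(-nu * np.pi**2 * t) * np.sin(np.pi * x)
rng = np.random.default_rng(0)
x_ic = rng.uniform(0.0, 1.0, 64)
u_ic = np.sin(np.pi * x_ic)                      # u(x) = sin(pi x)
xt_bc = (rng.choice([0.0, 1.0], 64), rng.uniform(0.0, 1.0, 64))
g_bc = np.zeros(64)                              # g(x, t) = 0 on the boundary
xt_r = (rng.uniform(0.0, 1.0, 256), rng.uniform(0.0, 1.0, 256))

loss_exact = pinn_loss(exact, x_ic, u_ic, xt_bc, g_bc, xt_r, nu=nu)
loss_zero = pinn_loss(lambda x, t: np.zeros_like(x), x_ic, u_ic, xt_bc, g_bc, xt_r, nu=nu)
```

The exact solution drives all three terms close to zero, whereas the trivial model $f \equiv 0$ satisfies the PDE and boundary condition but is penalized by the initial-condition term, which is why the relative size of the weights $\lambda_{ic}$, $\lambda_{bc}$, $\lambda_r$ matters.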

2.3 FBPINNs

In FBPINNs [37, 42, 43, 44], we decompose the space domain $\Omega$ or the space-time domain $\Omega \times [0,T]$ into overlapping subdomains. Each overlapping subdomain $\Omega_j$ is the interior of the support of a corresponding function $\omega_j$, and all functions $\omega_j$ form a partition of unity. In particular, in the case of $L$ overlapping subdomains, we have

\[
\Omega = \bigcup_{j=1}^{L} \Omega_j, \quad \mathrm{supp}(\omega_j) = \overline{\Omega_j}, \text{ and} \quad \sum_{j=1}^{L} \omega_j \equiv 1 \text{ in } \Omega.
\]

In one dimension, a uniform overlapping domain decomposition with overlap ratio $\delta > 1$ is given by subdomains

\[
\Omega_j = \left( \frac{(j-1)l - \delta l/2}{L-1}, \frac{(j-1)l + \delta l/2}{L-1} \right),
\]

where $l = \max(x) - \min(x)$.
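To make the subdomain formula concrete, consider the following worked example. The values $l = 2$ (for $\Omega = [0, 2]$), $L = 4$, and $\delta = 1.9$ are chosen only for illustration; in particular, $\delta = 1.9$ is an assumption and not a value taken from the text.

```latex
\begin{align*}
\frac{\delta l/2}{L-1} &= \frac{1.9 \cdot 2/2}{3} \approx 0.633, \qquad
\frac{(j-1)l}{L-1} \in \left\{0, \tfrac{2}{3}, \tfrac{4}{3}, 2\right\}, \\
\Omega_1 &\approx (-0.633,\ 0.633), \qquad
\Omega_2 \approx (0.033,\ 1.300), \\
\Omega_3 &\approx (0.700,\ 1.967), \qquad
\Omega_4 \approx (1.367,\ 2.633).
\end{align*}
```

Neighboring subdomains then share an interval of length $2 \cdot 0.633 - 2/3 \approx 0.6$. For $\delta = 1$ the half-width would equal half the spacing between centers and the overlap would vanish, which is why $\delta > 1$ is required.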

There are multiple ways to define the partition of unity functions. Here, we construct them based on the expression

\[
\omega_j = \frac{\hat{\omega}_j}{\sum_{j=1}^{L} \hat{\omega}_j}, \tag{8}
\]

where

\[
\hat{\omega}_j(x) = \begin{cases} 1 & L = 1, \\ \left[ 1 + \cos\left( \pi (x - \mu_j)/\sigma_j \right) \right]^2 & L > 1, \end{cases} \tag{9}
\]

with $\mu_j = l(j-1)/(L-1)$ and $\sigma_j = (\delta l/2)/(L-1)$ representing the center and half-width of each subdomain, respectively. An example of the one-dimensional partition of unity functions with $L = 4$ is depicted in fig. 2.
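Eqs. 8 and 9 can be sketched in a few lines of NumPy. The function name and the sample values are hypothetical, and one point is an assumption that the formula leaves implicit: each $\hat{\omega}_j$ is set to zero outside $\Omega_j$, as the support condition $\mathrm{supp}(\omega_j) = \overline{\Omega_j}$ requires. The sketch also assumes the domain is $[0, l]$, matching the expressions for $\mu_j$ and $\sigma_j$.

```python
import numpy as np

def partition_of_unity_1d(x, l, L, delta):
    """Return omega_j(x) for j = 1..L as an array of shape (len(x), L)."""
    x = np.asarray(x, dtype=float)
    if L == 1:
        return np.ones((x.size, 1))
    j = np.arange(1, L + 1)
    mu = l * (j - 1) / (L - 1)             # subdomain centers
    sigma = (delta * l / 2.0) / (L - 1)    # subdomain half-width
    z = (x[:, None] - mu[None, :]) / sigma
    # Eq. 9 inside the subdomain; zero outside (assumed compact support).
    w_hat = np.where(np.abs(z) < 1.0, (1.0 + np.cos(np.pi * z)) ** 2, 0.0)
    # Eq. 8: normalize so that the windows sum to one at every point.
    return w_hat / w_hat.sum(axis=1, keepdims=True)

# Hypothetical example: Omega = [0, 2], L = 4 subdomains, delta = 1.9.
x = np.linspace(0.0, 2.0, 201)
omega = partition_of_unity_1d(x, l=2.0, L=4, delta=1.9)
```

Because $\delta > 1$, every point of $[0, l]$ lies strictly inside at least one subdomain, so the normalizing sum in eq. 8 is never zero.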

Figure 2: Example of the partition of unity functions on the domain $\Omega = [0, 2]$ with $L = 4$ subdomains.

In multiple dimensions, $\mathbf{x} \in \mathbb{R}^d$, we then obtain

\[
\hat{\omega}_j(x) = \begin{cases} 1 & L = 1, \\ \prod_{i=1}^{d} \left[ 1 + \cos\left( \pi (x_i - \mu_{ij})/\sigma_{ij} \right) \right]^2 & L > 1, \end{cases} \tag{10}
\]

with $\mu_{ij}$ and $\sigma_{ij}$ representing the center and half-width of each subdomain $j$ in each direction $i$. An example of a single partition of unity function with $d = 2$ and $L = 4$ is shown in fig. 3.
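The tensor-product construction in eq. 10 can be sketched as follows. The function name, the array layout (subdomain index first, direction second), and the $2 \times 2$ example on $[-1, 1]^2$ with $\delta = 1.9$ are assumptions for illustration; as in one dimension, each factor is assumed to vanish outside its subdomain.

```python
import numpy as np

def partition_of_unity_nd(x, mu, sigma):
    """Normalized windows for points x of shape (N, d).

    mu and sigma have shape (L, d): center and half-width of subdomain j
    in each direction i. Returns an array of shape (N, L).
    """
    x = np.asarray(x, dtype=float)
    z = (x[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    # One cosine factor per direction, zero outside the subdomain (assumption).
    factors = np.where(np.abs(z) < 1.0, (1.0 + np.cos(np.pi * z)) ** 2, 0.0)
    w_hat = factors.prod(axis=2)           # product over the d directions, eq. 10
    return w_hat / w_hat.sum(axis=1, keepdims=True)

# Hypothetical 2 x 2 decomposition of [-1, 1]^2: two centers per direction at
# -1 and 1, half-width delta * l / 2 / (2 - 1) = 1.9 with l = 2, delta = 1.9.
mu = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
sigma = np.full((4, 2), 1.9)
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(100, 2))
omega = partition_of_unity_nd(pts, mu, sigma)
```

A window is nonzero only where every directional factor is nonzero, so each $\omega_j$ is supported on the rectangle $\Omega_j$.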

Figure 3: (Left) Example domain decomposition on the domain $\Omega = [-1, 1] \times [-1, 1]$ with $L = 4$ subdomains. (Right) One example partition of unity function $\omega_{11}(x, y)$.

Then, the FBPINN architecture reads

\[
f_\theta(x) = \sum_{j=1}^{L} \omega_j(x) f_{j,\theta^j}(x),
\]

where $f_{j,\theta^j}$ is the neural network with parameters $\theta^j$ that corresponds to the subdomain $\Omega_j$; it is localized to $\Omega_j$ by multiplication with the partition of unity function $\omega_j$, which is zero outside of $\Omega_j$. The FBPINN model is trained in the same way as the PINN model, i.e., using initial, boundary, and residual loss terms; cf. section 2.2. Note that, in the original FBPINNs paper [37], hard enforcement of initial and boundary conditions is employed, such that only the residual loss term remains. Moreover, in [43], the approach has been extended to multilevel domain decompositions, and in [44], this domain decomposition approach has been applied to long-range time-dependent problems.
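The weighted sum above acts as a thin wrapper around the subdomain models, which the following sketch makes explicit. The name `fb_forward` is hypothetical, the subdomain models are plain callables standing in for trained networks, and the weights are random normalized placeholders rather than the cosine windows of section 2.3; the only property used is that each row of weights sums to one.

```python
import numpy as np

def fb_forward(x, weights, models):
    """Evaluate sum_j omega_j(x) * f_j(x).

    weights: array of shape (N, L) with omega_j at each point.
    models: list of L callables, one per subdomain.
    """
    outputs = np.stack([m(x) for m in models], axis=1)   # shape (N, L)
    return np.sum(weights * outputs, axis=1)

# Placeholder partition of unity: positive weights normalized per point.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
weights = rng.random((x.size, 3))
weights /= weights.sum(axis=1, keepdims=True)

# If every subdomain model represents the same function, the combination
# reproduces it, because the weights sum to one at every point.
same = fb_forward(x, weights, [np.sin, np.sin, np.sin])

# With different models, the output is a pointwise weighted average.
mixed = fb_forward(np.zeros(1), np.array([[0.25, 0.75]]),
                   [lambda x: np.ones_like(x), lambda x: 3.0 * np.ones_like(x)])
```

Nothing in `fb_forward` depends on what kind of model each callable is, which is the sense in which the finite basis construction wraps an arbitrary subdomain architecture.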

2.4 FBKANs

In FBKANs, we replace the NN used in FBPINNs with a KAN The function approximation in eq. 1 then becomes

$$
\begin{aligned}
f(\mathbf{x}) &\approx \sum_{j=1}^{L}\omega_{j}(x)\left[\sum_{i_{n_{l}-1}=1}^{m_{n_{l}-1}}\varphi^{j}_{n_{l}-1,i_{n_{l}},i_{n_{l}-1}}\left(\sum_{i_{n_{l}-2}=1}^{m_{n_{l}-2}}\ldots\left(\sum_{i_{1}=1}^{m_{1}}\varphi^{j}_{1,i_{2},i_{1}}\left(\sum_{i_{0}=1}^{m_{0}}\varphi^{j}_{0,i_{1},i_{0}}(x_{i_{0}})\right)\right)\ldots\right)\right] \\
&= \sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x;\theta^{j})
\end{aligned}
\tag{11}
$$

where $\mathcal{K}^{j}(x;\theta^{j})$ denotes the $j^{th}$ KAN with trainable parameters $\theta^{j}$ and $\omega_{j}$ denotes the corresponding finite basis partition of unity function introduced in section 2.3. FBKANs are trained to minimize the loss function

$$
\mathcal{L}(\theta)=\lambda_{ic}\mathcal{L}_{ic}(\theta)+\lambda_{bc}\mathcal{L}_{bc}(\theta)+\lambda_{r}\mathcal{L}_{r}(\theta)+\lambda_{data}\mathcal{L}_{data}(\theta), \tag{12}
$$

composed of

$$
\begin{aligned}
\mathcal{L}_{ic}(\theta) &= \frac{1}{N_{ic}}\sum_{i=1}^{N_{ic}}\left(\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{ic}^{i};\theta^{j})-u(x_{ic}^{i})\right)^{2}, \\
\mathcal{L}_{bc}(\theta) &= \frac{1}{N_{bc}}\sum_{i=1}^{N_{bc}}\left(\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{bc}^{i};\theta^{j})-g(x_{bc}^{i},t_{bc}^{i})\right)^{2}, \\
\mathcal{L}_{r}(\theta) &= \frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\left(\frac{\partial}{\partial t}\left[\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{r}^{i};\theta^{j})\right]+\mathcal{N}_{x}\left[\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{r}^{i};\theta^{j})\right]\right)^{2}, \\
\mathcal{L}_{data}(\theta) &= \frac{1}{N_{data}}\sum_{i=1}^{N_{data}}\left(\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{data}^{i};\theta^{j})-f(x_{data}^{i})\right)^{2}.
\end{aligned}
$$

Here, $\theta=\{\theta^{j}\}_{j=1}^{L}$ is the set of trainable parameters, and the loss function $\mathcal{L}$ contains both a term for data-driven training $\mathcal{L}_{data}$ and terms for physics-informed training $\mathcal{L}_{ic}$, $\mathcal{L}_{bc}$, and $\mathcal{L}_{r}$. In this way, FBKANs are adaptable to given problem characteristics.
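As a rough illustration of eqs. 11 and 12, the NumPy sketch below blends several local models with partition of unity weights and assembles a weighted loss. It is only a sketch under stated assumptions: the squared-cosine window in `windows`, the helper names, and the use of `np.sin` as a stand-in for each local KAN are all hypothetical choices made for this example, not the implementation used in this work.

```python
import numpy as np

def windows(x, centers, width):
    """Hypothetical partition of unity: normalized squared-cosine bumps."""
    z = np.clip((x[:, None] - centers[None, :]) / width, -1.0, 1.0)
    raw = (1.0 + np.cos(np.pi * z)) ** 2          # vanishes where |z| >= 1
    return raw / raw.sum(axis=1, keepdims=True)   # each row sums to one

def fbkan_predict(x, local_models, centers, width):
    """Blend local models: sum_j omega_j(x) * K^j(x), as in eq. 11."""
    w = windows(x, centers, width)
    preds = np.stack([m(x) for m in local_models], axis=1)
    return (w * preds).sum(axis=1)

def total_loss(terms, weights):
    """Weighted sum of the individual loss terms, as in eq. 12."""
    return sum(weights[name] * value for name, value in terms.items())

x = np.linspace(0.0, 8.0, 200)
centers = np.linspace(0.0, 8.0, 4)       # L = 4 overlapping subdomains
models = [np.sin] * 4                    # stand-ins for the local KANs
pred = fbkan_predict(x, models, centers, width=4.0)

# Data-driven setting: only the data term carries a nonzero weight.
loss_data = np.mean((pred - np.sin(x)) ** 2)
loss = total_loss({"data": loss_data}, {"data": 1.0})
```

Because the weights sum to one at every point, identical local models are reproduced exactly by the blend, which is a convenient sanity check before training.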

For FBKANs, the grid of each local KAN $\mathcal{K}^{j}$ is set separately on the corresponding subdomain $\Omega^{j}$. To do so, we densely sample points $x\in\Omega$ and compute the partition of unity function $\omega_{j}(x)$. We then take the boundaries of each subdomain based on the partition of unity functions as

$$
\begin{aligned}
a^{j} &= \min_{x \text{ s.t. } \omega_{j}(x)>0.0001} x, \\
b^{j} &= \max_{x \text{ s.t. } \omega_{j}(x)>0.0001} x.
\end{aligned}
$$

This gives a custom grid fit to each subdomain.
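The thresholding step above can be sketched in a few lines of NumPy. The window function `cosine_windows`, the subdomain centers, and the width are hypothetical values chosen for this example; only the threshold of 0.0001 comes from the text.

```python
import numpy as np

def cosine_windows(x, centers, width):
    """Hypothetical partition of unity: normalized squared-cosine bumps."""
    z = np.clip((x[:, None] - centers[None, :]) / width, -1.0, 1.0)
    raw = (1.0 + np.cos(np.pi * z)) ** 2
    return raw / raw.sum(axis=1, keepdims=True)

def subdomain_bounds(x, omega, tol=1e-4):
    """Return (a_j, b_j): smallest and largest sample with omega_j > tol."""
    bounds = []
    for j in range(omega.shape[1]):
        support = x[omega[:, j] > tol]
        bounds.append((support.min(), support.max()))
    return bounds

x = np.linspace(0.0, 8.0, 8001)               # dense sampling of the domain
centers = np.array([1.0, 3.0, 5.0, 7.0])      # assumed subdomain centers
omega = cosine_windows(x, centers, width=2.0)
bounds = subdomain_bounds(x, omega)
```

Each pair in `bounds` would then define the interval on which the spline grid of the corresponding local KAN is initialized.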

We evaluate the models using the relative $\ell_{2}$ error, calculated by

$$
\frac{\left\|f(x)-\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x;\theta^{j})\right\|_{2}}{\left\|f(x)\right\|_{2}}.
$$
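For concreteness, the relative $\ell_{2}$ error can be computed on a set of test points as in the following short sketch. The function name `relative_l2_error` and the sample values are illustrative only.

```python
import numpy as np

def relative_l2_error(f_true, f_pred):
    """Relative l2 error ||f_true - f_pred||_2 / ||f_true||_2."""
    f_true = np.asarray(f_true, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    return np.linalg.norm(f_true - f_pred) / np.linalg.norm(f_true)

f_true = np.array([3.0, 4.0])
f_pred = np.array([3.0, 3.0])
err = relative_l2_error(f_true, f_pred)   # ||(0, 1)|| / ||(3, 4)|| = 1/5
```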

Our implementation is based on JAX [65], using the Jax-KAN package [66] for the KAN implementation. For simplicity, we keep the hyperparameters of each KAN on each subdomain fixed, including the width and depth, reducing the total number of hyperparameters. However, in general, the parameters for each individual KAN $\mathcal{K}^{j}$ could be chosen differently.

3 Data-driven results

In this section, we consider data-driven problems, which are trained without physics, meaning that $\lambda_{ic}=\lambda_{bc}=\lambda_{r}=0$. These results specifically illustrate how FBKANs can improve predictions for noisy data and multiscale oscillations.

3.1 Test 1

In this section, we consider a toy problem

$$
f(x)=\exp[\sin(0.3\pi x^{2})], \tag{13}
$$

on $x\in[0,8]$. This problem is designed to test both the scaling of FBKANs as we increase the number of subdomains, as well as the impact of noisy data on training FBKANs.

3.1.1 Scaling of FBKANs

First, we consider clean data with $N_{data}=1\,200$ points sampled uniformly in $x\in[0,8]$. We present results for training on $L=1,2,4,8,16,$ and $32$ subdomains. Note that the function described in eq. 13 is highly oscillatory, which makes it difficult to capture with a small KAN. As shown in fig. 4, increasing the number of subdomains from $L=2$ to $L=8$ and $L=32$ significantly decreases the pointwise error in the FBKAN predictions. Doubling the number of subdomains doubles the number of trainable parameters of the FBKAN model, increasing its expressivity. However, we observe approximately first order convergence of the relative $\ell_{2}$ error for a large number of subdomains in fig. 4.

Figure 4: Results with $L=2,8,$ and $32$ subdomains for eq. 13. (a) Training data and plot of exact $f(x)$. (b) Loss curves (eq. 12). (c) Plot of the outputs $f(x)$ and $\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x)$. (d) Pointwise errors $f(x)-\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x)$. (e) Scaling results for Test 1 with $L$ subdomains.

3.1.2 FBKANs with noisy data

One strength of KANs over MLPs is their increased accuracy for noisy data [4]. We now test FBKANs on noisy training data using four subdomains ($L=4$). To this end, we sample 600 training points from a uniform distribution in $[0,8]$ and evaluate $f(x)$. Then, we add Gaussian white noise with zero mean and varying magnitude to $f(x)$, up to an average relative magnitude of $20\%$. This becomes the training set. The method is then tested on 1 000 evenly spaced points in $[0,8]$ without noise.
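The data generation just described might look as follows in NumPy. The random seed, the noise standard deviation `sigma`, and the particular definition of "mean relative noise" used here are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def f(x):
    """Target function from eq. 13."""
    return np.exp(np.sin(0.3 * np.pi * x**2))

rng = np.random.default_rng(0)               # arbitrary seed
x_train = rng.uniform(0.0, 8.0, size=600)    # 600 uniformly sampled points
sigma = 0.1                                  # hypothetical noise level
noise = rng.normal(0.0, sigma, size=x_train.shape)
y_train = f(x_train) + noise                 # noisy training targets

x_test = np.linspace(0.0, 8.0, 1000)         # 1 000 evenly spaced test points
y_test = f(x_test)                           # noise-free test targets

# Assumed definition of the mean relative noise: mean |noise| / mean |f|.
rel_noise = np.mean(np.abs(noise)) / np.mean(np.abs(f(x_train)))
```

Varying `sigma` would then produce training sets with different mean relative noise levels.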

The training set and results are shown in fig. 5. In the noisiest case, with relative noise of 18.1% added to the training set, the KAN yields a relative $\ell_{2}$ error of 0.1404, whereas the FBKAN yields a relative error of 0.0646. For all noise levels tested, the FBKAN consistently has a lower relative error than the plain KAN in fig. 5, and the predictions are robust to noisy data.

Figure 5: Results for Test 1 with noisy training data; cf. eq. 13. (a) Example training data and plot of exact $f(x)$ with $9.6\%$ mean relative noise. (b) Loss curves (eq. 12) for an example training with $9.6\%$ mean relative noise. (c) Plot of the outputs $f(x)$ and $\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x)$ with $9.6\%$ mean relative noise. (d) Pointwise errors $f(x)-\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x)$ with $9.6\%$ mean relative noise. (e) Relative $\ell_{2}$ error of the KANs and FBKANs with respect to the magnitude of the noise added to the training data.

3.2 Test 2

We now consider the following equation

$$
f(x,y)=\sin(6\pi x^{2})\sin(8\pi y^{2}), \tag{14}
$$

for $(x,y)\in[0,1]\times[0,1]$. This data-driven example exhibits fine-scale oscillations. We test two cases: (1) a KAN and FBKAN with a fixed grid, denoted by KAN-1/FBKAN-1 and (2) a KAN and FBKAN with grid extension, denoted by KAN-2/FBKAN-2. The FBKAN has $L=4$ subdomains. The training set is composed of 10 000 points randomly sampled from a uniform distribution.

In both cases, we begin training with $g=5$ grid points. In the grid extension case, the grid increases every 600 iterations as listed in table 4, and the learning rate drops by $20\%$ each time the grid is updated. As can be observed in fig. 6, the final training loss for FBKAN-1 with a fixed grid is approximately the same as the training loss for KAN-2 with the grid extension approach, even though each FBKAN network has six times fewer grid points. This is also reflected in the relative errors reported in table 1. Comparing the KAN-1 and FBKAN-1 models in fig. 7, KAN-1 struggles to capture the fine-scale features of the data accurately and has a larger relative error. As can be seen in fig. 8, KAN-2 and FBKAN-2 both outperform their counterparts with a static grid, but FBKAN-2 is better able to capture the data than KAN-2.
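The learning rate rule used with grid extension can be written as a simple step schedule. In this sketch the initial learning rate `lr0` is a hypothetical value, and it is assumed that grid updates take effect at iterations 600, 1200, and so on; the grid sizes themselves are listed in table 4 and are not reproduced here.

```python
def learning_rate(iteration, lr0=1e-3, update_every=600, drop=0.2):
    """Learning rate after all grid updates up to and including `iteration`.

    Each grid update multiplies the learning rate by (1 - drop).
    """
    n_updates = iteration // update_every
    return lr0 * (1.0 - drop) ** n_updates

rates = [learning_rate(it) for it in (0, 599, 600, 1200, 1799)]
```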

Figure 6: Training loss curves for Test 2.

| Name | Grid type | Relative error |
| --- | --- | --- |
| KAN-1 | Fixed grid | $2.36\times 10^{-1}$ |
| FBKAN-1 | Fixed grid | $7.43\times 10^{-2}$ |
| KAN-2 | Grid extension | $8.10\times 10^{-2}$ |
| FBKAN-2 | Grid extension | $2.27\times 10^{-2}$ |

Table 1: Relative $\ell_{2}$ errors for data-driven Test 2.

Figure 7: Reference solution (left), predicted solutions (middle) and relative errors (right) for KAN-1 and FBKAN-1 for Test 2. Note that the error plots have different color scales.
Figure 8: Reference solution (left), predicted solutions (middle) and relative errors (right) for KAN-2 and FBKAN-2 for Test 2. Note that the error plots have different color scales.

4 Physics-informed results

4.1 Test 3

As the first physics-informed example, we consider the following one-dimensional problem with multiscale features:

$$
\begin{aligned}
\frac{df}{dx} &= 4\cos(4x)+40\cos(40x), \\
f(0) &= 0,
\end{aligned}
$$

on $x\in[-4,4]$ with exact solution

$$
f(x)=\sin(4x)+\sin(40x). \tag{15}
$$

We sample 400 collocation points from a uniform distribution ($N_{r}=400$) at every training iteration. The grid starts with $g=5$ and increases by $5$ every 1 000 iterations. The KAN reaches a relative $\ell_{2}$ error of 0.2407 with $g_{final}=20$, whereas the FBKAN for four subdomains ($L=4$) reaches a relative $\ell_{2}$ error of 0.0898 for $g_{final}=20$. For eight subdomains ($L=8$), the FBKAN reaches a relative error of 0.0369 with $g_{final}=15$ and 0.0662 with $g_{final}=10$.
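To make the setup concrete, the sketch below evaluates the residual loss of the ODE on freshly sampled collocation points and encodes the grid extension schedule. It is an illustration only: the residual is evaluated on the exact solution with central finite differences as a stand-in for the automatic differentiation a physics-informed model would use, and the helper names and seed are hypothetical.

```python
import numpy as np

def f_exact(x):
    """Exact solution, eq. 15."""
    return np.sin(4.0 * x) + np.sin(40.0 * x)

def rhs(x):
    """Right-hand side of the ODE df/dx = 4 cos(4x) + 40 cos(40x)."""
    return 4.0 * np.cos(4.0 * x) + 40.0 * np.cos(40.0 * x)

def residual_loss(f, x, h=1e-5):
    """Mean squared residual of df/dx - rhs, via central differences."""
    dfdx = (f(x + h) - f(x - h)) / (2.0 * h)
    return np.mean((dfdx - rhs(x)) ** 2)

def grid_size(iteration, g0=5, step=5, every=1000):
    """Grid starts at g0 and grows by `step` every `every` iterations."""
    return g0 + step * (iteration // every)

rng = np.random.default_rng(0)               # arbitrary seed
x_r = rng.uniform(-4.0, 4.0, size=400)       # N_r = 400 collocation points
loss_r = residual_loss(f_exact, x_r)         # close to zero for the exact solution
```

Under this schedule a final grid of 20 is reached after 3 000 iterations, 15 after 2 000, and 10 after 1 000.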

Figure 9: Results for Test 3 with $L=8$. The FBKAN clearly outperforms the KAN, reaching a lower loss with fewer grid points.

4.2 Test 4

We now test the Helmholtz equation

$$
\begin{aligned}
\frac{\partial^{2}f}{\partial y^{2}}+\frac{\partial^{2}f}{\partial x^{2}}+k_{h}^{2}f-q(x,y) &= 0,\ (x,y)\in[-1,1]\times[-1,1], \\
f(-1,y)=f(1,y) &= 0,\ y\in[-1,1], \\
f(x,-1)=f(x,1) &= 0,\ x\in[-1,1],
\end{aligned}
$$

with

$$
q(x,y)=-(a_{1}\pi)^{2}\sin(a_{1}\pi x)\sin(a_{2}\pi y)-(a_{2}\pi)^{2}\sin(a_{1}\pi x)\sin(a_{2}\pi y)+k_{h}\sin(a_{1}\pi x)\sin(a_{2}\pi y).
$$

In our tests, we vary $a_{1}$ and $a_{2}$.

For each choice of $(a_{1},a_{2})$ we consider three training cases. In the first case, we use higher-order splines with $k=5$ and a fixed grid with $g=5$ (denoted by KAN-1/FBKAN-1). In the second case, we take $k=3$ and use grid extension to increase $g$ by $5$ every $600$ iterations, starting with $g=5$ (denoted by KAN-2/FBKAN-2). We use one hidden layer with width 10 for both cases. Additionally, we test a third case, denoted by KAN-3/FBKAN-3, where we consider the same hyperparameters as KAN-1/FBKAN-1 but with width 5 in the hidden layer. The relative $\ell_{2}$ errors are reported in table 2. In all cases, the FBKAN outperforms the corresponding KAN. The grid extension KAN cases, where we increase the grid size over the course of training, come at the expense of increasing the computational time by a factor of 1.4-1.5 in our tests. For $a_{1}=1,a_{2}=4$, the smaller FBKAN-3 network is sufficient to represent the solution accurately. However, for larger values of $a_{1}$ and $a_{2}$, the larger network in FBKAN-1 is necessary. The results for KAN-1/FBKAN-1 are shown in figs. 10 and 11 for $a_{1}=1,a_{2}=4$ and $a_{1}=4,a_{2}=4$, respectively. In fig. 12, we consider $a_{1}=a_{2}=6$ with $L=4,9$ and $16$, demonstrating the further refinement possible with additional subdomains.

| Model | $a_{1}=1$, $a_{2}=4$ | $a_{1}=4$, $a_{2}=4$ | $a_{1}=6$, $a_{2}=6$ |
| --- | --- | --- | --- |
| KAN-1 | 0.0259 | 0.5465 | 1.1254 |
| FBKAN-1, $L=4$ | 0.0102 | 0.0267 | 0.1151 |
| FBKAN-1, $L=9$ | 0.0213 | 0.0239 | 0.0399 |
| FBKAN-1, $L=16$ | 0.0037 | 0.0128 | 0.0321 |
| KAN-2 | 0.0180 | 0.2045 | 0.5854 |
| FBKAN-2 | 0.0112 | 0.0427 | 0.2272 |
| KAN-3 | 0.3771 | 0.5488 | 1.2825 |
| FBKAN-3 | 0.0214 | 0.2760 | 0.9797 |

Table 2: Relative $\ell_{2}$ errors for physics-informed Test 4.

Figure 10: Results for Test 4 with KAN-1 and FBKAN-1, and $a_{1}=1,a_{2}=4$, $L=4$. The solution along the line $x=0$ is given in the bottom left subfigure. Note that the error plots have different color scales.
Figure 11: Results for Test 4 with KAN-1 and FBKAN-1, and $a_{1}=4,a_{2}=4$, $L=4$. The solution along the line $x=0$ is given in the bottom left subfigure. Note that the error plots have different color scales.
Figure 12: Results for Test 4 with KAN-1 and FBKAN-1 and $a_{1}=a_{2}=6$, with $L=4$ and $L=16$. The solution along the line $x=0$ is given in the middle left subfigure. Note that the error plots have different color scales.

4.3 Test 5

Finally, we consider the wave equation

$$
\begin{aligned}
\frac{\partial^{2}f}{\partial t^{2}}-c^{2}\frac{\partial^{2}f}{\partial x^{2}} &= 0,\ (x,t)\in[0,1]\times[0,1], \\
f(0,t) &= 0,\ t\in[0,1], \\
f(1,t) &= 0,\ t\in[0,1], \\
f(x,0) &= \sin(\pi x)+0.5\sin(4\pi x),\ x\in[0,1], \\
f_{t}(x,0) &= 0,\ x\in[0,1],
\end{aligned}
$$

which has the exact solution

$$
f(x,t)=\sin(\pi x)\cos(c\pi t)+0.5\sin(4\pi x)\cos(4c\pi t).
$$
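As a quick numerical sanity check, the stated exact solution can be substituted into the wave equation using second-order central differences. This is only a verification sketch: the step size `h`, the evaluation grid, and the function names are arbitrary choices made here, and the finite differences merely stand in for exact derivatives.

```python
import numpy as np

def f_exact(x, t, c):
    """Exact solution of the wave equation stated above."""
    return (np.sin(np.pi * x) * np.cos(c * np.pi * t)
            + 0.5 * np.sin(4 * np.pi * x) * np.cos(4 * c * np.pi * t))

def wave_residual(x, t, c, h=1e-4):
    """Residual f_tt - c^2 f_xx approximated with central differences."""
    f0 = f_exact(x, t, c)
    f_tt = (f_exact(x, t + h, c) - 2 * f0 + f_exact(x, t - h, c)) / h**2
    f_xx = (f_exact(x + h, t, c) - 2 * f0 + f_exact(x - h, t, c)) / h**2
    return f_tt - c**2 * f_xx

x, t = np.meshgrid(np.linspace(0.1, 0.9, 9), np.linspace(0.1, 0.9, 9))
res = wave_residual(x, t, c=2.0)             # small compared with |f_tt|

# Initial and boundary conditions evaluated on the exact solution.
xs = np.linspace(0.0, 1.0, 11)
ic = f_exact(xs, 0.0, 2.0)
bc_left = f_exact(0.0, xs, 2.0)
bc_right = f_exact(1.0, xs, 2.0)
```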

We first consider $c=\sqrt{2}$. The KAN has a relative $\ell_{2}$ error of 0.1402, and the FBKAN with $L=4$ has a relative $\ell_{2}$ error of 0.0153, as illustrated in fig. 13. We then consider the harder case with $c=2$, shown in fig. 14. The KAN has a relative $\ell_{2}$ error of 0.1778 and the FBKAN with $L=4$ has a relative $\ell_{2}$ error of 0.0587.
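For reference, the relative $\ell_{2}$ error is assumed here to take its usual form, $\|f_{pred}-f_{exact}\|_{2}/\|f_{exact}\|_{2}$ over a set of evaluation points. The sketch below uses a hypothetical helper, `relative_l2_error`, and then computes the improvement factors implied by the errors reported above.

```python
import math


def relative_l2_error(predicted, reference):
    """Return ||predicted - reference||_2 / ||reference||_2 for two sequences."""
    numerator = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))
    denominator = math.sqrt(sum(r ** 2 for r in reference))
    return numerator / denominator


# Errors reported in the text for Test 5 (KAN, FBKAN with L = 4).
reported = {"c=sqrt(2)": (0.1402, 0.0153), "c=2": (0.1778, 0.0587)}

# Factor by which the FBKAN error is smaller than the KAN error.
improvement = {case: kan / fbkan for case, (kan, fbkan) in reported.items()}
```

With these numbers the FBKAN error is roughly nine times smaller than the KAN error for $c=\sqrt{2}$ and roughly three times smaller for $c=2$.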

Figure 13: Results for Test 5 with $c=\sqrt{2}$ and $L=4$. Note that the error plots have different color scales.

Figure 14: Results for Test 5 with $c=2$ and $L=4$. Note that the error plots have different color scales.

5 Conclusions

We have developed domain decomposition-based KAN models for data-driven and physics-informed training; by analogy with FBPINNs, we denote them FBKANs. FBKANs are scalable to complex problems and have a strong advantage over other domain decomposition-based approaches in that they do not require enforcement of transmission conditions between the subdomains via the loss function. They allow accurate training using an ensemble of small KANs combined using partition of unity functions, instead of a single large network.

One advantage of FBKANs is that they can be combined with existing techniques to improve the training of KANs and PI-KANs, including residual-based attention weights as introduced in [7], cKANs [13], deep operator KANs [6], and others. These methods would be directly compatible with the FBKAN framework. In future work, we will further examine the scalability of FBKANs and consider their application to higher dimensional problems. We will also consider multilevel FBKANs, following the multilevel FBPINNs [43] approach. Multilevel FBPINNs show improvement over FBPINNs for a large number of subdomains by providing a mechanism for global communication between subdomains. Multilevel FBKANs could offer a similar advantage, allowing robust training with an increasing number of subdomains.

6 Code and data availability

All code, trained models, and data required to replicate the examples presented in this paper will be released upon publication. Meanwhile, we have released code and Google Colab tutorials for FBKANs in Neuromancer [67] at https://github.com/pnnl/neuromancer/tree/feature/fbkans/examples/KANs, for the reader to explore the ideas implemented in this work.

7 Acknowledgements

The KAN diagrams in fig. 1 were developed with pykan [4]. This project was completed with support from the U.S. Department of Energy, Advanced Scientific Computing Research program, under the Scalable, Efficient and Accelerated Causal Reasoning Operators, Graphs and Spikes for Earth and Embedded Systems (SEA-CROGS) project (Project No. 80278) and under the Uncertainty Quantification for Multifidelity Operator Learning (MOLUcQ) project (Project No. 81739). The computational work was performed using PNNL Institutional Computing at Pacific Northwest National Laboratory. Pacific Northwest National Laboratory (PNNL) is a multi-program national laboratory operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under Contract No. DE-AC05-76RL01830.

References

  • [1] George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
  • [2] Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm, Manish Parashar, Abani Patra, James Sethian, Stefan Wild, et al. Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence. Technical report, USDOE Office of Science (SC), Washington, DC (United States), 2019.
  • [3] Jonathan Carter, John Feddema, Doug Kothe, Rob Neely, Jason Pruet, Rick Stevens, Prasanna Balaprakash, Pete Beckman, Ian Foster, Kamil Iskra, et al. Advanced research directions on ai for science, energy, and security: Report on summer 2022 workshops. 2023.
  • [4] Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y Hou, and Max Tegmark. Kan: Kolmogorov-arnold networks. arXiv preprint arXiv:2404.19756, 2024.
  • [5] Alireza Afzal Aghaei. fkan: Fractional kolmogorov-arnold networks with trainable jacobi basis functions. arXiv preprint arXiv:2406.07456, 2024.
  • [6] Diab W Abueidda, Panos Pantidis, and Mostafa E Mobasher. Deepokan: Deep operator network based on kolmogorov arnold networks for mechanics problems. arXiv preprint arXiv:2405.19143, 2024.
  • [7] Khemraj Shukla, Juan Diego Toscano, Zhicheng Wang, Zongren Zou, and George Em Karniadakis. A comprehensive and fair comparison between mlp and kan representations for differential equations and operator networks. arXiv preprint arXiv:2406.02917, 2024.
  • [8] Yizheng Wang, Jia Sun, Jinshuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, and Yinghua Liu. Kolmogorov arnold informed neural network: A physics-informed deep learning framework for solving pdes based on kolmogorov arnold networks. arXiv preprint arXiv:2406.11045, 2024.
  • [9] Remi Genet and Hugo Inzirillo. Tkan: Temporal kolmogorov-arnold networks. arXiv preprint arXiv:2405.07344, 2024.
  • [10] Mehrdad Kiamari, Mohammad Kiamari, and Bhaskar Krishnamachari. Gkan: Graph kolmogorov-arnold networks. arXiv preprint arXiv:2406.06470, 2024.
  • [11] Gianluca De Carlo, Andrea Mastropietro, and Aris Anagnostopoulos. Kolmogorov-arnold graph neural networks, 2024.
  • [12] Roman Bresson, Giannis Nikolentzos, George Panagopoulos, Michail Chatzianastasis, Jun Pang, and Michalis Vazirgiannis. Kagnns: Kolmogorov-arnold networks meet graph learning, 2024.
  • [13] Sidharth SS. Chebyshev polynomial-based kolmogorov-arnold networks: An efficient architecture for nonlinear function approximation. arXiv preprint arXiv:2405.07200, 2024.
  • [14] Alexander Dylan Bodner, Antonio Santiago Tepsich, Jack Natan Spolski, and Santiago Pourteau. Convolutional kolmogorov-arnold networks. arXiv preprint arXiv:2406.13155, 2024.
  • [15] Qi Qiu, Tao Zhu, Helin Gong, Liming Chen, and Huansheng Ning. Relu-kan: New kolmogorov-arnold networks that only need matrix addition, dot multiplication, and relu. arXiv preprint arXiv:2406.02075, 2024.
  • [16] Minjong Cheon. Kolmogorov-arnold network for satellite image classification in remote sensing. arXiv preprint arXiv:2406.00600, 2024.
  • [17] Zhaojing Huang, Jiashuo Cui, Leping Yu, Luis Fernando Herbozo Contreras, and Omid Kavehei. Abnormality detection in time-series bio-signals using kolmogorov-arnold networks for resource-constrained devices. medRxiv, pages 2024–06, 2024.
  • [18] Basim Azam and Naveed Akhtar. Suitability of kans for computer vision: A preliminary investigation. arXiv preprint arXiv:2406.09087, 2024.
  • [19] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [20] Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
  • [21] Shengze Cai, Zhiping Mao, Zhicheng Wang, Minglang Yin, and George Em Karniadakis. Physics-informed neural networks (pinns) for fluid mechanics: A review. Acta Mechanica Sinica, 37(12):1727–1738, 2021.
  • [22] Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis. NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. Journal of Computational Physics, 426:109951, 2021.
  • [23] Maziar Raissi, Alireza Yazdani, and George Em Karniadakis. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
  • [24] Mohammadamin Mahmoudabadbozchelou, George Em Karniadakis, and Safa Jamali. nn-pinns: Non-newtonian physics-informed neural networks for complex fluid modeling. Soft Matter, 18(1):172–185, 2022.
  • [25] Muhammad M Almajid and Moataz O Abu-Al-Saud. Prediction of porous media fluid flow using physics informed neural networks. Journal of Petroleum Science and Engineering, 208:109205, 2022.
  • [26] Wenqian Chen, Yucheng Fu, and Panos Stinis. Physics-informed machine learning of redox flow battery based on a two-dimensional unit cell model. Journal of Power Sources, 584:233548, 2023.
  • [27] George S Misyris, Andreas Venzke, and Spyros Chatzivasileiadis. Physics-informed neural networks for power systems. In 2020 IEEE Power & Energy Society General Meeting (PESGM), pages 1–5. IEEE, 2020.
  • [28] Bin Huang and Jianhui Wang. Applications of physics-informed neural networks in power systems-a review. IEEE Transactions on Power Systems, 38(1):572–588, 2022.
  • [29] Christian Moya and Guang Lin. DAE-PINN: a physics-informed neural network model for simulating differential algebraic equations with application to power networks. Neural Computing and Applications, 35(5):3789–3804, 2023.
  • [30] Murilo EC Bento. Physics-guided neural network for load margin assessment of power systems. IEEE Transactions on Power Systems, 2023.
  • [31] Franz Martin Rohrhofer, Stefan Posch, Clemens Gößnitzer, and Bernhard Geiger. On the role of fixed points of dynamical systems in training physics-informed neural networks. Transactions on Machine Learning Research, 2023(1):490, 2023.
  • [32] Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics, 449:110768, 2022.
  • [33] Ameya D Jagtap and George Em Karniadakis. Extended physics-informed neural networks (xpinns): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Communications in Computational Physics, 28(5), 2020.
  • [34] Michael Penwarden, Ameya D Jagtap, Shandian Zhe, George Em Karniadakis, and Robert M Kirby. A unified scalable framework for causal sweeping strategies for physics-informed neural networks (PINNs) and their temporal decompositions. arXiv preprint arXiv:2302.14227, 2023.
  • [35] Revanth Mattey and Susanta Ghosh. A novel sequential method to train physics informed neural networks for Allen Cahn and Cahn Hilliard equations. Computer Methods in Applied Mechanics and Engineering, 390:114474, 2022.
  • [36] Colby L Wight and Jia Zhao. Solving Allen-Cahn and Cahn-Hilliard equations using the adaptive physics informed neural networks. arXiv preprint arXiv:2007.04542, 2020.
  • [37] Ben Moseley, Andrew Markham, and Tarje Nissen-Meyer. Finite basis physics-informed neural networks (FBPINNs): a scalable domain decomposition approach for solving differential equations. Advances in Computational Mathematics, 49(4):62, 2023.
  • [38] Alexander Heinlein, Axel Klawonn, Martin Lanser, and Janine Weber. Combining machine learning and domain decomposition methods for the solution of partial differential equations—a review. GAMM-Mitteilungen, 44(1):Paper No. e202100001, 28, 2021.
  • [39] Axel Klawonn, Martin Lanser, and Janine Weber. Machine learning and domain decomposition methods – a survey, December 2023. arXiv:2312.14050 [cs, math].
  • [40] Khemraj Shukla, Ameya D Jagtap, and George Em Karniadakis. Parallel physics-informed neural networks via domain decomposition. Journal of Computational Physics, 447:110683, 2021.
  • [41] Zheyuan Hu, Ameya D Jagtap, George Em Karniadakis, and Kenji Kawaguchi. When do extended physics-informed neural networks (xpinns) improve generalization? arXiv preprint arXiv:2109.09444, 2021.
  • [42] Victorita Dolean, Alexander Heinlein, Siddhartha Mishra, and Ben Moseley. Finite Basis Physics-Informed Neural Networks as a Schwarz Domain Decomposition Method. In Zdeněk Dostál, Tomáš Kozubek, Axel Klawonn, Ulrich Langer, Luca F. Pavarino, Jakub Šístek, and Olof B. Widlund, editors, Domain Decomposition Methods in Science and Engineering XXVII, pages 165–172, Cham, 2024. Springer Nature Switzerland.
  • [43] Victorita Dolean, Alexander Heinlein, Siddhartha Mishra, and Ben Moseley. Multilevel domain decomposition-based architectures for physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 429:117116, September 2024.
  • [44] Alexander Heinlein, Amanda A Howard, Damien Beecroft, and Panos Stinis. Multifidelity domain decomposition-based physics-informed neural networks for time-dependent problems. arXiv preprint arXiv:2401.07888, 2024.
  • [45] Levi McClenny and Ulisses Braga-Neto. Self-adaptive physics-informed neural networks using a soft attention mechanism. arXiv preprint arXiv:2009.04544, 2020.
  • [46] Sokratis J Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, and George Em Karniadakis. Residual-based attention in physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 421:116805, 2024.
  • [47] Chenxi Wu, Min Zhu, Qinyang Tan, Yadhu Kartha, and Lu Lu. A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 403:115671, 2023.
  • [48] Zhiping Mao and Xuhui Meng. Physics-informed neural networks with residual/gradient-based adaptive sampling methods for solving partial differential equations with sharp solutions. Applied Mathematics and Mechanics, 44(7):1069–1084, 2023.
  • [49] Jie Hou, Ying Li, and Shihui Ying. Enhancing PINNs for solving PDEs via adaptive collocation point movement and adaptive loss weighting. Nonlinear Dynamics, 111(16):15233–15261, 2023.
  • [50] Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng, and Bryan Kian Hsiang Low. Pinnacle: Pinn adaptive collocation and experimental points selection. arXiv preprint arXiv:2404.07662, 2024.
  • [51] Mohammad Amin Nabian, Rini Jasmine Gladstone, and Hadi Meidani. Efficient training of physics-informed neural networks via importance sampling. Computer-Aided Civil and Infrastructure Engineering, 36(8):962–977, 2021.
  • [52] Zhiwei Gao, Liang Yan, and Tao Zhou. Failure-informed adaptive sampling for PINNs. SIAM Journal on Scientific Computing, 45(4):A1971–A1994, 2023.
  • [53] Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality is all you need for training physics-informed neural networks. arXiv preprint arXiv:2203.07404, 2022.
  • [54] Michael Penwarden, Shandian Zhe, Akil Narayan, and Robert M Kirby. Multifidelity modeling for physics-informed neural networks (PINNs). Journal of Computational Physics, 451:110844, 2022.
  • [55] Xuhui Meng and George E Karniadakis. A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. Journal of Computational Physics, 2019.
  • [56] Amanda Howard, Yucheng Fu, and Panos Stinis. A multifidelity approach to continual learning for physical systems. Machine Learning: Science and Technology, 5(2):025042, 2024.
  • [57] Amanda A Howard, Sarah H Murphy, Shady E Ahmed, and Panos Stinis. Stacked networks improve physics-informed training: applications to neural networks and deep operator networks. arXiv preprint arXiv:2311.06483, 2023.
  • [58] Yongji Wang and Ching-Yao Lai. Multi-stage neural networks: Function approximator of machine precision. Journal of Computational Physics, page 112865, 2024.
  • [59] Mark Ainsworth and Justin Dong. Galerkin neural networks: A framework for approximating variational equations with error control. SIAM Journal on Scientific Computing, 43(4):A2474–A2501, 2021.
  • [60] Mark Ainsworth and Justin Dong. Galerkin neural network approximation of singularly-perturbed elliptic systems. Computer Methods in Applied Mechanics and Engineering, 402:115169, 2022.
  • [61] Nathaniel Trask, Amelia Henriksen, Carianne Martinez, and Eric Cyr. Hierarchical partition of unity networks: fast multilevel training. In Mathematical and Scientific Machine Learning, pages 271–286. PMLR, 2022.
  • [62] Ziad Aldirany, Régis Cottereau, Marc Laforest, and Serge Prudhomme. Multi-level neural networks for accurate solutions of boundary-value problems. arXiv preprint arXiv:2308.11503, 2023.
  • [63] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Improved architectures and training algorithms for deep operator networks. Journal of Scientific Computing, 92(2):35, 2022.
  • [64] Amanda A Howard, Saad Qadeer, Andrew William Engel, Adam Tsou, Max Vargas, Tony Chiang, and Panos Stinis. The conjugate kernel for efficient training of physics-informed deep operator networks. In ICLR 2024 Workshop on AI4DifferentialEquations In Science, 2024.
  • [65] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
  • [66] Spyros Rigas and Michalis Papachristou. jaxKAN: A JAX-based implementation of Kolmogorov-Arnold Networks, May 2024.
  • [67] Jan Drgona, Aaron Tuor, James Koch, Madelyn Shapiro, and Draguna Vrabie. NeuroMANCER: Neural Modules with Adaptive Nonlinear Constraints and Efficient Regularizations. 2023.

Appendix A Training parameters

All results in this paper are implemented in JAX [65] using the Jax-KAN [66] KAN implementation. All networks are trained with the ADAM optimizer. For all FBKANs we take the domain overlap $\delta=1.9$.
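To illustrate what the overlap ratio controls, the sketch below builds a one-dimensional partition of unity from normalized cosine windows of the kind commonly used for FBPINNs. The window form, the uniform placement of centers, and all function names are assumptions made for illustration; they are not taken from the released code. Here `delta` scales the half-width of each window, so values above 1 make neighboring windows overlap.

```python
import math


def window_centers_and_half_width(num_subdomains, delta, x_min=0.0, x_max=1.0):
    """Uniformly spaced window centers and a shared half-width (assumed form)."""
    length = x_max - x_min
    centers = [x_min + length * j / (num_subdomains - 1)
               for j in range(num_subdomains)]
    half_width = (delta * length / 2.0) / (num_subdomains - 1)
    return centers, half_width


def partition_of_unity(x, centers, half_width):
    """Normalized cosine windows evaluated at x; the weights sum to one."""
    raw = []
    for mu in centers:
        s = (x - mu) / half_width
        # Each window is nonzero only on its own (overlapping) subdomain.
        raw.append((1.0 + math.cos(math.pi * s)) ** 2 if abs(s) < 1.0 else 0.0)
    total = sum(raw)
    return [w / total for w in raw]


def combine(x, local_models, centers, half_width):
    """Blend the local model outputs with the partition-of-unity weights."""
    weights = partition_of_unity(x, centers, half_width)
    return sum(w * model(x) for w, model in zip(weights, local_models))


# L = 4 subdomains on [0, 1] with overlap ratio delta = 1.9.
centers, half_width = window_centers_and_half_width(4, 1.9)
weights = partition_of_unity(0.4, centers, half_width)

# Stand-ins for trained local KANs: four copies of the same affine map.
local_models = [lambda x: 2.0 * x + 1.0] * 4
blended = combine(0.4, local_models, centers, half_width)
```

Because the weights sum to one at every point, blending identical local models reproduces that model exactly, and no interface conditions are needed to glue the subdomains together.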

A.1 Test 1

| Parameter | Section 3.1.1 | Section 3.1.2 |
|---|---|---|
| KAN architecture | [1, 5, 1] | [1, 5, 1] |
| $L$ | 4–32 | 4 |
| $g$ | 5 | 5 |
| $k$ | 3 | 3 |
| Learning rate | 0.04 | 0.04 |
| Iterations | 4000 | 4000 |
| $N_{data}$ | 1200 | 600 |

Table 3: Hyperparameters used for the results in section 3.1.
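To give a rough sense of model size, the sketch below counts B-spline coefficients for a KAN with a given architecture, assuming each edge carries $g+k$ spline coefficients for grid size $g$ and spline order $k$. This is a lower bound: concrete implementations typically store additional per-edge weights, and the helper name `spline_coefficient_count` is hypothetical.

```python
def spline_coefficient_count(architecture, grid_size, spline_order):
    """Count spline coefficients, assuming (g + k) coefficients per edge."""
    edges = sum(n_in * n_out
                for n_in, n_out in zip(architecture[:-1], architecture[1:]))
    return edges * (grid_size + spline_order)


# Test 1 settings: architecture [1, 5, 1], g = 5, k = 3.
single_kan = spline_coefficient_count([1, 5, 1], grid_size=5, spline_order=3)

# An FBKAN holds one such KAN per subdomain, so the count scales with L.
fbkan_counts = {L: L * single_kan for L in (4, 32)}
```

Under this assumption a single [1, 5, 1] KAN has 10 edges with 8 coefficients each, and the FBKAN total grows linearly with the number of subdomains $L$.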

A.2 Test 2

| Parameter | KAN-1/FBKAN-1 | KAN-2/FBKAN-2 |
|---|---|---|
| KAN architecture | [2, 10, 1] | [2, 5, 1] |
| $L$ | 4 | 4 |
| $g$ | 5 | [5, 10, 25, 30] |
| $g$ schedule | - | [0, 600, 1200, 1800] |
| $k$ | 3 | 3 |
| Initial learning rate | 0.02 | 0.02 |
| Learning rate scale | - | 0.8 |
| Iterations | 2400 | 2400 |
| $N_{data}$ | 10000 | 10000 |

Table 4: Hyperparameters used for the results in section 3.2. The grid ($g$) schedule denotes the iterations at which the grid is updated. The learning rate scale denotes the change to the learning rate at each grid update. KAN-1/FBKAN-1 use fixed grids.
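The grid schedule and learning rate scale can be read as a piecewise-constant lookup over training iterations. The sketch below is one possible reading, under two assumptions that the table does not confirm: the scale is applied multiplicatively, and the first schedule entry (iteration 0) sets the initial grid without rescaling the learning rate. The function name `grid_and_learning_rate` is hypothetical.

```python
def grid_and_learning_rate(iteration, grid_sizes, schedule, initial_lr, lr_scale):
    """Return (grid size, learning rate) in effect at a given iteration."""
    # Number of grid updates applied so far, not counting the initial grid.
    stage = sum(1 for start in schedule if iteration >= start) - 1
    if stage < 0:
        raise ValueError("iteration precedes the first schedule entry")
    # Assumed: the learning rate is multiplied by lr_scale at each update.
    return grid_sizes[stage], initial_lr * lr_scale ** stage


# KAN-2/FBKAN-2 settings from Table 4.
grid_sizes = [5, 10, 25, 30]
schedule = [0, 600, 1200, 1800]

start = grid_and_learning_rate(0, grid_sizes, schedule, 0.02, 0.8)
late = grid_and_learning_rate(1799, grid_sizes, schedule, 0.02, 0.8)
final = grid_and_learning_rate(2399, grid_sizes, schedule, 0.02, 0.8)
```

Under these assumptions, training starts with $g=5$ at the initial learning rate and ends with $g=30$ after three multiplicative reductions.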

A.3 Test 3

| Parameter | Value |
|---|---|
| KAN architecture | [2, 10, 1] |
| $L$ | 4, 8 |
| $g$ | [5, 10, 15, 20] |
| $g$ schedule | [0, 1000, 2000, 3000] |
| $k$ | 3 |
| Initial learning rate | 0.01 |
| Learning rate scale | 0.8 |
| Iterations | 4000 |
| $N_{r}$ | 400 |
| $N_{ic}$ | 1 |
| $\lambda_{r}$ | 1/40 |
| $\lambda_{ic}$ | 1 |

Table 5: Hyperparameters used for the results in section 4.1. The grid ($g$) schedule denotes the iterations at which the grid is updated. The learning rate scale denotes the change to the learning rate at each grid update.

A.4 Test 4

| Parameter | KAN-1/FBKAN-1 | KAN-2/FBKAN-2 | KAN-3/FBKAN-3 |
|---|---|---|---|
| KAN architecture | [2, 10, 1] | [2, 10, 1] | [2, 5, 1] |
| $L$ | 4–16 | 4 | 4 |
| $g$ | 5 | [5, 10, 15] | 5 |
| $g$ schedule | - | [0, 3000, 6000] | - |
| $k$ | 5 | 3 | 5 |
| Initial learning rate | 0.005 | 0.005 | 0.005 |
| Learning rate scale | - | 0.8 | - |
| Iterations | $a_{1}=1, a_{2}=4$: 10000; $a_{1}=4, a_{2}=4$: 10000; $a_{1}=6, a_{2}=6$: 30000 | 10000 | 10000 |
| $N_{r}$ | 800 | 800 | 800 |
| $N_{bc}$ | 400 | 400 | 400 |
| $\lambda_{r}$ | 0.01 | 0.01 | 0.01 |
| $\lambda_{ic}$ | 1 | 1 | 1 |

Table 6: Hyperparameters used for the results in section 4.2. The grid ($g$) schedule denotes the iterations at which the grid is updated. The learning rate scale denotes the change to the learning rate at each grid update.

A.5 Test 5

| Parameter | $c=\sqrt{2}$ | $c=2$ |
|---|---|---|
| KAN architecture | [2, 10, 1] | [2, 10, 10, 1] |
| $L$ | 4 | 4 |
| $g$ | 10 | 10 |
| $k$ | 5 | 5 |
| Initial learning rate | 0.001 | 0.0005 |
| Iterations | 60000 | 120000 |
| $N_{r}$ | 1000 | 1200 |
| $N_{ic}$ | 100 | 100 |
| $N_{bc}$ | 200 | 200 |
| $\lambda_{r}$ | 0.01 | 0.01 |
| $\lambda_{ic}$ | 1 | 1 |
| $\lambda_{bc}$ | 1 | 1 |

Table 7: Hyperparameters used for the results in section 4.3.
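The weights $\lambda_{r}$, $\lambda_{ic}$, and $\lambda_{bc}$ balance the residual, initial-condition, and boundary-condition terms of the physics-informed loss. The sketch below assumes a weighted sum of mean squared errors over the $N_{r}$, $N_{ic}$, and $N_{bc}$ sampled points; this form and the function names are illustrative assumptions rather than the paper's implementation.

```python
def mean_square(values):
    """Mean of squared entries of a non-empty sequence."""
    return sum(v * v for v in values) / len(values)


def physics_informed_loss(residuals, ic_errors, bc_errors,
                          lambda_r=0.01, lambda_ic=1.0, lambda_bc=1.0):
    """Weighted sum of mean squared PDE, initial, and boundary errors.

    Defaults follow the Test 5 weights in Table 7; the mean squared form
    is an assumption.
    """
    return (lambda_r * mean_square(residuals)
            + lambda_ic * mean_square(ic_errors)
            + lambda_bc * mean_square(bc_errors))


# A unit PDE residual contributes 100 times less than a unit
# initial-condition or boundary-condition mismatch.
residual_only = physics_informed_loss([1.0, -1.0], [0.0], [0.0])
data_only = physics_informed_loss([0.0], [2.0], [1.0])
```

With $\lambda_{r}=0.01$, the residual term is down-weighted relative to the initial and boundary terms, which can matter when second derivatives make the raw residual much larger in magnitude than the data mismatches.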