Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems

Amanda A. Howard
Pacific Northwest National Laboratory
Richland, WA 99354 USA

Bruno Jacob
Pacific Northwest National Laboratory
Richland, WA 99354 USA

Sarah H. Murphy
University of North Carolina, Charlotte
Charlotte, NC USA
Pacific Northwest National Laboratory
Richland, WA 99354 USA

Alexander Heinlein
Delft University of Technology
Delft Institute of Applied Mathematics
2628 CD Delft, The Netherlands

Panos Stinis
Pacific Northwest National Laboratory
Richland, WA 99354 USA
University of Washington, Applied Mathematics
Seattle, WA USA
Brown University, Applied Mathematics
Providence, RI, 02912 USA

Abstract

Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to be trained in parallel to give accurate solutions for multiscale problems. We show that finite basis KANs (FBKANs) can provide accurate results with noisy data and for physics-informed training.

Keywords Kolmogorov-Arnold networks $\cdot$ Physics-informed neural networks $\cdot$ Domain decomposition

Figure 1: Graphical abstract.

1 Introduction

Scientific machine learning, predominantly developed using the theory of multilayer perceptrons (MLPs), has recently garnered a great deal of attention [1, 2, 3]. Kolmogorov-Arnold networks (KANs) [4] use the Kolmogorov-Arnold Theorem as inspiration for an alternative to MLPs. KANs offer advantages over MLPs in some cases, such as for continual learning and noisy data. While MLPs use trainable weights and biases on edges with fixed activation functions, KANs use trainable activation functions represented by splines. This switch may be beneficial because splines are easy to adjust locally and can switch between different resolutions. KANs have also been shown to provide increased interpretability over an MLP. Consequently, it is essential to provide a well-rounded analysis of the performance and applicability of KANs. An active area of research involves exploring how to utilize a KAN effectively and discerning in which cases a KAN might be preferred over an MLP. In this paper, we develop an architecture for KANs based on overlapping domain decompositions, which allows for more accurate solutions for complex problems.

Since the publication of [4], numerous variations of KANs have been released, including fractional KANs [5], deep operator KANs [6], physics-informed KANs (PI-KANs) [7], KAN-informed neural networks (KINNs) [8], temporal KANs [9], graph KANs [10, 11, 12], Chebyshev KANs (cKANs) [13], convolutional KANs [14], and ReLU-KANs [15], among others. KANs have been applied to satellite image classification [16], abnormality detection [17], and computer vision [18]. While these results are promising, there remains much to learn about how KANs can best be utilized and optimized. Many of the recent variations employ alternative functions to the B-spline parametrization, while others extend KANs through well-developed methods from the field of machine learning, such as physics-informed neural networks (PINNs) [19], deep operator networks (DeepONets) [20], and recurrent neural networks. Because KANs still follow the general network architecture of an MLP, save for the switch between the edges and neurons, it is often straightforward to apply such techniques to KANs.

PINNs [19] have demonstrated success for a wide range of problems, including fluid dynamics [21, 22, 23, 24, 25], energy systems [26, 27, 28, 29, 30], and heat transfer [21]. PINNs allow for efficient and accurate solving of physical problems where robust data is lacking. This is accomplished by incorporating the governing differential equations of a system into the loss function of a neural network, which ensures that the resulting solution will satisfy the physical laws of the system. Although PINNs can successfully train with minimal data in many cases, there are notable cases in which challenges arise. For instance, PINNs can get stuck at the fixed points of a dynamical system while training [31]. PINNs are also difficult to train for multiscale problems, due to so-called spectral bias [32], or when the problem domain is large. Domain decomposition techniques can improve training in such cases [33, 34, 35, 36, 37]; for a general overview of domain decomposition approaches in scientific machine learning, see also [38, 39]. Rather than training the network over the entire domain, there are many approaches for breaking up the domain into subdomains and training networks on each subdomain. A popular choice for domain decomposition applied to PINNs is extended PINNs (XPINNs) introduced in [33], with a parallel version presented in [40], and extended in [34, 41]. XPINNs have different subnetworks for each domain, which are stitched together across boundaries and are trained in parallel. Another class of domain decomposition techniques for PINNs consists of time marching methods, where, in general, a network is initially trained for a small subdomain, and then the subdomain is slowly expanded until it contains the entire domain. Methods in this class include backward-compatible PINNs, which consist of a single network shared by all subdomains [35]. Also similar is the time marching approach in [36]. 
Finally, one particularly successful technique is finite basis PINNs (FBPINNs) [37, 42, 43, 44], which use partition of unity functions to weight the output of neural networks in each domain. In comparison to other techniques, FBPINNs offer simplicity because no boundary conditions are needed between adjacent domains. This work explores whether a similar technique can be used to enhance the training of a KAN for challenging cases.

In addition to domain decomposition, many methods have been developed to improve the training of PINNs, including adaptive weighting schemes [45, 32], residual-based attention [46], adaptive residual point selection [47, 48, 49, 50, 51, 52], causality techniques [53], multifidelity PINNs [54, 55], and hierarchical methods to learn more accurate solutions progressively [56, 57, 58, 59, 60, 61, 62]. A substantial advantage of the methods presented in this work is that these additional techniques could be applied in combination with FBKANs to improve the training further.

In this paper, we first introduce finite basis KANs (FBKANs) in section 2. In sections 3 and 4 we highlight some of the features of FBKANs applied to data-driven problems and physics-informed problems, respectively. We show that FBKANs can increase accuracy over KANs. One important feature of FBKANs is that the finite basis architecture serves as a wrapper around a KAN architecture. While we have chosen to focus on KANs as described in [4], most available extensions of KANs could be considered instead to increase accuracy or robustness further.

2 Methods

2.1 KANs

In [4], the authors proposed approximating a multivariate function $f(\mathbf{x})$ by a model of the form

f(\mathbf{x})\approx\sum_{i_{n_{l}-1}=1}^{m_{n_{l}-1}}\varphi_{n_{l}-1,i_{n_{l}},i_{n_{l}-1}}\left(\sum_{i_{n_{l}-2}=1}^{m_{n_{l}-2}}\ldots\left(\sum_{i_{2}=1}^{m_{2}}\varphi_{2,i_{3},i_{2}}\left(\sum_{i_{1}=1}^{m_{1}}\varphi_{1,i_{2},i_{1}}\left(\sum_{i_{0}=1}^{m_{0}}\varphi_{0,i_{1},i_{0}}(x_{i_{0}})\right)\right)\right)\ldots\right),

(1)

denoted as a Kolmogorov-Arnold network. Here, $n_{l}$ is the number of layers in the KAN, $\{m_{j}\}_{j=0}^{n_{l}}$ are the numbers of nodes per layer, and $\varphi_{i,j,k}$ are the univariate activation functions. We denote the right-hand side of eq. 1 as $\mathcal{K}(x)$. The activation functions are polynomials of degree $k$ on a grid with $g$ grid points. They are represented by a weighted combination of a basis function $b(x)$ and a B-spline,

\phi(x)=w_{b}b(x)+w_{s}\text{spline}(x)

(2)

where

b(x)=\frac{x}{1+e^{-x}}

and

\text{spline}(x)=\sum_{i}c_{i}B_{i}(x).

Here, $B_{i}(x)$ is a B-spline basis function, i.e., a piecewise polynomial of degree $k$, and $c_{i}$, $w_{b}$, and $w_{s}$ are trainable parameters.
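To make eq. 2 concrete, the following is a minimal sketch of a single activation function in plain Python. It is not the Jax-KAN implementation used in this work: the helper names (`bspline_basis`, `make_knots`, `activation`) are hypothetical, the basis is evaluated with the standard Cox-de Boor recursion, and the uniform grid is assumed to be extended by $k$ knots on each side so that the basis functions sum to one on $[a,b)$.

```python
import math

def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: value at x of the i-th B-spline of degree k on knots t."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right

def silu(x):
    """Basis function b(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))

def make_knots(a, b, g, k):
    """Uniform grid with g intervals on [a, b], extended by k knots on each side (assumption)."""
    h = (b - a) / g
    return [a + (j - k) * h for j in range(g + 2 * k + 1)]

def activation(x, coeffs, knots, k, w_b=1.0, w_s=1.0):
    """phi(x) = w_b * b(x) + w_s * sum_i c_i B_i(x), as in eq. 2."""
    spline = sum(c * bspline_basis(i, k, knots, x) for i, c in enumerate(coeffs))
    return w_b * silu(x) + w_s * spline

# Example: g = 5 intervals on [-1, 1] with cubic splines gives g + k = 8 coefficients.
knots = make_knots(-1.0, 1.0, 5, 3)
coeffs = [0.1 * i for i in range(len(knots) - 3 - 1)]
print(activation(0.3, coeffs, knots, 3))
```

In a KAN, `coeffs`, `w_b`, and `w_s` would be the trainable parameters of one edge; here they are fixed numbers chosen only for illustration.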

KANs evaluate the B-splines on a precomputed grid. In one dimension, on a domain $[a,b]$, a grid with $g_{1}$ intervals has grid points $\{t_{0}=a,t_{1},t_{2},\ldots,t_{g_{1}}=b\}$; cf. [4]. Grid extension [4] allows for fitting a new, fine-grained spline to a coarse-grained spline, increasing the expressivity of the KAN. The coarse splines are transferred to the fine splines following the procedure in [4].

In this work, we do not consider the method for enforcing sparsity as outlined in [4] and instead only consider the mean squared error in the loss function. Many variations of KANs have been proposed recently. However, in this work, we consider only the formulation as outlined in [4], although we note that the domain decomposition method presented here would work with many variants of KANs.

2.2 Physics-informed neural networks

A PINN is a neural network (NN) that is trained to approximate the solution of an initial-boundary value problem (IBVP) by optimizing loss terms accounting for initial conditions, boundary conditions, and a residual term using backpropagation to calculate derivatives. In particular, the residual term represents how well the output satisfies the governing differential equation [19]. More specifically, we aim at approximating the solution $f$ of a generic IBVP

$\displaystyle f_{t}+\mathcal{N}_{x}[f]$	$\displaystyle=0,$	$\displaystyle x\in\Omega,t\in[0,T],$	(3)
$\displaystyle f(x,t)$	$\displaystyle=g(x,t),$	$\displaystyle x\in\partial\Omega,t\in[0,T],$
$\displaystyle f(x,0)$	$\displaystyle=u(x),$	$\displaystyle x\in\Omega,$

over the open domain $\Omega\subset\mathbb{R}^{d}$ with boundary $\partial\Omega$. Here, $x$ and $t$ represent the spatial and temporal coordinates, respectively, $\mathcal{N}_{x}$ is the differential operator with respect to $x$, and $g$ and $u$ are given functions representing the boundary and initial conditions, respectively. The PINN model is then trained to minimize the mean squared error for the initial conditions, boundary conditions, and residual (physics) sampled at $N_{ic},N_{bc},N_{r}$ data points. We denote the sampling points for the initial conditions, boundary conditions, and residual as $\{x_{ic}^{i},u(x_{ic}^{i})\}_{i=1}^{N_{ic}}$, $\{(x_{bc}^{i},t_{bc}^{i}),g(x_{bc}^{i},t_{bc}^{i})\}_{i=1}^{N_{bc}}$, and $\{(x_{r}^{i},t_{r}^{i})\}_{i=1}^{N_{r}}$, respectively. We then optimize the following weighted loss function with respect to the trainable parameters $\theta$ of the NN model $f_{\theta}$:

$\displaystyle\mathcal{L}(\theta)$	$\displaystyle=\lambda_{ic}\mathcal{L}_{ic}(\theta)+\lambda_{bc}\mathcal{L}_{bc}(\theta)+\lambda_{r}\mathcal{L}_{r}(\theta),$	(4)
$\displaystyle\mathcal{L}_{ic}(\theta)$	$\displaystyle=\frac{1}{N_{ic}}\sum_{i=1}^{N_{ic}}\left(f_{\theta}(x_{ic}^{i},0)-u(x_{ic}^{i})\right)^{2},$	(5)
$\displaystyle\mathcal{L}_{bc}(\theta)$	$\displaystyle=\frac{1}{N_{bc}}\sum_{i=1}^{N_{bc}}\left(f_{\theta}(x_{bc}^{i},t_{bc}^{i})-g(x_{bc}^{i},t_{bc}^{i})\right)^{2},$	(6)
$\displaystyle\mathcal{L}_{r}(\theta)$	$\displaystyle=\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\left(\frac{\partial}{\partial t}f_{\theta}(x_{r}^{i},t_{r}^{i})+\mathcal{N}_{x}[f_{\theta}(x_{r}^{i},t_{r}^{i})]\right)^{2},$	(7)

where $\lambda_{ic},\lambda_{bc},\lambda_{r}$ are weights for the different loss terms. The choice of these weights can have a significant impact on the training process and model performance. While adaptive schemes are available for choosing the weights [45, 32, 63, 64], in this work, we set the weights manually.

2.3 FBPINNs

In FBPINNs [37, 42, 43, 44], we decompose the space domain $\Omega$ or the space-time domain $\Omega\times[0,T]$ into overlapping subdomains. Each overlapping subdomain $\Omega_{j}$ is the interior of the support of a corresponding function $\omega_{j}$, and all functions $\omega_{j}$ form a partition of unity. In particular, in the case of $L$ overlapping subdomains, we have

\Omega=\bigcup_{j=1}^{L}\Omega_{j},\quad{\rm supp}(\omega_{j})=\overline{\Omega_{j}},\text{ and}\quad\sum_{j=1}^{L}\omega_{j}\equiv 1\text{ in }\Omega.

In one dimension, a uniform overlapping domain decomposition with overlap ratio $\delta>1$ is given by subdomains

\Omega_{j}=\left(\frac{(j-1)l-\delta l/2}{L-1},\frac{(j-1)l+\delta l/2}{L-1}\right),

where $l=\max(x)-\min(x)$ .

There are multiple ways to define the partition of unity functions. Here, we construct them based on the expression

\omega_{j}=\frac{\hat{\omega}_{j}}{\sum_{j=1}^{L}\hat{\omega}_{j}},

(8)

where

\hat{\omega}_{j}(x)=\begin{cases}1&L=1,\\ \left[1+\cos\left(\pi(x-\mu_{j})/\sigma_{j}\right)\right]^{2}&L>1,\end{cases}

(9)

with $\mu_{j}=l(j-1)/(L-1)$ and $\sigma_{j}=(\delta l/2)/(L-1)$ representing the center and half-width of each subdomain, respectively. An example of the one-dimensional partition of unity functions with $L=4$ is depicted in fig. 2.
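The construction in eqs. 8 and 9 can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the implementation used for the experiments: the function names are hypothetical, the window is explicitly set to zero outside $(\mu_{j}-\sigma_{j},\mu_{j}+\sigma_{j})$ so that its support matches $\overline{\Omega_{j}}$, the domain is assumed to start at $0$ (as the expression for $\mu_{j}$ implies), and the overlap ratio $\delta=1.9$ in the example is an arbitrary choice.

```python
import math

def window(x, mu, sigma):
    """Unnormalized window of eq. 9 for L > 1, set to zero outside the subdomain."""
    if abs(x - mu) >= sigma:
        return 0.0
    return (1.0 + math.cos(math.pi * (x - mu) / sigma)) ** 2

def partition_of_unity(x, L, l, delta):
    """Return [omega_1(x), ..., omega_L(x)] for x in [0, l], following eq. 8."""
    if L == 1:
        return [1.0]
    sigma = (delta * l / 2.0) / (L - 1)          # half-width of each subdomain
    hats = [window(x, l * (j - 1) / (L - 1), sigma) for j in range(1, L + 1)]
    total = sum(hats)                            # positive on [0, l] when delta > 1
    return [h / total for h in hats]

# Example: four subdomains on [0, 8].
for x in (0.0, 1.0, 4.0, 8.0):
    weights = partition_of_unity(x, L=4, l=8.0, delta=1.9)
    print(x, [round(w, 4) for w in weights], round(sum(weights), 12))
```

The condition $\delta>1$ matters here: neighbouring centers are $l/(L-1)$ apart, so a half-width larger than half that spacing guarantees that at least one window is nonzero at every point and the normalization in eq. 8 is well defined.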

In multiple dimensions, $\mathbf{x}\in\mathbb{R}^{d}$ , we then obtain

\hat{\omega}_{j}(x)=\begin{cases}1&L=1,\\ \Pi_{i=1}^{d}\left[1+\cos\left(\pi(x_{i}-\mu_{ij})/\sigma_{ij}\right)\right]^{2}&L>1,\end{cases}

(10)

with $\mu_{ij}$ and $\sigma_{ij}$ representing the center and half-width of each subdomain $j$ in each direction $i$. An example of a single partition of unity function with $d=2$ and $L=4$ is shown in fig. 3.

Then, the FBPINN architecture reads

f_{\theta}(x)=\sum_{j=1}^{L}\omega_{j}(x)f_{j,\theta^{j}}(x),

where $f_{j,\theta^{j}}$ is the neural network with parameters $\theta^{j}$ that corresponds to the subdomain $\Omega_{j}$; it is localized to $\Omega_{j}$ by multiplication with the partition of unity function $\omega_{j}$, which is zero outside of $\Omega_{j}$. The FBPINN model is trained in the same way as the PINN model, i.e., using initial, boundary, and residual loss terms; cf. section 2.2. Note that, in the original FBPINNs paper [37], hard enforcement of initial and boundary conditions is employed, such that only the residual loss term remains. Moreover, in [43], the approach has been extended to multilevel domain decompositions, and in [44], this domain decomposition approach has been applied to long-range time-dependent problems.

2.4 FBKANs

In FBKANs, we replace the NN used in FBPINNs with a KAN. The function approximation in eq. 1 then becomes

	$\displaystyle f(\mathbf{x})$	$\displaystyle\approx\sum_{j=1}^{L}\omega_{j}(x)\left[\sum_{i_{n_{l}-1}=1}^{m_{n_{l}-1}}\varphi^{j}_{n_{l}-1,i_{n_{l}},i_{n_{l}-1}}\left(\sum_{i_{n_{l}-2}=1}^{m_{n_{l}-2}}\ldots\left(\sum_{i_{1}=1}^{m_{1}}\varphi^{j}_{1,i_{2},i_{1}}\left(\sum_{i_{0}=1}^{m_{0}}\varphi^{j}_{0,i_{1},i_{0}}(x_{i_{0}})\right)\right)\ldots\right)\right]$		(11)
		$\displaystyle=\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x;\theta^{j}),$

where $\mathcal{K}^{j}(x;\theta^{j})$ denotes the $j^{th}$ KAN with trainable parameters $\theta^{j}$ and $\omega_{j}$ denotes the corresponding finite basis partition of unity function introduced in section 2.3. FBKANs are trained to minimize the loss function

\mathcal{L}(\theta)=\lambda_{ic}\mathcal{L}_{ic}(\theta)+\lambda_{bc}\mathcal{L}_{bc}(\theta)+\lambda_{r}\mathcal{L}_{r}(\theta)+\lambda_{data}\mathcal{L}_{data}(\theta),

(12)

composed of

	$\displaystyle\mathcal{L}_{ic}(\theta)$	$\displaystyle=\frac{1}{N_{ic}}\sum_{i=1}^{N_{ic}}\left(\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{ic}^{i};\theta^{j})-u(x_{ic}^{i})\right)^{2},$
	$\displaystyle\mathcal{L}_{bc}(\theta)$	$\displaystyle=\frac{1}{N_{bc}}\sum_{i=1}^{N_{bc}}\left(\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{bc}^{i};\theta^{j})-g(x_{bc}^{i},t_{bc}^{i})\right)^{2},$
	$\displaystyle\mathcal{L}_{r}(\theta)$	$\displaystyle=\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}\left(\frac{\partial}{\partial t}\left[\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{r}^{i};\theta^{j})\right]+\mathcal{N}_{x}\left[\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{r}^{i};\theta^{j})\right]\right)^{2},$
	$\displaystyle\mathcal{L}_{data}(\theta)$	$\displaystyle=\frac{1}{N_{data}}\sum_{i=1}^{N_{data}}\left(\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x_{data}^{i};\theta^{j})-f(x_{data}^{i})\right)^{2}.$

Here, $\theta=\{\theta^{j}\}_{j=1}^{L}$ is the set of trainable parameters, and the loss function $\mathcal{L}$ contains both a term for data-driven training $\mathcal{L}_{data}$ and terms for physics-informed training $\mathcal{L}_{ic}$ , $\mathcal{L}_{bc}$ , and $\mathcal{L}_{r}$ . In this way, FBKANs are adaptable to given problem characteristics.
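The weighted sum in eq. 11 and the data term of eq. 12 can be illustrated with a short sketch in plain Python. The local models below are ordinary callables standing in for trained KANs, and all function names are hypothetical; the sketch is one-dimensional, assumes the domain $[0,l]$, and truncates each cosine window outside its subdomain.

```python
import math

def partition_of_unity(x, L, l, delta):
    """Normalized cosine windows on [0, l] (eqs. 8 and 9), zero outside each subdomain."""
    if L == 1:
        return [1.0]
    sigma = (delta * l / 2.0) / (L - 1)
    hats = []
    for j in range(1, L + 1):
        mu = l * (j - 1) / (L - 1)
        inside = abs(x - mu) < sigma
        hats.append((1.0 + math.cos(math.pi * (x - mu) / sigma)) ** 2 if inside else 0.0)
    total = sum(hats)
    return [h / total for h in hats]

def fbkan_predict(x, local_models, l, delta):
    """Weighted sum over subdomains; callables play the role of the local KANs."""
    weights = partition_of_unity(x, len(local_models), l, delta)
    # A local model only contributes where its window is nonzero.
    return sum(w * model(x) for w, model in zip(weights, local_models) if w > 0.0)

def data_loss(xs, ys, local_models, l, delta):
    """Mean squared error of the combined prediction (data term of eq. 12)."""
    errors = [(fbkan_predict(x, local_models, l, delta) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

# Two stand-in local models on [0, 1]: the blend moves from the first to the second.
models = [lambda x: 0.0, lambda x: 1.0]
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, fbkan_predict(x, models, l=1.0, delta=1.5))
```

Because the weights sum to one, the combined prediction reproduces any function that every local model reproduces on its own subdomain, which is why no interface conditions between subdomains appear in the loss.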

For FBKANs, the grid of each local KAN $\mathcal{K}^{j}$ is set separately on the corresponding subdomain $\Omega_{j}$. To do so, we densely sample points $x\in\Omega$ and compute the partition of unity function $\omega_{j}(x)$. We then take the boundaries of each subdomain based on the partition of unity functions as

	$\displaystyle a^{j}$	$\displaystyle=\min_{x\text{ s.t. }\omega_{j}(x)>0.0001}x,$
	$\displaystyle b^{j}$	$\displaystyle=\max_{x\text{ s.t. }\omega_{j}(x)>0.0001}x.$

This gives a custom grid fit to each subdomain.
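The thresholding step above can be sketched as follows in plain Python. The function names, the number of sample points, and the overlap ratio in the example are assumptions made for illustration; only the threshold $0.0001$ is taken from the text.

```python
import math

def omega(x, j, L, l, delta):
    """Partition of unity function omega_j(x) on [0, l] (eqs. 8 and 9), j = 1, ..., L."""
    if L == 1:
        return 1.0
    sigma = (delta * l / 2.0) / (L - 1)
    hats = []
    for m in range(1, L + 1):
        mu = l * (m - 1) / (L - 1)
        inside = abs(x - mu) < sigma
        hats.append((1.0 + math.cos(math.pi * (x - mu) / sigma)) ** 2 if inside else 0.0)
    return hats[j - 1] / sum(hats)

def subdomain_bounds(j, L, l, delta, n_samples=2001, tol=1e-4):
    """Smallest and largest densely sampled x with omega_j(x) > tol."""
    xs = [l * i / (n_samples - 1) for i in range(n_samples)]
    active = [x for x in xs if omega(x, j, L, l, delta) > tol]
    return min(active), max(active)

# Example: grid limits [a^j, b^j] for each of four subdomains on [0, 8].
for j in range(1, 5):
    print(j, subdomain_bounds(j, L=4, l=8.0, delta=1.9))
```

Because the window decays smoothly to zero at the edge of its support, the thresholded interval is slightly narrower than the nominal subdomain $\Omega_{j}$.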

We evaluate the models using the relative $\ell_{2}$ error, calculated by

\frac{||f(x)-\sum_{j=1}^{L}\omega_{j}(x)\mathcal{K}^{j}(x;\theta^{j})||_{2}}{||f(x)||_{2}}.
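For reference, a minimal plain-Python version of this metric is given below; the function name is hypothetical, and the two arguments are the reference values and the model predictions at the same test points.

```python
import math

def relative_l2_error(f_true, f_pred):
    """||f_true - f_pred||_2 / ||f_true||_2 over a set of test points."""
    numerator = math.sqrt(sum((t - p) ** 2 for t, p in zip(f_true, f_pred)))
    denominator = math.sqrt(sum(t ** 2 for t in f_true))
    return numerator / denominator

# Example on three test points.
print(relative_l2_error([1.0, 2.0, 2.0], [1.0, 2.0, 2.3]))
```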

Our implementation is based on Jax [65], using the Jax-KAN package [66] for the KAN implementation. For simplicity, we keep the hyperparameters of each KAN on each subdomain fixed, including the width and depth, reducing the total number of hyperparameters. However, in general, the parameters for each individual KAN $\mathcal{K}^{j}$ could be chosen differently.

3 Data-driven results

In this section, we consider data-driven problems, which are trained without physics, meaning that $\lambda_{ic}=\lambda_{bc}=\lambda_{r}=0$. These results specifically illustrate how FBKANs can improve predictions for noisy data and multiscale oscillations.

3.1 Test 1

In this section, we consider a toy problem

f(x)=\exp[\sin(0.3\pi x^{2})],

(13)

on $x\in[0,8]$ . This problem is designed to test both the scaling of FBKANs as we increase the number of subdomains, as well as the impact of noisy data on training FBKANs.

3.1.1 Scaling of FBKANs

First, we consider clean data, $N_{data}=1\,200$, which is sampled uniformly in $x\in[0,8]$. We present results for training on $L=1,2,4,8,16,$ and $32$ subdomains. Note that the function described in eq. 13 is highly oscillatory, which makes it difficult to capture with a small KAN. As shown in fig. 4, increasing the number of subdomains from $L=2$ to $L=8$ and $L=32$ significantly decreases the pointwise error in the FBKAN predictions. Doubling the number of subdomains doubles the number of trainable parameters of the FBKAN model, increasing its expressivity. However, we observe approximately first-order convergence of the relative $\ell_{2}$ error for a large number of subdomains in fig. 4.

3.1.2 FBKANs with noisy data

One strength of KANs over MLPs is their increased accuracy for noisy data [4]. We now test FBKANs on noisy training data using four subdomains ($L=4$). To this end, we sample 600 training points from a uniform distribution in $[0,8]$ and evaluate $f(x)$. Then, we add Gaussian white noise with zero mean and varying magnitude to $f(x)$, up to an average relative magnitude of $20\%$. This becomes the training set. The method is then tested on 1 000 evenly spaced points in $[0,8]$ without noise.
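A training set of this kind could be generated along the following lines. This is a hedged sketch in plain Python, not the script used for the experiments: the function names, the seed, and the noise standard deviation are hypothetical, and the "average relative magnitude" is assumed here to mean the mean of $|\text{noise}|/|f(x)|$ over the training points.

```python
import math
import random

def f(x):
    """Target function of eq. 13."""
    return math.exp(math.sin(0.3 * math.pi * x ** 2))

def make_noisy_training_set(n, noise_std, seed=0):
    """Sample n points uniformly in [0, 8] and add zero-mean Gaussian noise to f."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 8.0) for _ in range(n)]
    clean = [f(x) for x in xs]
    noisy = [y + rng.gauss(0.0, noise_std) for y in clean]
    # Assumed definition of the average relative noise magnitude.
    relative = sum(abs(yn - y) / abs(y) for yn, y in zip(noisy, clean)) / n
    return xs, noisy, relative

xs, ys, relative = make_noisy_training_set(600, noise_std=0.2)
print(len(xs), relative)

# Noise-free test set: 1 000 evenly spaced points in [0, 8].
x_test = [8.0 * i / 999 for i in range(1000)]
y_test = [f(x) for x in x_test]
```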

The training set and results are shown in fig. 5. In the noisiest case, with relative noise of 18.1% added to the training set, the KAN yields a relative $\ell_{2}$ error of 0.1404, whereas the FBKAN yields a relative error of 0.0646. For all noise levels tested, the FBKAN consistently has a lower relative error than the plain KAN in fig. 5, and the predictions are robust to noisy data.

3.2 Test 2

We now consider the following equation

f(x,y)=\sin(6\pi x^{2})\sin(8\pi y^{2}),

(14)

for $(x,y)\in[0,1]\times[0,1].$ This data-driven example exhibits fine-scale oscillations. We test two cases: (1) a KAN and FBKAN with a fixed grid, denoted by KAN-1/FBKAN-1 and (2) a KAN and FBKAN with grid extension, denoted by KAN-2/FBKAN-2. The FBKAN has $L=4$ subdomains. The training set is composed of 10 000 points randomly sampled from a uniform distribution.

In both cases, we begin training with $g=5$ grid points. In the grid extension case, the grid increases every 600 iterations as listed in table 4, and the learning rate drops by $20\%$ each time the grid is updated. As can be observed in fig. 6, the final training loss for FBKAN-1 with a fixed grid is approximately the same as the training loss for KAN-2 with the grid extension approach, even though each FBKAN network has six times fewer grid points. This is also reflected in the relative errors reported in table 1. Comparing the KAN-1 and FBKAN-1 models in fig. 7, KAN-1 struggles to capture the fine-scale features of the data accurately and has a larger relative error. As can be seen in fig. 8, KAN-2 and FBKAN-2 both outperform their counterparts with a static grid, but FBKAN-2 is better able to capture the data than KAN-2.
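The training schedule described above can be written as two small helper functions. This is only a sketch: the function names are hypothetical, the $20\%$ drop is read as multiplicative (each update scales the current learning rate by $0.8$), and the grid sizes and initial learning rate in the example are placeholders, since the actual values are those listed in table 4.

```python
def learning_rate(iteration, lr0, update_every=600, drop=0.2):
    """Learning rate after scaling by (1 - drop) at every grid update (assumed multiplicative)."""
    n_updates = iteration // update_every
    return lr0 * (1.0 - drop) ** n_updates

def grid_size(iteration, grid_sizes, update_every=600):
    """Grid size in use at a given iteration; the last entry is kept once the list is exhausted."""
    stage = min(iteration // update_every, len(grid_sizes) - 1)
    return grid_sizes[stage]

# Placeholder grid sizes and initial learning rate, for illustration only.
for it in (0, 600, 1200, 1800):
    print(it, grid_size(it, [5, 10, 15, 20]), learning_rate(it, lr0=0.05))
```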

Name	Grid type	Relative error
KAN-1	Fixed grid	$2.36\times 10^{-1}$
FBKAN-1	Fixed grid	$7.43\times 10^{-2}$
KAN-2	Grid extension	$8.10\times 10^{-2}$
FBKAN-2	Grid extension	$2.27\times 10^{-2}$

Table 1: Relative $\ell_{2}$ errors for data-driven Test 2.

4 Physics-informed results

4.1 Test 3

As the first physics-informed example, we consider the following one-dimensional problem with multiscale features:

	$\displaystyle\frac{df}{dx}$	$\displaystyle=4\cos(4x)+40\cos(40x),$
	$\displaystyle f(0)$	$\displaystyle=0,$

on $x\in[-4,4]$ with exact solution

f(x)=\sin(4x)+\sin(40x).

(15)

We sample 400 collocation points from a uniform distribution ( $N_{r}=400$ ) at every training iteration. The grid starts with $g=5$ and increases by $5$ every 1 000 iterations. The KAN reaches a relative $\ell_{2}$ error of 0.2407 with $g_{final}=20$ , whereas the FBKAN for four subdomains ( $L=4$ ) reaches a relative $\ell_{2}$ error of 0.0898 for $g_{final}=20$ . For eight subdomains ( $L=8$ ), the FBKAN reaches a relative error of 0.0369 with $g_{final}=15$ and 0.0662 with $g_{final}=10$ .
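To illustrate what the physics-informed loss measures for this problem, the sketch below evaluates the residual and initial-condition terms for a candidate function in plain Python. It is not the training code: the derivative is approximated by a central finite difference as a stand-in for automatic differentiation, the weights `lam_ic` and `lam_r` are placeholder values, and all function names are hypothetical.

```python
import math
import random

def rhs(x):
    """Right-hand side of the ODE df/dx = 4 cos(4x) + 40 cos(40x)."""
    return 4.0 * math.cos(4.0 * x) + 40.0 * math.cos(40.0 * x)

def exact(x):
    """Exact solution of eq. 15."""
    return math.sin(4.0 * x) + math.sin(40.0 * x)

def physics_loss(model, n_r=400, lam_ic=1.0, lam_r=1.0, h=1e-5, seed=0):
    """Weighted sum of the residual loss on [-4, 4] and the initial condition f(0) = 0."""
    rng = random.Random(seed)
    xs = [rng.uniform(-4.0, 4.0) for _ in range(n_r)]
    # Central finite difference replaces automatic differentiation in this sketch.
    residuals = [(model(x + h) - model(x - h)) / (2.0 * h) - rhs(x) for x in xs]
    loss_r = sum(r * r for r in residuals) / n_r
    loss_ic = (model(0.0) - 0.0) ** 2
    return lam_ic * loss_ic + lam_r * loss_r

print(physics_loss(exact))                          # close to zero
print(physics_loss(lambda x: math.sin(4.0 * x)))    # misses the high-frequency term
```

A candidate that captures only the low-frequency component leaves the full $40\cos(40x)$ term in the residual, which gives a sense of why the multiscale structure makes this problem hard for a single small network.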

4.2 Test 4

We now test the Helmholtz equation

	$\displaystyle\frac{\partial^{2}f}{\partial y^{2}}+\frac{\partial^{2}f}{\partial x^{2}}+k_{h}^{2}f-q(x,y)$	$\displaystyle=0,\ (x,y)\in[-1,1]\times[-1,1],$
	$\displaystyle f(-1,y)=f(1,y)$	$\displaystyle=0,\ y\in[-1,1],$
	$\displaystyle f(x,-1)=f(x,1)$	$\displaystyle=0,\ x\in[-1,1],$

with

q(x,y)=-(a_{1}\pi)^{2}\sin(a_{1}\pi x)\sin(a_{2}\pi y)-(a_{2}\pi)^{2}\sin(a_{1}\pi x)\sin(a_{2}\pi y)+k_{h}^{2}\sin(a_{1}\pi x)\sin(a_{2}\pi y).

In our tests, we vary $a_{1}$ and $a_{2}$ .

For each choice of $(a_{1},a_{2})$, we consider three training cases. In the first case, we use higher-order splines with $k=5$ and a fixed grid with $g=5$ (denoted by KAN-1/FBKAN-1). In the second case, we take $k=3$ and use grid extension to increase $g$ by $5$ every $600$ iterations, starting with $g=5$ (denoted by KAN-2/FBKAN-2). We use one hidden layer with width 10 for both cases. Additionally, we test a third case, denoted by KAN-3/FBKAN-3, where we consider the same hyperparameters as KAN-1/FBKAN-1 but with width 5 in the hidden layer. The relative $\ell_{2}$ errors are reported in table 2. In all cases, the FBKAN outperforms the corresponding KAN. The grid extension cases, in which the grid size increases during training, come at the expense of increasing the computational time by a factor of 1.4-1.5 in our tests. For $a_{1}=1,a_{2}=4$, the smaller FBKAN-3 network is sufficient to represent the solution accurately. However, for larger values of $a_{1}$ and $a_{2}$, the larger network in FBKAN-1 is necessary. The results for KAN-1/FBKAN-1 are shown in figs. 10 and 11 for $a_{1}=1,a_{2}=4$ and $a_{1}=4,a_{2}=4$, respectively. In fig. 12, we consider $a_{1}=a_{2}=6$ with $L=4,9$ and $16$, demonstrating the further refinement possible with additional subdomains.

	$a_{1}=1$ , $a_{2}=4$	$a_{1}=4$ , $a_{2}=4$	$a_{1}=6$ , $a_{2}=6$
KAN-1	0.0259	0.5465	1.1254
FBKAN-1, L=4	0.0102	0.0267	0.1151
FBKAN-1, L=9	0.0213	0.0239	0.0399
FBKAN-1, L=16	0.0037	0.0128	0.0321
KAN-2	0.0180	0.2045	0.5854
FBKAN-2	0.0112	0.0427	0.2272
KAN-3	0.3771	0.5488	1.2825
FBKAN-3	0.0214	0.2760	0.9797

Table 2: Relative $\ell_{2}$ errors for physics-informed Test 4.

4.3 Test 5

Finally, we consider the wave equation

	$\displaystyle\frac{\partial^{2}f}{\partial t^{2}}-c^{2}\frac{\partial^{2}f}{\partial x^{2}}$	$\displaystyle=0,\ (x,t)\in[0,1]\times[0,1],$
	$\displaystyle f(0,t)$	$\displaystyle=0,\ t\in[0,1],$
	$\displaystyle f(1,t)$	$\displaystyle=0,\ t\in[0,1],$
	$\displaystyle f(x,0)$	$\displaystyle=\sin(\pi x)+0.5\sin(4\pi x),\ x\in[0,1],$
	$\displaystyle f_{t}(x,0)$	$\displaystyle=0,\ x\in[0,1],$

which has the exact solution

f(x,t)=\sin(\pi x)\cos(c\pi t)+0.5\sin(4\pi x)\cos(4c\pi t).
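
As a quick consistency check on this exact solution (a worked verification, not part of the reported experiments), note that each term has the form $f_{n}(x,t)=A_{n}\sin(n\pi x)\cos(nc\pi t)$ with $(n,A_{n})=(1,1)$ and $(4,0.5)$. Differentiating twice in $t$ and in $x$ gives

\frac{\partial^{2}f_{n}}{\partial t^{2}}-c^{2}\frac{\partial^{2}f_{n}}{\partial x^{2}}=-(nc\pi)^{2}f_{n}+c^{2}(n\pi)^{2}f_{n}=0.

The boundary conditions hold because $\sin(n\pi x)$ vanishes at $x=0$ and $x=1$ for integer $n$. At $t=0$, $\cos(0)=1$ recovers the initial profile, and $\partial_{t}f_{n}\propto\sin(nc\pi t)$ vanishes, so $f_{t}(x,0)=0$. The temporal frequency of the second mode is $4c\pi$, so increasing $c$ from $\sqrt{2}$ to $2$ produces faster oscillations in time over the same domain.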

We first consider $c=\sqrt{2}$ . The KAN has a relative $\ell_{2}$ error of 0.1402, and the FBKAN with $L=4$ has a relative $\ell_{2}$ error of 0.0153, as illustrated in fig. 13. We then consider the harder case with $c=2$ , shown in fig. 14. The KAN has a relative $\ell_{2}$ error of 0.1778 and the FBKAN with $L=4$ has a relative $\ell_{2}$ error of 0.0587.

5 Conclusions

We have developed domain decomposition-based KAN models for data-driven and physics-informed training with KANs; in accordance with FBPINNs, we denote them as FBKANs. FBKANs are scalable to complex problems and have a strong advantage over other domain decomposition-based approaches in that they do not require enforcement of transmission conditions between the subdomains via the loss function. They allow accurate training using an ensemble of small KANs combined using partition of unity functions, instead of a single large network.

One advantage of FBKANs is that they can be combined with existing techniques to improve the training of KANs and PI-KANs, including residual-based attention weights as introduced in [7], cKANs [13], deep operator KANs [6], and others. These methods would be directly compatible with the FBKAN framework. In future work, we will further examine the scalability of FBKANs and consider their application to higher dimensional problems. We will also consider multilevel FBKANs, following the multilevel FBPINNs [43] approach. Multilevel FBPINNs show improvement over FBPINNs for a large number of subdomains by providing a mechanism for global communication between subdomains. Multilevel FBKANs could offer a similar advantage, allowing robust training with an increasing number of subdomains.

6 Code and data availability

All code, trained models, and data required to replicate the examples presented in this paper will be released upon publication. Meanwhile, we have released code and Google Colab tutorials for FBKANs in Neuromancer [67] at https://github.com/pnnl/neuromancer/tree/feature/fbkans/examples/KANs, for the reader to explore the ideas implemented in this work.

7 Acknowledgements

The KAN diagrams in fig. 1 were developed with pykan [4]. This project was completed with support from the U.S. Department of Energy, Advanced Scientific Computing Research program, under the Scalable, Efficient and Accelerated Causal Reasoning Operators, Graphs and Spikes for Earth and Embedded Systems (SEA-CROGS) project (Project No. 80278) and under the Uncertainty Quantification for Multifidelity Operator Learning (MOLUcQ) project (Project No. 81739). The computational work was performed using PNNL Institutional Computing at Pacific Northwest National Laboratory. Pacific Northwest National Laboratory (PNNL) is a multi-program national laboratory operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under Contract No. DE-AC05-76RL01830.

References

[1] George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
[2] Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm, Manish Parashar, Abani Patra, James Sethian, Stefan Wild, et al. Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence. Technical report, USDOE Office of Science (SC), Washington, DC (United States), 2019.
[3] Jonathan Carter, John Feddema, Doug Kothe, Rob Neely, Jason Pruet, Rick Stevens, Prasanna Balaprakash, Pete Beckman, Ian Foster, Kamil Iskra, et al. Advanced research directions on ai for science, energy, and security: Report on summer 2022 workshops. 2023.
[4] Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y Hou, and Max Tegmark. Kan: Kolmogorov-arnold networks. arXiv preprint arXiv:2404.19756, 2024.
[5] Alireza Afzal Aghaei. fkan: Fractional kolmogorov-arnold networks with trainable jacobi basis functions. arXiv preprint arXiv:2406.07456, 2024.
[6] Diab W Abueidda, Panos Pantidis, and Mostafa E Mobasher. Deepokan: Deep operator network based on kolmogorov arnold networks for mechanics problems. arXiv preprint arXiv:2405.19143, 2024.
[7] Khemraj Shukla, Juan Diego Toscano, Zhicheng Wang, Zongren Zou, and George Em Karniadakis. A comprehensive and fair comparison between mlp and kan representations for differential equations and operator networks. arXiv preprint arXiv:2406.02917, 2024.
[8] Yizheng Wang, Jia Sun, Jinshuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, and Yinghua Liu. Kolmogorov arnold informed neural network: A physics-informed deep learning framework for solving pdes based on kolmogorov arnold networks. arXiv preprint arXiv:2406.11045, 2024.
[9] Remi Genet and Hugo Inzirillo. Tkan: Temporal kolmogorov-arnold networks. arXiv preprint arXiv:2405.07344, 2024.
[10] Mehrdad Kiamari, Mohammad Kiamari, and Bhaskar Krishnamachari. Gkan: Graph kolmogorov-arnold networks. arXiv preprint arXiv:2406.06470, 2024.
[11] Gianluca De Carlo, Andrea Mastropietro, and Aris Anagnostopoulos. Kolmogorov-arnold graph neural networks, 2024.
[12] Roman Bresson, Giannis Nikolentzos, George Panagopoulos, Michail Chatzianastasis, Jun Pang, and Michalis Vazirgiannis. Kagnns: Kolmogorov-arnold networks meet graph learning, 2024.
[13] Sidharth SS. Chebyshev polynomial-based kolmogorov-arnold networks: An efficient architecture for nonlinear function approximation. arXiv preprint arXiv:2405.07200, 2024.
[14] Alexander Dylan Bodner, Antonio Santiago Tepsich, Jack Natan Spolski, and Santiago Pourteau. Convolutional kolmogorov-arnold networks. arXiv preprint arXiv:2406.13155, 2024.
[15] Qi Qiu, Tao Zhu, Helin Gong, Liming Chen, and Huansheng Ning. Relu-kan: New kolmogorov-arnold networks that only need matrix addition, dot multiplication, and relu. arXiv preprint arXiv:2406.02075, 2024.
[16] Minjong Cheon. Kolmogorov-arnold network for satellite image classification in remote sensing. arXiv preprint arXiv:2406.00600, 2024.
[17] Zhaojing Huang, Jiashuo Cui, Leping Yu, Luis Fernando Herbozo Contreras, and Omid Kavehei. Abnormality detection in time-series bio-signals using kolmogorov-arnold networks for resource-constrained devices. medRxiv, pages 2024–06, 2024.
[18] Basim Azam and Naveed Akhtar. Suitability of kans for computer vision: A preliminary investigation. arXiv preprint arXiv:2406.09087, 2024.
[19] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
[20] Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
[21] Shengze Cai, Zhiping Mao, Zhicheng Wang, Minglang Yin, and George Em Karniadakis. Physics-informed neural networks (pinns) for fluid mechanics: A review. Acta Mechanica Sinica, 37(12):1727–1738, 2021.
[22] Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis. NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. Journal of Computational Physics, 426:109951, 2021.
[23] Maziar Raissi, Alireza Yazdani, and George Em Karniadakis. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
[24] Mohammadamin Mahmoudabadbozchelou, George Em Karniadakis, and Safa Jamali. nn-pinns: Non-newtonian physics-informed neural networks for complex fluid modeling. Soft Matter, 18(1):172–185, 2022.
[25] Muhammad M Almajid and Moataz O Abu-Al-Saud. Prediction of porous media fluid flow using physics informed neural networks. Journal of Petroleum Science and Engineering, 208:109205, 2022.
[26] Wenqian Chen, Yucheng Fu, and Panos Stinis. Physics-informed machine learning of redox flow battery based on a two-dimensional unit cell model. Journal of Power Sources, 584:233548, 2023.
[27] George S Misyris, Andreas Venzke, and Spyros Chatzivasileiadis. Physics-informed neural networks for power systems. In 2020 IEEE Power & Energy Society General Meeting (PESGM), pages 1–5. IEEE, 2020.
[28] Bin Huang and Jianhui Wang. Applications of physics-informed neural networks in power systems-a review. IEEE Transactions on Power Systems, 38(1):572–588, 2022.
[29] Christian Moya and Guang Lin. DAE-PINN: a physics-informed neural network model for simulating differential algebraic equations with application to power networks. Neural Computing and Applications, 35(5):3789–3804, 2023.
[30] Murilo EC Bento. Physics-guided neural network for load margin assessment of power systems. IEEE Transactions on Power Systems, 2023.
[31] Franz Martin Rohrhofer, Stefan Posch, Clemens Gößnitzer, and Bernhard Geiger. On the role of fixed points of dynamical systems in training physics-informed neural networks. Transactions on Machine Learning Research, 2023(1):490, 2023.
[32] Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics, 449:110768, 2022.
[33] Ameya D Jagtap and George Em Karniadakis. Extended physics-informed neural networks (xpinns): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Communications in Computational Physics, 28(5), 2020.
[34] Michael Penwarden, Ameya D Jagtap, Shandian Zhe, George Em Karniadakis, and Robert M Kirby. A unified scalable framework for causal sweeping strategies for physics-informed neural networks (PINNs) and their temporal decompositions. arXiv preprint arXiv:2302.14227, 2023.
[35] Revanth Mattey and Susanta Ghosh. A novel sequential method to train physics informed neural networks for Allen Cahn and Cahn Hilliard equations. Computer Methods in Applied Mechanics and Engineering, 390:114474, 2022.
[36] Colby L Wight and Jia Zhao. Solving Allen-Cahn and Cahn-Hilliard equations using the adaptive physics informed neural networks. arXiv preprint arXiv:2007.04542, 2020.
[37] Ben Moseley, Andrew Markham, and Tarje Nissen-Meyer. Finite basis physics-informed neural networks (FBPINNs): a scalable domain decomposition approach for solving differential equations. Advances in Computational Mathematics, 49(4):62, 2023.
[38] Alexander Heinlein, Axel Klawonn, Martin Lanser, and Janine Weber. Combining machine learning and domain decomposition methods for the solution of partial differential equations—a review. GAMM-Mitteilungen, 44(1):Paper No. e202100001, 28, 2021.
[39] Axel Klawonn, Martin Lanser, and Janine Weber. Machine learning and domain decomposition methods – a survey, December 2023. arXiv:2312.14050 [cs, math].
[40] Khemraj Shukla, Ameya D Jagtap, and George Em Karniadakis. Parallel physics-informed neural networks via domain decomposition. Journal of Computational Physics, 447:110683, 2021.
[41] Zheyuan Hu, Ameya D Jagtap, George Em Karniadakis, and Kenji Kawaguchi. When do extended physics-informed neural networks (xpinns) improve generalization? arXiv preprint arXiv:2109.09444, 2021.
[42] Victorita Dolean, Alexander Heinlein, Siddhartha Mishra, and Ben Moseley. Finite Basis Physics-Informed Neural Networks as a Schwarz Domain Decomposition Method. In Zdeněk Dostál, Tomáš Kozubek, Axel Klawonn, Ulrich Langer, Luca F. Pavarino, Jakub Šístek, and Olof B. Widlund, editors, Domain Decomposition Methods in Science and Engineering XXVII, pages 165–172, Cham, 2024. Springer Nature Switzerland.
[43] Victorita Dolean, Alexander Heinlein, Siddhartha Mishra, and Ben Moseley. Multilevel domain decomposition-based architectures for physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 429:117116, September 2024.
[44] Alexander Heinlein, Amanda A Howard, Damien Beecroft, and Panos Stinis. Multifidelity domain decomposition-based physics-informed neural networks for time-dependent problems. arXiv preprint arXiv:2401.07888, 2024.
[45] Levi McClenny and Ulisses Braga-Neto. Self-adaptive physics-informed neural networks using a soft attention mechanism. arXiv preprint arXiv:2009.04544, 2020.
[46] Sokratis J Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, and George Em Karniadakis. Residual-based attention in physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 421:116805, 2024.
[47] Chenxi Wu, Min Zhu, Qinyang Tan, Yadhu Kartha, and Lu Lu. A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 403:115671, 2023.
[48] Zhiping Mao and Xuhui Meng. Physics-informed neural networks with residual/gradient-based adaptive sampling methods for solving partial differential equations with sharp solutions. Applied Mathematics and Mechanics, 44(7):1069–1084, 2023.
[49] Jie Hou, Ying Li, and Shihui Ying. Enhancing PINNs for solving PDEs via adaptive collocation point movement and adaptive loss weighting. Nonlinear Dynamics, 111(16):15233–15261, 2023.
[50] Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng, and Bryan Kian Hsiang Low. Pinnacle: Pinn adaptive collocation and experimental points selection. arXiv preprint arXiv:2404.07662, 2024.
[51] Mohammad Amin Nabian, Rini Jasmine Gladstone, and Hadi Meidani. Efficient training of physics-informed neural networks via importance sampling. Computer-Aided Civil and Infrastructure Engineering, 36(8):962–977, 2021.
[52] Zhiwei Gao, Liang Yan, and Tao Zhou. Failure-informed adaptive sampling for PINNs. SIAM Journal on Scientific Computing, 45(4):A1971–A1994, 2023.
[53] Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality is all you need for training physics-informed neural networks. arXiv preprint arXiv:2203.07404, 2022.
[54] Michael Penwarden, Shandian Zhe, Akil Narayan, and Robert M Kirby. Multifidelity modeling for physics-informed neural networks (PINNs). Journal of Computational Physics, 451:110844, 2022.
[55] Xuhui Meng and George E Karniadakis. A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. Journal of Computational Physics, 2019.
[56] Amanda Howard, Yucheng Fu, and Panos Stinis. A multifidelity approach to continual learning for physical systems. Machine Learning: Science and Technology, 5(2):025042, 2024.
[57] Amanda A Howard, Sarah H Murphy, Shady E Ahmed, and Panos Stinis. Stacked networks improve physics-informed training: applications to neural networks and deep operator networks. arXiv preprint arXiv:2311.06483, 2023.
[58] Yongji Wang and Ching-Yao Lai. Multi-stage neural networks: Function approximator of machine precision. Journal of Computational Physics, page 112865, 2024.
[59] Mark Ainsworth and Justin Dong. Galerkin neural networks: A framework for approximating variational equations with error control. SIAM Journal on Scientific Computing, 43(4):A2474–A2501, 2021.
[60] Mark Ainsworth and Justin Dong. Galerkin neural network approximation of singularly-perturbed elliptic systems. Computer Methods in Applied Mechanics and Engineering, 402:115169, 2022.
[61] Nathaniel Trask, Amelia Henriksen, Carianne Martinez, and Eric Cyr. Hierarchical partition of unity networks: fast multilevel training. In Mathematical and Scientific Machine Learning, pages 271–286. PMLR, 2022.
[62] Ziad Aldirany, Régis Cottereau, Marc Laforest, and Serge Prudhomme. Multi-level neural networks for accurate solutions of boundary-value problems. arXiv preprint arXiv:2308.11503, 2023.
[63] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Improved architectures and training algorithms for deep operator networks. Journal of Scientific Computing, 92(2):35, 2022.
[64] Amanda A Howard, Saad Qadeer, Andrew William Engel, Adam Tsou, Max Vargas, Tony Chiang, and Panos Stinis. The conjugate kernel for efficient training of physics-informed deep operator networks. In ICLR 2024 Workshop on AI4DifferentialEquations In Science, 2024.
[65] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
[66] Spyros Rigas and Michalis Papachristou. jaxKAN: A JAX-based implementation of Kolmogorov-Arnold Networks, May 2024.
[67] Jan Drgona, Aaron Tuor, James Koch, Madelyn Shapiro, and Draguna Vrabie. NeuroMANCER: Neural Modules with Adaptive Nonlinear Constraints and Efficient Regularizations. 2023.

Appendix A Training parameters

All results in this paper are implemented in JAX [65] using the jaxKAN [66] KAN implementation. All networks are trained with the Adam optimizer. For all FBKANs, we take the domain overlap $\delta=1.9$.
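To make the role of the overlap $\delta$ concrete, the following is a minimal one-dimensional sketch of overlapping partition-of-unity windows. It assumes the cosine window form commonly used for FBPINNs, with uniformly spaced centers and half-width $\delta l/(2(L-1))$ on a domain of length $l$. The function names (`window_params`, `raw_window`, `partition_of_unity`, `fbkan_output`) are hypothetical and are not taken from the paper's code or from jaxKAN; plain Python is used instead of JAX to keep the sketch dependency-free.

```python
import math


def window_params(L, delta, x_min=0.0, x_max=1.0):
    """Centers and half-width of L uniformly spaced, overlapping subdomains.

    Assumption: half-width = delta * length / (2 * (L - 1)), so delta > 1
    means neighbouring subdomains overlap. Requires L >= 2.
    """
    length = x_max - x_min
    centers = [x_min + length * j / (L - 1) for j in range(L)]
    half_width = delta * length / (2 * (L - 1))
    return centers, half_width


def raw_window(x, center, half_width):
    """Unnormalized cosine window, zero outside the subdomain (assumed form)."""
    if abs(x - center) >= half_width:
        return 0.0
    return (1.0 + math.cos(math.pi * (x - center) / half_width)) ** 2


def partition_of_unity(x, L, delta, x_min=0.0, x_max=1.0):
    """Normalized window values at x; they sum to 1 inside the domain."""
    centers, half_width = window_params(L, delta, x_min, x_max)
    raw = [raw_window(x, c, half_width) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def fbkan_output(x, subnets, delta, x_min=0.0, x_max=1.0):
    """Window-weighted sum of per-subdomain models (any callables x -> float)."""
    weights = partition_of_unity(x, len(subnets), delta, x_min, x_max)
    return sum(w * net(x) for w, net in zip(weights, subnets))
```

Under these assumptions, with $L=4$ and $\delta=1.9$ on $[0,1]$ the half-width is slightly smaller than the spacing between centers, so each point is covered by at most two windows and only one window is active at each subdomain center.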

A.1 Test 1

Parameter	Section 3.1.1	Section 3.1.2
KAN architecture	[1, 5, 1]	[1, 5, 1]
$L$	4-32	4
$g$	5	5
$k$	3	3
Learning rate	0.04	0.04
Iterations	4000	4000
$N_{data}$	1200	600

Table 3: Hyperparameters used for the results in section 3.1.

A.2 Test 2

Parameter	KAN-1/FBKAN-1	KAN-2/FBKAN-2
KAN architecture	[2, 10, 1]	[2, 5, 1]
$L$	4	4
$g$	5	[5, 10, 25, 30]
$g$ schedule	-	[0, 600, 1200, 1800]
$k$	3	3
Initial learning rate	0.02	0.02
Learning rate scale	-	0.8
Iterations	2400	2400
$N_{data}$	10000	10000

Table 4: Hyperparameters used for the results in section 3.2. The grid ($g$) schedule denotes the iterations at which the grid is updated. The learning rate scale denotes the change to the learning rate at each grid update. KAN-1/FBKAN-1 use fixed grids.
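As an illustration of how a grid schedule and learning rate scale could interact, the sketch below maps a training iteration to the active grid size and learning rate. It assumes that the scale is applied multiplicatively at each grid update after the initial grid is set at iteration 0. The helper `grid_and_lr` is hypothetical and is not part of jaxKAN.

```python
def grid_and_lr(iteration, grid_sizes, grid_schedule, lr0, lr_scale):
    """Return (grid size, learning rate) in effect at a given iteration.

    Assumptions: grid_schedule is sorted and starts at 0, and the learning
    rate is multiplied by lr_scale at every grid update after the first entry.
    """
    # Index of the last schedule entry that has been reached.
    stage = sum(1 for start in grid_schedule if iteration >= start) - 1
    grid = grid_sizes[stage]
    lr = lr0 * lr_scale ** stage
    return grid, lr


# Values for KAN-2/FBKAN-2 from Table 4.
grid_sizes = [5, 10, 25, 30]
grid_schedule = [0, 600, 1200, 1800]
for it in (0, 600, 1200, 1800, 2399):
    print(it, grid_and_lr(it, grid_sizes, grid_schedule, lr0=0.02, lr_scale=0.8))
```

Under this reading, the final stage of the KAN-2/FBKAN-2 run would use $g=30$ with a learning rate of about 0.0102.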

A.3 Test 3

Parameter	Value
KAN architecture	[2, 10, 1]
$L$	4, 8
$g$	[5, 10, 15, 20]
$g$ schedule	[0, 1000, 2000, 3000]
$k$	3
Initial learning rate	0.01
Learning rate scale	0.8
Iterations	4000
$N_{r}$	400
$N_{ic}$	1
$\lambda_{r}$	1/40
$\lambda_{ic}$	1

Table 5: Hyperparameters used for the results in section 4.1. The grid ($g$) schedule denotes the iterations at which the grid is updated. The learning rate scale denotes the change to the learning rate at each grid update.
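The weights $\lambda_{r}$ and $\lambda_{ic}$ balance the residual and initial-condition terms of the physics-informed loss. A minimal sketch of such a weighted loss follows, assuming each term is a mean squared error over its own set of points. The function `weighted_pinn_loss` and its arguments are hypothetical and only illustrate the weighting.

```python
def weighted_pinn_loss(residuals, ic_errors, lambda_r, lambda_ic):
    """Weighted sum of mean squared residual and initial-condition errors.

    residuals: equation residual at each of the N_r collocation points.
    ic_errors: prediction minus prescribed value at each of the N_ic points.
    Assumption: each term is averaged over its own points before weighting.
    """
    mse_r = sum(r * r for r in residuals) / len(residuals)
    mse_ic = sum(e * e for e in ic_errors) / len(ic_errors)
    return lambda_r * mse_r + lambda_ic * mse_ic


# Toy values with the sizes and weights listed for Test 3 (N_r = 400, N_ic = 1).
loss = weighted_pinn_loss([2.0] * 400, [0.5], lambda_r=1 / 40, lambda_ic=1.0)
```

With $\lambda_{r}=1/40$, a residual error would need to be 40 times larger than the initial-condition error to contribute equally to the total loss.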

A.4 Test 4

Parameter	KAN-1/FBKAN-1	KAN-2/FBKAN-2	KAN-3/FBKAN-3
KAN architecture	[2, 10, 1]	[2, 10, 1]	[2, 5, 1]
$L$	4-16	4	4
$g$	5	[5, 10, 15]	5
$g$ schedule	-	[0, 3000, 6000]	-
$k$	5	3	5
Initial learning rate	0.005	0.005	0.005
Learning rate scale	-	0.8	-
Iterations	$a_{1}=1,a_{2}=4$: 10000; $a_{1}=4,a_{2}=4$: 10000; $a_{1}=6,a_{2}=6$: 30000	10000	10000
$N_{r}$	800	800	800
$N_{bc}$	400	400	400
$\lambda_{r}$	0.01	0.01	0.01
$\lambda_{bc}$	1	1	1

Table 6: Hyperparameters used for the results in section 4.2. The grid ($g$) schedule denotes the iterations at which the grid is updated. The learning rate scale denotes the change to the learning rate at each grid update.

A.5 Test 5

Parameter	$c=\sqrt{2}$	$c=2$
KAN architecture	[2, 10, 1]	[2, 10, 10, 1]
$L$	4	4
$g$	10	10
$k$	5	5
Initial learning rate	0.001	0.0005
Iterations	60000	120000
$N_{r}$	1000	1200
$N_{ic}$	100	100
$N_{bc}$	200	200
$\lambda_{r}$	0.01	0.01
$\lambda_{ic}$	1	1
$\lambda_{bc}$	1	1

Table 7: Hyperparameters used for the results in section 4.3.