No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks

Feng-Lei Fan, Meng Wang, Hang-Cheng Dong, Jianwei Ma, Tieyong Zeng

Jianwei Ma ([email protected]) and Tieyong Zeng ([email protected]) are co-corresponding authors. Feng-Lei Fan and Tieyong Zeng are with Department of Mathematics, Chinese University of Hong Kong, Shatin, Hong Kong. Hang-Cheng Dong is with School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang Province 150001, China. Jianwei Ma is with Institute of Artificial Intelligence, School of Earth and Space Sciences, Peking University, Beijing 100871, China, and School of Mathematics, Harbin Institute of Technology, Harbin 150001, China. Meng Wang is with School of Mathematics, Harbin Institute of Technology, Harbin, Heilongjiang Province 150001, China.
Abstract

In the past decade, many successful networks have been built on novel architectures, which almost exclusively use the same type of neurons. Recently, more and more deep learning studies have been inspired by the idea of NeuroAI and the neuronal diversity observed in human brains, leading to the proposal of novel artificial neuron designs. Designing well-performing neurons represents a new dimension relative to designing well-performing neural architectures. Biologically, the brain does not rely on a single type of neuron that universally functions in all aspects. Instead, it acts as a sophisticated designer of task-based neurons. In this study, we address the following question: since the human brain is a task-based neuron user, can artificial network design go from task-based architecture design to task-based neuron design? Since methodologically there are no one-size-fits-all neurons, given the same structure, task-based neurons can enhance the feature representation ability relative to the existing universal neurons due to the intrinsic inductive bias for the task. Specifically, we propose a two-step framework for prototyping task-based neurons. First, symbolic regression is used to identify optimal formulas that fit input data by utilizing base functions such as logarithmic, trigonometric, and exponential functions. We introduce vectorized symbolic regression, which stacks all variables in a vector and regularizes each input variable to perform the same computation; this expedites the regression, facilitates parallel computation, and avoids overfitting. Second, we parameterize the acquired elementary formula to make its parameters learnable, and the result serves as the aggregation function of the neuron. Activation functions such as ReLU and the sigmoidal functions remain the same because they have proven to be good. As an initial step, we evaluate the proposed framework via systematic experiments on tabular data, using polynomials as base functions. Empirically, experimental results on synthetic data, classic benchmarks, and real-world applications show that the proposed task-based neuron design is not only feasible but also delivers competitive performance compared with other state-of-the-art models. We have shared our code at https://github.com/NewT123-WM/Task_based_neurons.

Index Terms:
Machine learning, NeuroAI, neuronal diversity, symbolic regression, task-based neurons

1 Introduction

In the past decade, a majority of deep learning research has focused on designing outstanding architectures, such as the bottleneck in autoencoders [1], shortcuts [2, 3], and neural architecture search (NAS) [4]. Almost exclusively, these works employ neurons of the same type that use an inner product and a nonlinear activation. We refer to such a neuron as a linear neuron, and a network made of these neurons as a linear network (LN) hereafter. Recently, the field of “NeuroAI” emerged [5], advocating that a large amount of neuroscience knowledge can help catalyze the next generation of AI. This idea is well-motivated, as the brain remains the most intelligent system to date, and an artificial network can be regarded as a miniature of the brain. Following the advocacy of “NeuroAI”, we note that our brain is made up of many functionally and morphologically different neurons, while the existing mainstream artificial networks are homogeneous at the neuronal level. Thus, why not introduce neuronal diversity into artificial networks and examine the associated merits?

Our overarching opinion is that the neuron type and architecture are two complementary dimensions of an artificial network. Designing well-performing neurons represents a new dimension relative to designing well-performing architectures. Therefore, the neuronal type should be given full attention to harness the full potential of connectionism. In recent years, a plethora of studies have introduced new neurons into deep learning [6, 7, 8, 9, 10, 11] such as polynomial neurons [6] and quadratic neurons [7, 8, 9, 10, 11]. Despite focusing only on a specific type of neuron, this thread of studies reasonably verifies the feasibility and potential of developing deep learning with new neurons. However, the performance of these neurons is not universally satisfactory, and the improvement is minor on some tasks. Biological neuronal diversity, both in terms of morphology and functionality, arises from the brain’s need to perform complex tasks [12]. The brain does not rely on a single type of neuron to universally function in all aspects. Instead, it acts as a sophisticated designer of task-based neurons. Hence, in the realm of deep learning, we think that promoting neuronal diversity should also not be limited to specific neuron types like linear or quadratic neurons. Instead, it should take into account the specific context of the tasks at hand.

Can we design different neurons for different tasks (task-based neurons)? Computationally, the philosophies of task-based architectures and task-based neurons are quite distinct. The former is “one-for-all”, which implicitly assumes that stacking a universal and basic type of neuron into different structures can solve a wide class of complicated nonlinear problems. This philosophy is well underpinned by the universal approximation theorem [13]. The latter is “one-for-one”, which assumes that there are no one-size-fits-all neuron types, and that it is better to solve a specific problem by prototyping customized neurons. Because task-based neurons are imparted with the implicit bias for the task, a network of task-based neurons can integrate the task-driven forces of all these neurons and, given the same structure, should exhibit stronger performance than a network of generic neurons. The key difference between task-based neurons and preset neurons is that the mathematical expression in task-based neurons adapts to the preference of the task, while in preset neurons the mathematical expression is fixed in advance.

Figure 1: Two steps to establish a task-based neuron. a) The symbolic regression constructs an elementary neuronal model, which stacks all variables in a vector and regularizes each input variable to perform the same computation. b) The acquired elementary formula is parameterized to be learnable, serving as the aggregation function of a neuron.

Along this direction, three main challenges face us in prototyping task-based neurons: 1) How to efficiently design task-based neurons? 2) How to make the resultant neurons transferable to a network? 3) How to transfer the superiority at the neuronal level to the network level? Here, we propose a two-step framework to address these challenges. First, we introduce vectorized symbolic regression (VSR) to construct an elementary neuronal model, as depicted in Figure 1. Symbolic regression (SR) draws inspiration from scientific discoveries in physics [14, 15], aiming to identify optimal formulas that fit input data by utilizing base functions such as logarithmic, trigonometric, and exponential functions. The vectorized symbolic regression stacks all variables in a vector and regularizes each input variable to perform the same computation. Given the complexity and unclear nonlinearity of the tasks, formulas learned from vectorized symbolic regression can capture the underlying patterns in the data, and these patterns differ across contexts. Thus, fixed formulas used in pre-designed neurons are disadvantageous. Second, we parameterize the acquired elementary formula to make its parameters learnable, and the result serves as the aggregation function of the neuron. The role of the vectorized symbolic regression is to identify the basic patterns behind data, while the parameterization allows the task-based neurons to adapt and interact with each other within a network. Activation functions such as ReLU and the sigmoidal functions remain the same when the neurons are connected into a network, as these activation functions have been widely tested and perform well.

The vectorized symbolic regression in our framework greatly expedites the search process by avoiding learning highly complex and disordered formulas, particularly for high-dimensional inputs (challenge 1). It also facilitates parallel computing and ensures the feasibility of building a deep network with the designed neurons (challenge 2). Because of the restricted search space, the formulas learned by vectorized symbolic regression capture basic patterns in the data and are not sufficiently complex to solve highly intricate problems on their own. By connecting task-based neurons into a network, we tap into the power of connectionism, enabling further amplification of the advantages without concerns about overfitting (challenge 3). We refer to a network made of task-based neurons as a task-based network (TN) hereafter.

As an initial step, we evaluate the feasibility and superiority of the proposed framework via systematic experiments on tabular data. Tabular data is one of the most common and important types of data in various domains, including finance, medicine, e-commerce, and many others [16]. Moreover, it encompasses various types of information, such as electronic health records, financial records, sensor readings, and more. Analyzing and understanding tabular data is crucial for making informed decisions and extracting valuable insights. Motivated by the success of quadratic and polynomial neurons, we mainly use polynomials as the base functions for symbolic regression. Systematic experiments show that task-based neurons and associated networks can outperform networks of preset neurons and other state-of-the-art models. To summarize, our contributions are threefold:

  • Towards NeuroAI, we propose a framework to design task-based neurons, which is a new dimension compared to task-based architectures and can greatly expand the armory of deep learning models.

  • We propose the vectorized symbolic regression to solve the computational challenges in prototyping new neurons. Methodologically, our work is the first to introduce symbolic regression into the design of neurons in deep learning.

  • With systematic experiments over synthetic data, public data, and real-world applications, we confirm the effectiveness of the task-based neurons.

2 Related Work

Neuronal diversity. There has been a growing interest in recent years to prototype new neurons and introduce neuronal diversity into artificial networks [7]. It is important to clarify that modifying the activation function should not be considered as creating new neurons. This is because the decision boundary of a neuron is solely determined by the aggregation function, as long as the activation function is monotonic. Therefore, compared to modifying the aggregation function, changing the activation function has a relatively weaker influence on the behavior of a neuron. Currently, excluding spiking neurons [17] featuring the spatiotemporal processing ability, the exploration of neuronal diversity primarily revolves around polynomial or quadratic neurons [7], which replace the inner product with a polynomial or a specially-engineered quadratic function, expanding the range of computations by individual neurons.

In recent years, polynomial neurons were revisited in the hope of enhancing the expressive ability of a single neuron. The key issue in designing polynomial neurons is to reduce the complexity such that they can be deployed into a deep network. [18, 6] decreased the complexity of polynomial neurons via tensor decomposition and factor sharing, while a majority of studies directly used quadratic neurons to save parameters when expressing high-order terms. Table I summarizes the recently-proposed quadratic neurons. As seen, given an $n$-dimensional input, the complexity of neurons in [19, 8, 9] is $\mathcal{O}(n^{2})$, which is still not bearable for deep networks, while neurons from [10, 20, 21, 22] enjoy linear parametric complexity. Notably, neurons in [10, 20, 21] are special cases of [22]. Polynomial and quadratic neurons have demonstrated competitive performance in many tasks such as medical imaging [23], bearing fault diagnosis [11], inverse problems in partial differential equations (PDEs) [20], and signal processing [24].

TABLE I: A summary of the recently-proposed quadratic neurons. $\sigma(\cdot)$ is the nonlinear activation function. $\odot$ denotes the Hadamard product. $\mathbf{W}\in\mathbb{R}^{n\times n}$, $\textbf{w}_{i}\in\mathbb{R}^{n\times 1}$, and the bias terms in these neurons are omitted for simplicity.

| Authors | Formulations |
| --- | --- |
| Zoumpourlis et al. (2017) [19] | $\mathbf{y}=\sigma(\bm{x}^{\top}\mathbf{W}\bm{x}+\textbf{w}^{\top}\bm{x})$ |
| Jiang et al. (2019) [8]; Mantini & Shah (2021) [9] | $\mathbf{y}=\sigma(\bm{x}^{\top}\mathbf{W}\bm{x})$ |
| Goyal et al. (2020) [10] | $\mathbf{y}=\sigma(\textbf{w}^{\top}(\bm{x}\odot\bm{x}))$ |
| Bu & Karpatne (2021) [20] | $\mathbf{y}=\sigma((\textbf{w}_{1}^{\top}\bm{x})(\textbf{w}_{2}^{\top}\bm{x}))$ |
| Xu et al. (2022) [21] | $\mathbf{y}=\sigma((\textbf{w}_{1}^{\top}\bm{x})(\textbf{w}_{2}^{\top}\bm{x})+\textbf{w}_{3}^{\top}\bm{x})$ |
| Fan et al. (2018) [22] | $\mathbf{y}=\sigma((\textbf{w}_{1}^{\top}\bm{x})(\textbf{w}_{2}^{\top}\bm{x})+\textbf{w}_{3}^{\top}(\bm{x}\odot\bm{x}))$ |
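To make the formulations in Table I concrete, the following NumPy sketch evaluates four of them on a single input. The function names, the use of tanh as a stand-in for $\sigma$, and the random weights are illustrative assumptions and are not taken from the cited papers; biases are omitted as in the table.

```python
import numpy as np

def sigma(z):
    # Stand-in activation (assumption); Table I leaves sigma generic.
    return np.tanh(z)

def neuron_full_quadratic(x, W, w):
    # Zoumpourlis et al. [19]: sigma(x^T W x + w^T x), n^2 + n weights.
    return sigma(x @ W @ x + w @ x)

def neuron_hadamard(x, w):
    # Goyal et al. [10]: sigma(w^T (x * x)), n weights.
    return sigma(w @ (x * x))

def neuron_product(x, w1, w2):
    # Bu & Karpatne [20]: sigma((w1^T x)(w2^T x)), 2n weights.
    return sigma((w1 @ x) * (w2 @ x))

def neuron_fan(x, w1, w2, w3):
    # Fan et al. [22]: sigma((w1^T x)(w2^T x) + w3^T (x * x)), 3n weights.
    return sigma((w1 @ x) * (w2 @ x) + w3 @ (x * x))

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=n)
w1, w2, w3 = rng.normal(size=(3, n))
W = rng.normal(size=(n, n))
zeros = np.zeros(n)

y_full = neuron_full_quadratic(x, W, w1)

# Zeroing w3 in [22] recovers [20]; zeroing w1 in [22] recovers [10].
same_as_product = np.isclose(neuron_fan(x, w1, w2, zeros), neuron_product(x, w1, w2))
same_as_hadamard = np.isclose(neuron_fan(x, zeros, w2, w3), neuron_hadamard(x, w3))

n_params_full = W.size + n   # quadratic in n
n_params_fan = 3 * n         # linear in n
```

The two equality checks illustrate, for bias-free neurons, how the formulations of [20] and [10] arise from [22] by zeroing one weight vector.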

AutoML and Neural Architecture Search (NAS). AutoML, short for Automated Machine Learning, attempts to automate and accelerate the steps of building a machine learning model, making machine learning more accessible to individuals and organizations that may not have extensive domain expertise [25]. With AutoML, users can focus on higher-level decisions and insights rather than getting swamped in the technical details of machine learning. Regarding the selection of a neural network, neural architecture search (NAS) [26, 4], which automatically searches a set of possible architectures for the best-performing one on a given dataset, has gained considerable traction in the field of AutoML. NAS has been successful in discovering novel and efficient network architectures, leading to state-of-the-art results in various domains [27].

Although the neuron type and the architecture are the two most important elements of an artificial network, the neuronal type is much less explored. Our methodology for task-based neurons is parallel to NAS, which is a valuable addition to AutoML by extending its scope from architectures to neuronal types.

Symbolic Regression in Deep Learning. Recent advances in symbolic regression include learning underlying PDEs from data [28], differentiable symbolic regression [29], and so on. Some works use neural networks to improve symbolic regression. For example, [30] uses a neural network architecture to span the hypothesis space of symbolic regression such that the formulas can be learned in an end-to-end manner. [31] extended symbolic regression to solve parametric systems whose coefficients may vary while the intrinsic structure of the underlying equation remains intact. [32] proposed a Symbolic Network-based Rectifiable Learning Framework (SNR) that can correct errors generated in the learning-with-experience model.

However, the role of symbolic regression in deep learning is much less explored. To the best of our knowledge, our work is the first time that symbolic regression has been used for task-based neuronal designs.

3 Method

In this section, we first provide a detailed description of how to create task-based neurons using the proposed vectorized symbolic regression. We will explain step-by-step and highlight the benefits of using this approach in terms of efficiency, parallelism, and generalizability.

3.1 Vectorized Symbolic Regression

Unlike traditional regression algorithms that fit numerical coefficients, symbolic regression first encodes a formula into a tree structure and then uses a genetic algorithm to explore the space of possible mathematical expressions to identify the best formula. Because no gradients with respect to the mathematical formula can be computed, the most common technique for solving symbolic regression problems is genetic programming (GP) [33]. GP is a powerful population-based evolutionary algorithm, which mainly uses crossover (Figure 2) and mutation (Figure 3) to generate new formulas.

Figure 2: A schematic diagram of crossover operation.

Crossover is a genetic programming operation to generate new individuals by means of subtree crossover among the selected individuals, and then explore the symbolic expression space. The specific method is to randomly select subtrees of the winner candidates and exchange them (Figure 2). This operation promotes diversity in the population and can lead to the discovery of new and more effective mathematical formulas.

Figure 3: A schematic diagram of mutation operation.

Mutation is a genetic programming operation to randomly select a position of an individual, and generate a new individual through single-point mutation. Due to the randomness of mutation, it can re-join some functions and variables that were eliminated before, thereby potentially leading to the discovery of novel and effective expressions. By injecting variability into the population, mutation plays a crucial role in exploring the solution space and preventing premature convergence to suboptimal solutions (Figure 3).
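The two operations can be sketched on a toy tree encoding. The snippet below is a minimal illustration and not the authors' implementation: formulas are nested tuples `(op, left, right)`, leaves are the variable `"x"` or a float constant, and all names (`BINARY_OPS`, `LEAVES`, `crossover`, `mutate`) are hypothetical.

```python
import random

BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
LEAVES = ["x", 1.0, 2.0]

def evaluate(tree, x):
    # Leaves are the variable "x" or a float constant.
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return BINARY_OPS[op](evaluate(left, x), evaluate(right, x))

def paths(tree, prefix=()):
    # Path (tuple of child indices) to every node, root first.
    found = [prefix]
    if isinstance(tree, tuple):
        found += paths(tree[1], prefix + (1,))
        found += paths(tree[2], prefix + (2,))
    return found

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    # Return a copy of `tree` with the node at `path` replaced by `new`.
    if not path:
        return new
    children = list(tree)
    children[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(children)

def crossover(a, b, rng):
    # Subtree crossover: pick one random subtree in each parent and swap them.
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))

def mutate(tree, rng):
    # Single-point mutation: redraw one operator or one leaf, keeping the shape.
    p = rng.choice(paths(tree))
    node = get(tree, p)
    if isinstance(node, tuple):
        new = (rng.choice(list(BINARY_OPS)),) + node[1:]
    else:
        new = rng.choice(LEAVES)
    return put(tree, p, new)

rng = random.Random(0)
parent_a = ("add", ("mul", "x", "x"), 1.0)   # x*x + 1
parent_b = ("mul", ("sub", "x", 2.0), "x")   # (x - 2) * x
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, rng)
```

Because trees are immutable tuples, both operations return new individuals and leave the parents untouched, which keeps the selected winners available for the next generation.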

In prototyping task-based neurons, we consider three important aspects: 1) How to efficiently design task-based neurons? 2) How to make the resultant neurons transferable to a network? 3) How to transfer the superiority at the neuronal level to the network level? We find that the traditional symbolic regression cannot fulfill these needs, particularly for high-dimensional inputs, due to three problems. First, the regression process of traditional symbolic regression becomes slow and computationally expensive for high-dimensional inputs. The search space becomes vast for high-dimensional inputs, as it requires checking arbitrary forms of interaction among two or more input variables. Second, the formulas learned by the traditional symbolic regression are heterogeneous, which suffer from parametric explosion for high-dimensional inputs and do not support parallel computing and GPU acceleration. Thus, such formulas cannot serve as the aggregation function of a neuron because the resultant neuron cannot be easily integrated into deep networks. Third, the traditional symbolic regression may learn overly complex formulas, which are subject to the risk of overfitting when those neurons are connected into a network.

To address these problems, we propose a solution called vectorized symbolic regression. This approach regularizes every variable to learn the same formula, allowing us to organize all variables into a vector. The formulas are then based on vector computation, as illustrated in Figure 4. Unlike traditional symbolic regression, which tends to identify a heterogeneous formula, the vectorized symbolic regression yields a homogeneous one. It is simple yet mighty, with valuable characteristics suited to this task:

Figure 4: The proposed vectorized symbolic regression.

Regression Speed: The vectorized symbolic regression decreases the computational complexity of the regression process, making it much faster than traditional symbolic regression, especially for high-dimensional inputs. This is because the search space is significantly reduced when all variables are regularized to learn the same formula.

Low Complexity and Parallel Computing: Due to the homogeneity, the proposed vectorized symbolic regression leads to mathematical formulas with much fewer parameters. Given $d$-dimensional inputs, the number of parameters is $\mathcal{O}(d)$, which is at the same level as the linear neuron. Moreover, because each variable conducts the same operation, formulas obtained from the proposed vectorized symbolic regression can be organized into vector or matrix computation, which can facilitate parallel computation aided by GPUs. The low complexity and parallel computing allow for faster and more efficient training of deep networks composed of task-based neurons.

Generalization: The proposed vectorized symbolic regression has a significantly restricted search space. It is unlikely that a homogeneous formula can perfectly fit or overfit data all the time. Therefore, the learned formula tends to underfit data. The power of a neural network is not solely determined by neurons. We can introduce additional flexibility and adaptability to the network structure, enabling it to better handle complex problems and achieve the optimal generalization performance.
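To give a feel for the first two characteristics above, the following sketch counts candidate terms and parameters when the base functions are polynomials. It rests on our own simplifying assumptions rather than on the paper's implementation: an unrestricted formula may use any monomial of total degree at most p in d variables, a vectorized formula only chooses among a constant and the elementwise powers 1 to p, and each retained power carries one weight per input plus a shared bias. All function names are illustrative.

```python
from math import comb

def unrestricted_terms(d, p):
    # Monomials of total degree <= p in d variables (cross terms allowed).
    return comb(d + p, p)

def vectorized_terms(p):
    # A constant plus the elementwise powers 1..p; does not grow with d.
    return p + 1

def full_quadratic_params(d):
    # x^T W x + w^T x, as in the first row of Table I (bias omitted).
    return d * d + d

def vectorized_params(d, n_powers):
    # One weight per input for each retained power, plus one bias.
    return n_powers * d + 1

for d in (2, 8, 32):
    print(d, unrestricted_terms(d, 4), vectorized_terms(4),
          full_quadratic_params(d), vectorized_params(d, 2))
```

Under these assumptions, for d = 8 and p = 4 there are 495 candidate monomials in the unrestricted case against 5 candidate terms in the vectorized case, and the gap widens quickly as d grows.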

Remark. One may ask: since linear neurons can already represent any function based on universal approximation [13], why are task-based neurons necessary? While it is true that there is no task that can only be done by task-based neurons but not by linear neurons, the key issue is effectiveness and efficiency. It was reported that a linear network needs an exponential number of parameters to learn the multiplication operation [34]. Task-based neurons search for suitable formulas in a broad function space, which can automatically integrate task-related priors, thereby leveraging the specific strengths of these neurons to tackle complex tasks effectively. Furthermore, task-based neurons can be optimized for a specific task, which can improve the efficiency of the network.

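TABLE II: The comparison between SR and the proposed VSR.

| | Traditional SR | Vectorized SR |
| --- | --- | --- |
| Regression Speed | Slow for high-dimensional inputs | Fast |
| Parametric Complexity | Parametric explosion for high-dimensional inputs | $\mathcal{O}(d)$ |
| Parallel Computing | Not supported | Supported |
| Generalization | Risk of overfitting | Restricted search space; unlikely to overfit |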

3.2 Parameterization

We expect that the vectorized symbolic regression can identify hidden patterns behind data collected from different tasks, and that leveraging these patterns to prototype new neurons is useful. These patterns are basic forms rather than fully specified functions. For instance, we describe a cell as circular, a shape characterized by an elliptical equation, without needing to specify the radius of the circle. To take advantage of these patterns, we reparameterize the learned formula by making the fixed constants trainable. Considering the complexity of tasks, there should be no one-size-fits-all neurons, so we expect such neurons to perform better than preset neurons. By reparameterizing the learned formula, we can fine-tune the neuron’s behavior to better fit the task at hand. As mentioned earlier, the task-based neurons established through vectorized symbolic regression have limited expressive ability and cannot effectively scale to handle complex tasks on their own. Once they are connected into a network, the trainable parameters allow for a more efficient and effective search for the optimal solution.
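The reparameterization step can be sketched as a layer whose aggregation function is a sum of elementwise powers of the input, each with its own trainable weight matrix. This is a minimal NumPy illustration under our own assumptions, not the authors' code: the class name `TaskBasedLayer`, the random initialization, the ReLU activation, and the reading of each polynomial term as an elementwise power with one weight per input and one bias per neuron are all hypothetical choices.

```python
import numpy as np

class TaskBasedLayer:
    """Neurons with aggregation sum_k (x ** k) @ W_k + b over chosen powers k."""

    def __init__(self, d_in, d_out, powers, rng):
        self.powers = tuple(powers)
        # One trainable matrix per polynomial term; these replace the
        # fixed constants of the regressed formula.
        self.weights = [rng.normal(scale=0.1, size=(d_in, d_out))
                        for _ in self.powers]
        self.bias = np.zeros(d_out)

    def forward(self, X, activation=True):
        # X has shape (batch, d_in); every power is a plain matrix product.
        Z = sum((X ** k) @ W for k, W in zip(self.powers, self.weights))
        Z = Z + self.bias
        return np.maximum(Z, 0.0) if activation else Z

    def n_params(self):
        return sum(W.size for W in self.weights) + self.bias.size

def build_network(d_in, widths, powers, rng):
    layers, d = [], d_in
    for h in widths:
        layers.append(TaskBasedLayer(d, h, powers, rng))
        d = h
    return layers

def predict(layers, X):
    for layer in layers[:-1]:
        X = layer.forward(X)
    # No activation on the output layer (regression setting assumed).
    return layers[-1].forward(X, activation=False)

rng = np.random.default_rng(0)
# Cubic and quadratic terms, 8 input features, structure 6-3-1.
net = build_network(d_in=8, widths=[6, 3, 1], powers=(3, 2), rng=rng)
X = rng.normal(size=(5, 8))
y_hat = predict(net, X)
total_params = sum(layer.n_params() for layer in net)
```

With 8 input features and two non-constant terms, the 6-3-1 structure has 148 parameters under this counting, which is consistent with the "6-3-1 (148)" entry for california housing in Table III; that agreement is our observation and not a statement from the paper.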

TABLE III: Comparison of the network built by neurons using randomly generated polynomials and expressions generated by symbolic regression. The notation $h_{1}$-$h_{2}$-$\cdots$-$h_{k}$-$y$ $(z)$ means that this network has $k$ hidden layers with $h_{1}, h_{2}, \ldots, h_{k}$ neurons respectively, and $z$ is the number of parameters used in such a network. For example, 5-3-1 (145) means this network has two hidden layers with 5 and 3 neurons respectively, and 145 parameters. RP denotes random polynomials.

| Datasets | Random Polynomial (RP) | Structure (TN) | Structure (RP) | Test results (TN) | Test results (RP) |
| --- | --- | --- | --- | --- | --- |
| california housing | $(\bm{x}\odot^{3}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}+1$ | 6-3-1 (148) | 6-3-1 (148) | 0.0720 (0.0024) | 0.0770 (0.0051) |
| house sales | $(\bm{x}\odot^{4}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}+\bm{x}$ | 6-4-1 (483) | 8-5-1 (509) | 0.0079 (0.0008) | 0.0133 (0.0150) |
| airfoil self noise | $(\bm{x}\odot\bm{x})^{\top}$ | 4-1 (53) | 8-1 (57) | 0.0438 (0.0065) | 0.0840 (0.0102) |
| wine quality | $(\bm{x}\odot^{4}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}$ | 6-1 (295) | 9-5-1 (313) | 0.0545 (0.0026) | 0.0567 (0.0034) |
| fifa | $(\bm{x}\odot^{5}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}+\bm{x}$ | 3-1 (94) | 5-1 (96) | 0.0611 (0.0032) | 0.0616 (0.0048) |
| diamonds | $(\bm{x}\odot^{7}\bm{x})^{\top}+(\bm{x}\odot^{3}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}$ | 4-1 (205) | 7-1 (218) | 0.0109 (0.0036) | 0.0123 (0.0052) |
| abalone | $(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{x}+1$ | 5-1 (141) | 8-1 (153) | 0.0239 (0.0024) | 0.0240 (0.0030) |
| Bike Sharing Demand | $(\bm{x}\odot\bm{x})^{\top}$ | 8-1 (217) | 10-8-1 (227) | 0.0184 (0.0026) | 0.0747 (0.0025) |
| space ga | $(\bm{x}\odot^{4}\bm{x})^{\top}+(\bm{x}\odot^{3}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}$ | 2-1 (59) | 3-1 (67) | 0.0057 (0.0029) | 0.0078 (0.0041) |
| Airlines DepDelay | $(\bm{x}\odot\bm{x})^{\top}$ | 4-1 (53) | 8-1 (57) | 0.1645 (0.0055) | 0.1664 (0.0058) |
| credit | $(\bm{x}\odot^{4}\bm{x})^{\top}+(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{x}+1$ | 6-2 (152) | 6-2 (224) | 0.7441 (0.0092) | 0.7427 (0.0118) |
| heloc | $(\bm{x}\odot^{6}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}+\bm{x}$ | 18-2 (1316) | 18-2 (1316) | 0.7077 (0.0145) | 0.6944 (0.0157) |
| electricity | $(\bm{x}\odot\bm{x})^{\top}$ | 5-2 (107) | 10-2 (112) | 0.7862 (0.0075) | 0.7647 (0.0083) |
| phoneme | $(\bm{x}\odot^{5}\bm{x})^{\top}+(\bm{x}\odot^{4}\bm{x})^{\top}+(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{x}+1$ | 5-3 (89) | 5-3 (89) | 0.8242 (0.0207) | 0.8198 (0.0252) |
| bank-marketing | $(\bm{x}\odot^{4}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}+\bm{x}+1$ | 5-2 (187) | 7-2 (198) | 0.7938 (0.0099) | 0.7919 (0.0099) |
| MagicTelescope | $(\bm{x}\odot^{4}\bm{x})^{\top}+(\bm{x}\odot\bm{x})^{\top}+\bm{x}+1$ | 6-2 (296) | 8-2 (298) | 0.8449 (0.0092) | 0.8417 (0.0092) |
| vehicle | $(\bm{x}\odot^{4}\bm{x})^{\top}+1$ | 4-4 (360) | 13-7-4 (377) | 0.8176 (0.0362) | 0.6894 (0.0401) |
| Oranges-vs.-Grapefruit | $(\bm{x}\odot\bm{x})^{\top}$ | 3-2 (47) | 6-2 (50) | 0.9305 (0.0037) | 0.8952 (0.0160) |
| eye movements | $(\bm{x}\odot^{3}\bm{x})^{\top}+1$ | 10-2 (452) | 15-8-2 (461) | 0.5849 (0.0125) | 0.5823 (0.0146) |
| Contaminant | $(\bm{x}\odot^{5}\bm{x})^{\top}+(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{x}$ | 10-2 (1612) | 15-6-2 (1679) | 0.9208 (0.0192) | 0.9142 (0.0203) |
TABLE IV: Test results of the symbolic regression (SR), the vectorized symbolic regression (VSR), and task-based networks (TN).
Dataset #Instances #Features SR VSR TN (Structure)
google stock price 12,454 4 0.0362 0.5505 0.0319 (4-1)
concrete strength 10,308 8 0.4617 0.7914 0.1144 (4-1)
health insurance 13,386 6 0.3072 0.7009 0.1849 (3-1)
white wine quality 395,611 11 0.7430 0.9288 0.6174 (4-1)
song popularity 1,492,613 13 1.0137 1.0075 0.9366 (13-1)

4 Analysis Experiments

In this section, we present a series of experiments designed to analyze the feasibility, necessity, and superiority of the proposed task-based neurons. Unless otherwise stated, we restrict the function space of symbolic regression to polynomials. We first validate the feasibility of the proposed framework on synthetic data (Section 1.1 in Supplementary Materials, SMs), showing that the vectorized symbolic regression can recover the correct hidden formulas from heavily noisy data. Then, we show the necessity of the proposed methodology relative to symbolic regression: although the vectorized symbolic regression is inferior to symbolic regression in fitting data, the resultant task-based networks outperform symbolic regression. Next, we compare task-based neurons with conventional neurons and different quadratic neurons (Section 1.3 in SMs) to show the superiority of task-based neurons. Furthermore, we compare task-based neurons with neurons using random polynomials to confirm that the polynomials learned by symbolic regression are reasonable. Lastly, we extend the search space from polynomial bases to trigonometric functions (Section 1.3 in SMs) to investigate whether a larger repertoire of functions improves the adaptability and effectiveness of task-based neurons on more diverse and complex tasks.

4.1 Necessity of Task-based Neurons

One may ask why we use the vectorized symbolic regression to construct an elementary neuron instead of directly using symbolic regression to fit the data. The reason is that symbolic regression is not good at handling high-dimensional data. Constructing a neuron, in contrast, taps into connectionism, which the numerous successes of deep learning have shown to be a powerful approach.

To validate these points, we apply task-based networks (TN), traditional symbolic regression (SR), and vectorized symbolic regression (VSR) to five publicly available tabular datasets and compare their performance. For SR and VSR, predictions on the test set are made directly with the expressions they produce. For TN, we do not rerun the vectorized symbolic regression; instead, we use the mathematical expression obtained from VSR to build a neuron and connect such neurons into a fully connected network.
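To make the construction concrete, the following is a minimal sketch of a single task-based neuron and a fully connected layer built from it. It assumes that $\bm{x}\odot^{k}\bm{x}$ denotes the element-wise $k$-th power of the input (with $\bm{x}\odot\bm{x}$ as degree 2), that every polynomial term carries its own learnable weight vector, and that each neuron has one bias. The names `task_neuron` and `task_layer` are illustrative and are not taken from the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def task_neuron(x, weights, bias, activation=None):
    """Polynomial aggregation followed by an optional activation.

    x       : list of input features
    weights : dict mapping a degree k to a weight list of len(x)
    bias    : scalar bias
    """
    z = bias
    for degree, w in weights.items():
        # Element-wise k-th power of the input, then an inner product with w.
        z += sum(w_i * (x_i ** degree) for w_i, x_i in zip(w, x))
    return activation(z) if activation is not None else z


def task_layer(x, neurons, activation=sigmoid):
    """Apply several task-based neurons, given as (weights, bias) pairs, to x."""
    return [task_neuron(x, w, b, activation) for (w, b) in neurons]


# Example: a neuron with terms of degree 3 and degree 1, the form that
# Table V reports for california housing. Before training, every input
# shares the same coefficient, as in the vectorized expression.
x = [0.5, -1.0]
weights = {3: [0.068, 0.068], 1: [0.15, 0.15]}
z = task_neuron(x, weights, bias=0.76)

# A two-neuron layer with Sigmoid activation (hypothetical weights).
hidden = task_layer(x, [(weights, 0.76), (weights, 0.0)])
```

In a trained network the per-term weight vectors would be learned independently for each input; the shared initial values above only mirror the vectorized formula.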

TABLE V: The formulas learned by the vectorized symbolic regression on 20 public datasets.
Datasets Instances Features Classes Predicted Function
california housing 20640 8 continuous $\bm{0.068}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.15}\bm{x}^{\top}+0.76$
house sales 21613 15 continuous $-\bm{0.062}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{0.025}(\bm{x}\odot^{3}\bm{x})^{\top}-\bm{0.010}(\bm{x}\odot\bm{x})^{\top}+\bm{0.067}\bm{x}^{\top}+0.74$
airfoil self noise 1503 5 continuous $\bm{0.064}(\bm{x}\odot\bm{x})^{\top}-\bm{0.038}\bm{x}^{\top}-0.087$
wine quality 6497 11 continuous $\bm{0.0076}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{0.055}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.10}(\bm{x}\odot\bm{x})^{\top}+\bm{0.055}\bm{x}^{\top}-0.00034$
fifa 18063 5 continuous $\bm{0.30}(\bm{x}\odot^{5}\bm{x})^{\top}-\bm{0.63}(\bm{x}\odot^{4}\bm{x})^{\top}-\bm{0.10}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.38}(\bm{x}\odot\bm{x})^{\top}+\bm{0.13}\bm{x}^{\top}+0.010$
diamonds 53940 9 continuous $-\bm{0.075}(\bm{x}\odot^{7}\bm{x})^{\top}+\bm{0.16}(\bm{x}\odot^{6}\bm{x})^{\top}+\bm{0.10}(\bm{x}\odot^{5}\bm{x})^{\top}-\bm{0.27}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{0.090}(\bm{x}\odot^{3}\bm{x})^{\top}$
abalone 4177 8 continuous $-\bm{0.088}(\bm{x}\odot^{3}\bm{x})^{\top}-\bm{0.12}(\bm{x}\odot\bm{x})^{\top}+\bm{0.046}\bm{x}^{\top}$
Bike Sharing Demand 17379 12 continuous $-\bm{0.081}(\bm{x}\odot\bm{x})^{\top}+\bm{0.054}\bm{x}^{\top}$
space ga 3107 6 continuous $\bm{0.052}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{0.12}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.025}(\bm{x}\odot\bm{x})^{\top}-\bm{0.073}\bm{x}^{\top}+0.54$
Airlines DepDelay 8000 5 continuous $\bm{0.010}(\bm{x}\odot\bm{x})^{\top}+\bm{0.042}\bm{x}^{\top}-0.27$
credit 16714 10 2 $-\bm{0.43}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{0.37}(\bm{x}\odot\bm{x})^{\top}+0.21$
heloc 10000 22 2 $\bm{0.031}(\bm{x}\odot^{6}\bm{x})^{\top}-\bm{0.026}(\bm{x}\odot^{5}\bm{x})^{\top}+\bm{0.055}(\bm{x}\odot^{3}\bm{x})^{\top}$
electricity 38474 8 2 $-\bm{0.21}(\bm{x}\odot\bm{x})^{\top}+\bm{0.21}\bm{x}^{\top}+1.18$
phoneme 3172 5 2 $\bm{1.36}(\bm{x}\odot^{5}\bm{x})^{\top}-\bm{2.91}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.60}(\bm{x}\odot\bm{x})^{\top}+\bm{1.22}\bm{x}^{\top}$
bank-marketing 10578 7 2 $-\bm{1.04}(\bm{x}\odot^{4}\bm{x})^{\top}-\bm{0.14}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.81}(\bm{x}\odot\bm{x})^{\top}+\bm{0.043}\bm{x}^{\top}-0.068$
MagicTelescope 13376 10 2 $-\bm{0.30}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{1.13}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.46}(\bm{x}\odot\bm{x})^{\top}-\bm{0.060}\bm{x}^{\top}+1.02$
vehicle 846 18 4 $-\bm{0.074}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{0.068}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.072}(\bm{x}\odot\bm{x})^{\top}+\bm{0.0015}\bm{x}^{\top}$
Oranges-vs.-Grapefruit 10000 5 2 $-\bm{0.52}(\bm{x}\odot\bm{x})^{\top}+\bm{0.70}\bm{x}^{\top}+0.89$
eye movements 7608 20 2 $-\bm{0.017}(\bm{x}\odot^{3}\bm{x})^{\top}-\bm{0.011}(\bm{x}\odot\bm{x})^{\top}$
contaminant 2400 30 2 $-\bm{0.75}(\bm{x}\odot^{5}\bm{x})^{\top}-\bm{0.67}(\bm{x}\odot^{4}\bm{x})^{\top}+\bm{0.38}(\bm{x}\odot^{3}\bm{x})^{\top}+\bm{0.24}(\bm{x}\odot\bm{x})^{\top}+\bm{0.13}\bm{x}^{\top}$

To eliminate the effect of large differences in the order of magnitude among features, we normalize the features. The training and test sets are divided at a ratio of $8:2$, and MSE is the evaluation metric. We use the machine learning library gplearn (https://gplearn.readthedocs.io/en/stable/intro.html) to implement symbolic regression; gplearn is a mature Python-based symbolic regression library with good stability and superior performance. Traditional symbolic regression deals with each input variable individually. To implement the vectorized symbolic regression, we just need to feed the input vector into the symbolic regression in gplearn.

We fix the random seed to ensure that the symbolic regression is reproducible. We tune the important hyperparameters to achieve the optimal performance for SR and VSR. In the traditional SR, the population size is set to 5,000, the maximum number of generations to 20, the crossover probability to 55%, the mutation probability to 40%, and the reproduction probability to 5%. In VSR, the random seed is set to 100, the population size to 5,000, the maximum number of generations to 30, the crossover probability to 30%, the mutation probability to 60%, the reproduction probability to 10%, and the tournament size to 3% of the population size. The range of numeric constants is set to $[-1,1]$, following a uniform distribution.
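The settings above can be collected as keyword arguments for gplearn's `SymbolicRegressor`. The sketch below only builds the two dictionaries and does not import gplearn. Two points are assumptions rather than statements from the text: the whole mutation budget is assigned to subtree mutation (gplearn splits mutation into subtree, hoist, and point mutation, and the text does not say how the 40% and 60% are divided), and the constant range is applied to both SR and VSR. In gplearn, the probability mass not assigned to any genetic operator is used for reproduction, which the helper `reproduction_probability` (an illustrative name) makes explicit.

```python
POPULATION = 5000

# Traditional SR. The seed value used for SR is not given in the text,
# so random_state is left out here.
sr_params = dict(
    population_size=POPULATION,
    generations=20,
    p_crossover=0.55,
    p_subtree_mutation=0.40,  # assumption: all mutation is subtree mutation
    p_hoist_mutation=0.0,
    p_point_mutation=0.0,
    const_range=(-1.0, 1.0),
)

# Vectorized SR.
vsr_params = dict(
    population_size=POPULATION,
    generations=30,
    tournament_size=POPULATION * 3 // 100,  # 3% of the population
    p_crossover=0.30,
    p_subtree_mutation=0.60,  # assumption: all mutation is subtree mutation
    p_hoist_mutation=0.0,
    p_point_mutation=0.0,
    const_range=(-1.0, 1.0),
    random_state=100,
)


def reproduction_probability(params):
    """Probability mass left over after all genetic operators."""
    used = sum(v for k, v in params.items() if k.startswith("p_"))
    return round(1.0 - used, 10)


sr_reproduction = reproduction_probability(sr_params)
vsr_reproduction = reproduction_probability(vsr_params)
```

With these values the leftover mass matches the reproduction probabilities quoted above (5% for SR and 10% for VSR). The dictionaries could then be passed as `SymbolicRegressor(**sr_params)` in an environment where gplearn is installed.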

The test results are shown in Table IV, from which we draw two highlights. First, overall, the performance of VSR is inferior to that of SR. While VSR slightly outperforms SR on the song popularity dataset, SR leads VSR by a large margin on the remaining datasets. This is because the search space of VSR is restricted, so SR is more likely to find a better formula. Second, TN outperforms SR on all five datasets, which suggests the power of connectionism and the efficacy of using VSR to build a neuron. When a single neuron captures a basic pattern of the data, the corresponding network can leverage these basic patterns to form an effective representation.
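As a quick check of these two observations, the short script below recomputes them from the test MSE values in Table IV. The relative reduction of TN with respect to SR is an illustrative summary statistic and is not reported in the original text.

```python
# Test MSE copied from Table IV: (SR, VSR, TN)
table_iv = {
    "google stock price": (0.0362, 0.5505, 0.0319),
    "concrete strength": (0.4617, 0.7914, 0.1144),
    "health insurance": (0.3072, 0.7009, 0.1849),
    "white wine quality": (0.7430, 0.9288, 0.6174),
    "song popularity": (1.0137, 1.0075, 0.9366),
}

# Datasets on which VSR alone has a lower MSE than SR.
vsr_beats_sr = [name for name, (sr, vsr, tn) in table_iv.items() if vsr < sr]

# Datasets on which the task-based network has a lower MSE than SR.
tn_beats_sr = [name for name, (sr, vsr, tn) in table_iv.items() if tn < sr]

# Relative MSE reduction of TN with respect to SR, in percent.
reduction = {
    name: 100.0 * (sr - tn) / sr for name, (sr, vsr, tn) in table_iv.items()
}

for name, value in reduction.items():
    print(f"{name}: {value:.1f}% lower MSE than SR")
```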

4.2 Superiority of Task-based Neurons over Neurons Using Random Polynomials

To further illustrate the necessity and effectiveness of using symbolic regression to generate neurons for different tasks, we perform experiments on 20 public datasets: 10 for classification and 10 for regression. MSE and classification accuracy are evaluation metrics for regression and classification, respectively. Datasets are collected from the scikit-learn package and the official website of OpenML. We first normalize the original data to $[-1,1]$. Then we perform the vectorized symbolic regression on the normalized dataset.
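The normalization step can be sketched as follows. The text only states the target range $[-1,1]$; per-feature min-max scaling is an assumption here, and the function names are illustrative.

```python
def scale_to_unit_interval(column):
    """Min-max scale a list of numbers to the range [-1, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to the midpoint.
        return [0.0 for _ in column]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


def normalize_features(rows):
    """Scale every feature (column) of a row-major table independently."""
    columns = list(zip(*rows))
    scaled = [scale_to_unit_interval(col) for col in columns]
    return [list(r) for r in zip(*scaled)]


# Hypothetical toy data with two features on very different scales.
rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]
normalized = normalize_features(rows)
```

In practice the minimum and maximum would be computed on the training split only and reused for the test split, which is a common convention rather than something the text specifies.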

The relevant information regarding these 20 datasets and the regression results are shown in Table V. After learning formulas from different datasets, we use them to build neurons and connect the neurons into a network for training and testing. As a baseline, we randomly generate a polynomial whose highest degree equals the highest degree of the expression obtained by symbolic regression, as shown in Table III. We then use the randomly generated expression to build a neuron and the associated network, and compare it with the task-based network.

It should be noted that 1) on the electricity, Airlines DepDelay, and Oranges-vs.-Grapefruit datasets, because the expression obtained by symbolic regression is of degree 2, we can only set the randomly generated polynomial to a single-term polynomial of degree 2 (that is, $x^{2}$) to make it different from the expression obtained by symbolic regression; 2) when using neurons to build the network, since the weights and biases are randomly initialized again, we set the coefficients and constant terms (if any) of the randomly generated neuron expression to 1. In fact, the coefficients and constants of the neuronal expressions do not affect the performance of the network, because they are re-initialized randomly.

It can be seen that the fitting ability of the network built by randomly generated neurons is weaker than that of the neural network built by neurons obtained by symbolic regression. On both regression and classification, task-based networks scale better and use fewer parameters. These results indicate that symbolic regression plays an important role in identifying the appropriate polynomials.
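The parameter counts in parentheses in Table III can be reproduced with a simple rule: each neuron holds one weight vector per non-constant polynomial term plus a single bias. This rule is inferred from the reported numbers and is not stated explicitly in the text, so the sketch below should be read as an assumption that happens to match the entries checked here. The function name `count_parameters` is illustrative.

```python
def count_parameters(n_inputs, layers, n_terms):
    """Parameters of a fully connected network of polynomial neurons.

    n_inputs : number of input features
    layers   : widths of the layers, e.g. [4, 1] for the structure "4-1"
    n_terms  : number of non-constant terms in the neuron's polynomial
    """
    total, fan_in = 0, n_inputs
    for width in layers:
        # One weight vector of length fan_in per term, plus one bias.
        total += width * (n_terms * fan_in + 1)
        fan_in = width
    return total


# Airlines DepDelay: 5 features. The learned polynomial in Table V has
# terms of degree 2 and 1; the random polynomial has only degree 2.
airlines_tn = count_parameters(5, [4, 1], n_terms=2)
airlines_random = count_parameters(5, [8, 1], n_terms=1)

# vehicle: 18 features, 4 classes. The learned polynomial has four
# non-constant terms; the random polynomial has one.
vehicle_tn = count_parameters(18, [4, 4], n_terms=4)
vehicle_random = count_parameters(18, [13, 7, 4], n_terms=1)

print(airlines_tn, airlines_random, vehicle_tn, vehicle_random)
# prints: 53 57 360 377
```

These values agree with the counts listed for the two datasets in Table III, which makes it easier to see that the compared networks are of similar size.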

4.3 Superiority of Task-based Neurons over Linear Neurons

Here, we test the superiority of task-based neurons relative to linear ones. We use the same 20 datasets as in the last subsection: 10 for regression and 10 for classification. We do not need to repeat the vectorized symbolic regression; instead, we directly use the polynomials listed in Table V. The training and test sets are divided at a ratio of $8:2$. For task-based networks (TN) and linear networks (LN), the data division and the batch size are the same. We select 5 different network structures for each dataset for a comprehensive comparison. When designing the network structures of TN, we ensure that TN has fewer parameters than LN to show the superiority of task-based neurons in efficiency. The specific network structures and the corresponding numbers of parameters are given in SMs. Each dataset is tested 10 times for reliability. The MSE and classification accuracy are presented in the form of $\mathrm{mean}~(\mathrm{std})$ in Table VI.
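The reporting format can be illustrated with a small helper that summarizes the scores of repeated runs. The run values below are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics


def summarize(scores, digits=4):
    """Format repeated-run scores as 'mean (std)'."""
    mean = statistics.mean(scores)
    # Sample standard deviation (assumption); statistics.pstdev would give
    # the population standard deviation instead.
    std = statistics.stdev(scores)
    return f"{mean:.{digits}f} ({std:.{digits}f})"


# Hypothetical test MSE values from 10 runs of one network structure.
runs = [0.070, 0.072, 0.068, 0.071, 0.069, 0.070, 0.073, 0.067, 0.070, 0.070]
summary = summarize(runs)
print(summary)
```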

TABLE VI: Test results of linear networks and task-based networks of different structures on the public data
Datasets LN(S1) TN(S1) LN(S2) TN(S2) LN(S3) TN(S3) LN(S4) TN(S4) LN(S5) TN(S5)
california housing 0.0760 (0.0057) 0.0702 (0.0019) 0.0844 (0.0486) 0.0685 (0.0047) 0.0988 (0.0664) 0.0667 (0.0057) 0.0861 (0.0452) 0.0670 (0.0044) 0.1013 (0.0614) 0.0540 (0.0036)
house sales 0.0113 (0.0046) 0.0109 (0.0041) 0.0111 (0.0037) 0.0100 (0.0027) 0.0139 (0.0130) 0.0098 (0.0017) 0.0139 (0.0105) 0.0098 (0.0017) 0.0158 (0.0128) 0.0112 (0.0047)
airfoil self noise 0.0428 (0.0181) 0.0402 (0.0050) 0.0329 (0.0286) 0.0270 (0.0042) 0.0669 (0.0547) 0.0287 (0.0091) 0.0250 (0.0070) 0.0233 (0.0049) 0.0327 (0.0358) 0.0134 (0.0024)
wine quality 0.0636 (0.0084) 0.0625 (0.0083) 0.0619 (0.0041) 0.0602 (0.0021) 0.0634 (0.0098) 0.0592 (0.0051) 0.0613 (0.0076) 0.0592 (0.0051) 0.0604 (0.0094) 0.0597 (0.0066)
fifa 0.0895 (0.0283) 0.0637 (0.0055) 0.0728 (0.0085) 0.0608 (0.0023) 0.1138 (0.0477) 0.0622 (0.0032) 0.1093 (0.0425) 0.0634 (0.0051) 0.0946 (0.0420) 0.0597 (0.0018)
diamonds 0.0128 (0.0050) 0.0097 (0.0023) 0.0108 (0.0066) 0.0078 (0.0020) 0.0084 (0.0032) 0.0081 (0.0027) 0.0103 (0.0073) 0.0080 (0.0018) 0.0145 (0.0169) 0.0084 (0.0024)
abalone 0.0253 (0.0026) 0.0250 (0.0025) 0.0289 (0.0109) 0.0250 (0.0029) 0.0296 (0.0087) 0.0269 (0.0074) 0.0282 (0.0098) 0.0258 (0.0032) 0.0300 (0.0102) 0.0268 (0.0037)
Bike Sharing Demand 0.0144 (0.0040) 0.0132 (0.0022) 0.0133 (0.0047) 0.0110 (0.0022) 0.0103 (0.0015) 0.0092 (0.0010) 0.0137 (0.0043) 0.0101 (0.0021) 0.0104 (0.0018) 0.0083 (0.0011)
space ga 0.0077 (0.0021) 0.0069 (0.0015) 0.0063 (0.0020) 0.0054 (0.0007) 0.0120 (0.0050) 0.0054 (0.0015) 0.0108 (0.0053) 0.0061 (0.0036) 0.0103 (0.0046) 0.0049 (0.0006)
Airlines DepDelay 0.1631 (0.0055) 0.1617 (0.0049) 0.1643 (0.0057) 0.1616 (0.0046) 0.1662 (0.0062) 0.1649 (0.0051) 0.1636 (0.0060) 0.1634 (0.0047) 0.1651 (0.0063) 0.1635 (0.0059)
credit 0.7123 (0.0398) 0.7433 (0.0077) 0.7302 (0.0108) 0.7392 (0.0087) 0.7358 (0.0089) 0.7447 (0.0055) 0.7267 (0.0260) 0.7447 (0.0055) 0.7108 (0.0316) 0.7372 (0.0083)
heloc 0.7030 (0.0091) 0.7080 (0.0077) 0.6980 (0.0112) 0.7040 (0.0063) 0.6950 (0.0145) 0.6950 (0.0098) 0.6890 (0.0073) 0.6900 (0.0073) 0.6910 (0.0129) 0.6930 (0.0100)
electricity 0.7770 (0.0062) 0.7860 (0.0045) 0.7880 (0.0068) 0.7950 (0.0046) 0.7910 (0.0046) 0.8000 (0.0047) 0.7900 (0.0055) 0.7990 (0.0051) 0.7630 (0.0882) 0.8100 (0.0035)
phoneme 0.8170 (0.0321) 0.8420 (0.0099) 0.8020 (0.0994) 0.8510 (0.0116) 0.7790 (0.1460) 0.8560 (0.0107) 0.8470 (0.0131) 0.8550 (0.0079) 0.8120 (0.1080) 0.8540 (0.0090)
bank-marketing 0.7780 (0.0140) 0.7840 (0.0078) 0.7870 (0.0102) 0.7940 (0.0067) 0.7600 (0.0839) 0.7930 (0.0066) 0.7560 (0.0893) 0.7920 (0.0073) 0.7840 (0.0086) 0.7920 (0.0083)
MagicTelescope 0.8452 (0.0098) 0.8531 (0.0071) 0.8430 (0.0078) 0.8557 (0.0062) 0.8456 (0.0047) 0.8580 (0.0040) 0.8462 (0.0073) 0.8573 (0.0063) 0.8433 (0.0090) 0.8580 (0.0066)
vehicle 0.8090 (0.0335) 0.8180 (0.0277) 0.8090 (0.0258) 0.8240 (0.0237) 0.7980 (0.0312) 0.8150 (0.0241) 0.8100 (0.0377) 0.8110 (0.0264) 0.8090 (0.0349) 0.8140 (0.0214)
Oranges-vs.-Grapefruit 0.9288 (0.1381) 0.9429 (0.0049) 0.9430 (0.1489) 0.9740 (0.0020) 0.9429 (0.1490) 0.9751 (0.0020) 0.9426 (0.1457) 0.9751 (0.0020) 0.8927 (0.1944) 0.9758 (0.0076)
eye movements 0.5790 (0.0117) 0.5840 (0.0116) 0.5800 (0.0188) 0.5910 (0.0107) 0.5720 (0.0267) 0.5880 (0.0179) 0.5680 (0.0293) 0.5840 (0.0103) 0.5790 (0.0288) 0.5800 (0.0113)
Contaminant 0.9220 (0.0108) 0.9300 (0.0095) 0.9290 (0.0134) 0.9300 (0.0120) 0.9250 (0.0092) 0.9310 (0.0083) 0.9260 (0.0126) 0.9300 (0.0109) 0.9010 (0.0638) 0.9340 (0.0096)

Regression. Both LN and TN adopt a fully connected architecture. The activation function is ReLU for LN and Sigmoid for TN. Both networks use MSELoss as the loss function and RMSProp as the optimizer. The details of the specific network structures and the corresponding numbers of parameters are given in SMs.

In Table VI, on every regression dataset, the mean and standard deviation of the fitting errors (MSE) of LN are larger than those of TN, indicating that TN has a stronger fitting ability and better generalization than LN. On some datasets, such as fifa and airfoil self noise, TN leads LN by a large margin.

Classification. For this task, the network design uses a fully connected network both for LN and TN. The activation function uses ReLU for LN and Sigmoid for TN. Both networks use CrossEntropyLoss as the loss function and Adam as the optimizer. The network is trained and tested in the same way as the regression tasks.

The classification results are shown in Table VI. It is observed that TN has higher training and test accuracy, indicating that TN has a stronger fitting ability. On every dataset, the test accuracy of TN is at least as high as that of LN, and its standard deviation is smaller in nearly all cases. For datasets such as phoneme, electricity, and Oranges-vs.-Grapefruit, the improvement by TN reaches 3% or more under some structures (e.g., S5), which is significant. Moreover, TN achieves better classification accuracy with fewer parameters.

5 Comparative Experiments

In this section, we compare the task-based networks with other state-of-the-art models over two real-world tasks. To highlight the superiority of the network using task-based neurons, we select advanced machine learning models for comparison, namely XGBoost[35], LightGBM[36], CatBoost[37], TabNet[38], TabTransformer[39], FT-Transformer[16] and DANETs[40]. All these models are either classic models or recent models that were published in prestigious venues of machine learning.

High-energy Particle Collision Prediction. High-energy particle collision experiments are an important means of understanding the fundamental composition and evolution of our universe. In the collision, the medium quickly becomes a soup of deconfined quarks, gluons, and partons within the first few microseconds. Quark-Gluon Plasma (QGP) is the name of this blazing and dense fireball. To investigate the distinctive qualities and evolution of the QGP, particles created at each phase of the medium developed in high-energy collision tests are employed as probes. An important probe of the QGP phase is the $J/\psi$ meson, which is the bound state of the charm quark ($c$) and its antiparticle ($\bar{c}$). The invariant mass spectrum of the particle is very useful for selecting the distribution region of the $J/\psi$ signal. In high-energy physics, predicting the invariant mass spectrum of the particle is a critical research topic.

TABLE VII: The test results (MSE) of different models on the particle collision and asteroid prediction datasets.
Method particle collision asteroid prediction
XGBoost $0.0094\pm 0.0006$ $0.0646\pm 0.1031$
LightGBM $0.0056\pm 0.0004$ $0.1391\pm 0.1676$
CatBoost $0.0028\pm 0.0002$ $0.0817\pm 0.0846$
TabNet $0.0040\pm 0.0006$ $0.0627\pm 0.0939$
TabTransformer $0.0038\pm 0.0008$ $0.4219\pm 0.2776$
FT-Transformer $0.0050\pm 0.0020$ $0.2136\pm 0.2189$
DANETs $0.0076\pm 0.0009$ $0.1709\pm 0.1859$
Task-based Network $\mathbf{0.0016\pm 0.0005}$ $\mathbf{0.0513\pm 0.0551}$

CERN collected the collision data and released a dataset on Kaggle (https://www.kaggle.com/datasets/fedesoriano/cern-electron-collision-data). This dataset is used to predict the invariant mass of the two electrons in the collision experiment from properties of the electrons such as energy, charge, transverse momentum, and pseudorapidity. It consists of 99,915 observations, and each observation has 16 features.

Asteroid Prediction. Asteroids are small rocky celestial bodies that revolve around the Sun in elliptical orbits. They are remnants left over from the early stages of solar system formation. Two categories of asteroids, namely Near-Earth Asteroids (NEAs) and Potentially Hazardous Asteroids (PHAs), attract particular interest among researchers. NEAs are characterized by their proximity to Earth's orbit, which implies a potential risk of collision with our planet. PHAs raise particular concern because they can approach Earth's orbit even more closely and are large enough to cause significant damage upon collision. Predicting the diameter is crucial for identifying NEAs and PHAs, since the size of these bodies strongly influences how they interact with Earth. Smaller asteroids are more likely to burn up in the Earth's atmosphere before reaching the surface, while larger asteroids have substantial potential for catastrophic destruction upon impact. By accurately predicting the diameters of NEAs and PHAs, researchers can better assess the threats these bodies pose to our planet and formulate strategies to mitigate potential hazards.

To predict the diameters of NEAs and PHAs, physicists from the Jet Propulsion Laboratory of Caltech collected a dataset that consists of 137,636 observations, each with 19 features. The dataset is available on Kaggle (https://www.kaggle.com/datasets/basu369victor/prediction-of-asteroid-diameter).

For both datasets, the data are randomly divided into training, validation, and test sets at a ratio of 8:1:1. We choose the MSE as the evaluation metric. To ensure a reliable comparison and mitigate the influence of randomness, we run each model 10 times and report the final results as $\mathrm{mean} \pm \mathrm{std}$. The detailed test results are shown in Table VII.
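The evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`split_indices`, `evaluate`, `mean_baseline`) are hypothetical, the data are re-split with a new seed in every run (the text does not say whether the split or only the model seed changes), and the sample standard deviation is used.

```python
import random
import statistics

def split_indices(n, seed, ratios=(0.8, 0.1, 0.1)):
    """Randomly partition range(n) into train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate(fit_predict, X, y, n_runs=10):
    """Repeat split/train/test n_runs times; return (mean, std) of test MSE."""
    scores = []
    for seed in range(n_runs):  # assumption: one fresh random split per run
        train, val, test = split_indices(len(X), seed)
        y_pred = fit_predict(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in val], [y[i] for i in val],
            [X[i] for i in test],
        )
        scores.append(mse([y[i] for i in test], y_pred))
    return statistics.mean(scores), statistics.stdev(scores)

def mean_baseline(X_tr, y_tr, X_val, y_val, X_te):
    """Placeholder model: always predicts the training-set mean."""
    mu = sum(y_tr) / len(y_tr)
    return [mu] * len(X_te)

# Synthetic stand-in data; the Kaggle datasets are not loaded here.
rng = random.Random(0)
X = [[rng.random()] for _ in range(1000)]
y = [2.0 * x[0] + rng.gauss(0.0, 0.1) for x in X]
mean_mse, std_mse = evaluate(mean_baseline, X, y)
print(f"test MSE: {mean_mse:.4f} ± {std_mse:.4f}")
```

Any model can be plugged in by replacing `mean_baseline` with a function of the same signature. The validation split is passed through so that it can be used for early stopping or hyperparameter selection.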

TabTransformer performs well on the particle collision dataset but poorly on the asteroid dataset, whereas CatBoost and TabNet perform consistently on both datasets. The highlight of Table VII is that the task-based network achieves the lowest mean MSE on both datasets. It leads TabTransformer, FT-Transformer, and DANETs by a large margin.
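To make the size of the gap concrete, the relative reduction in mean MSE with respect to the strongest baseline on each dataset can be computed directly from Table VII. The snippet below is a small arithmetic sketch. The variable and function names are illustrative, and it compares means only: on the asteroid dataset the standard deviations are of the same order as the means, so that figure should be read with caution.

```python
# Mean test MSE values copied from Table VII.
task_based = {"particle collision": 0.0016, "asteroid prediction": 0.0513}
best_baseline = {
    "particle collision": ("CatBoost", 0.0028),
    "asteroid prediction": ("TabNet", 0.0627),
}

def relative_reduction(ours, baseline):
    """Fraction by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline

for dataset, (name, baseline_mse) in best_baseline.items():
    r = relative_reduction(task_based[dataset], baseline_mse)
    print(f"{dataset}: {r:.1%} lower mean MSE than {name}")
```

This gives a reduction of roughly 43% relative to CatBoost on the particle collision dataset and roughly 18% relative to TabNet on the asteroid dataset.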

6 Conclusion and Future Work

In this paper, towards NeuroAI, we have proposed a roadmap for designing task-based neurons via symbolic regression, which is a new frontier of neural network research relative to architecture design. Systematic experiments on 10 synthetic datasets (Supplementary Materials), 25 public datasets, and 3 real-world applications have confirmed the potential of task-based neuronal designs.

In the future, on the one hand, the vectorized symbolic regression process should be further optimized. We can investigate how to select suitable base functions for different scenarios to replace the simple symbolic regression algorithm used in this paper. Also, from the perspective of algorithmic efficiency, the proposed vectorized symbolic regression has not yet been accelerated by parallel or GPU computation, so the regression speed has room for improvement. On the other hand, we will explore synergizing task-based neurons and task-based architectures to build more powerful network models.

References

  • [1] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, pp. 234–241, Springer, 2015.
  • [2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • [3] F. Fan, D. Wang, H. Guo, Q. Zhu, P. Yan, G. Wang, and H. Yu, “On a sparse shortcut topology of artificial neural networks,” IEEE Transactions on Artificial Intelligence, 2021.
  • [4] C. Yang, G. Bender, H. Liu, P.-J. Kindermans, M. Udell, Y. Lu, Q. V. Le, and D. Huang, “Tabnas: Rejection sampling for neural architecture search on tabular datasets,” Advances in Neural Information Processing Systems, vol. 35, pp. 11906–11917, 2022.
  • [5] A. Zador, B. Richards, B. Ölveczky, S. Escola, Y. Bengio, K. Boahen, M. Botvinick, D. Chklovskii, A. Churchland, C. Clopath, et al., “Toward next-generation artificial intelligence: Catalyzing the neuroai revolution,” arXiv preprint arXiv:2210.08340, 2022.
  • [6] G. Chrysos, S. Moschoglou, G. Bouritsas, J. Deng, Y. Panagakis, and S. P. Zafeiriou, “Deep polynomial neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
  • [7] F.-L. Fan, Y. Li, H. Peng, T. Zeng, and F. Wang, “Towards neuroai: Introducing neuronal diversity into artificial neural networks,” arXiv preprint arXiv:2301.09245, 2023.
  • [8] Y. Jiang, F. Yang, H. Zhu, D. Zhou, and X. Zeng, “Nonlinear cnn: improving cnns with quadratic convolutions,” Neural Computing and Applications, vol. 32, no. 12, pp. 8507–8516, 2020.
  • [9] P. Mantini and S. K. Shah, “Cqnn: Convolutional quadratic neural networks,” in 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9819–9826, IEEE, 2021.
  • [10] M. Goyal, R. Goyal, and B. Lall, “Improved polynomial neural networks with normalised activations,” in 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE, 2020.
  • [11] J.-X. Liao, H.-C. Dong, Z.-Q. Sun, J. Sun, S. Zhang, and F.-L. Fan, “Attention-embedded quadratic network (qttention) for effective and interpretable bearing fault diagnosis,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–13, 2023.
  • [12] H. Peng, P. Xie, L. Liu, X. Kuang, Y. Wang, L. Qu, H. Gong, S. Jiang, A. Li, Z. Ruan, et al., “Morphological diversity of single neurons in molecularly defined cell types,” Nature, vol. 598, no. 7879, pp. 174–181, 2021.
  • [13] K. Hornik, M. Stinchcombe, and H. White, “Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks,” Neural Networks, vol. 3, no. 5, pp. 551–560, 1990.
  • [14] M. Schmidt and H. Lipson, “Distilling free-form natural laws from experimental data,” Science, vol. 324, no. 5923, pp. 81–85, 2009.
  • [15] D. J. Bartlett, H. Desmond, and P. G. Ferreira, “Exhaustive symbolic regression,” IEEE Transactions on Evolutionary Computation, 2023.
  • [16] Y. Gorishniy, I. Rubachev, V. Khrulkov, and A. Babenko, “Revisiting deep learning models for tabular data,” Advances in Neural Information Processing Systems, vol. 34, pp. 18932–18943, 2021.
  • [17] S. Ostojic, “Two types of asynchronous activity in networks of excitatory and inhibitory spiking neurons,” Nature Neuroscience, vol. 17, no. 4, pp. 594–600, 2014.
  • [18] G. Liu and J. Wang, “Dendrite net: A white-box module for classification, regression, and system identification,” IEEE Transactions on Cybernetics, 2021.
  • [19] G. Zoumpourlis, A. Doumanoglou, N. Vretos, and P. Daras, “Non-linear convolution filters for cnn-based learning,” in ICCV, pp. 4761–4769, 2017.
  • [20] J. Bu and A. Karpatne, “Quadratic residual networks: A new class of neural networks for solving forward and inverse problems in physics involving pdes,” in Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), pp. 675–683, SIAM, 2021.
  • [21] Z. Xu, F. Yu, J. Xiong, and X. Chen, “Quadralib: A performant quadratic neural network library for architecture optimization and design exploration,” Proceedings of Machine Learning and Systems, vol. 4, pp. 503–514, 2022.
  • [22] F. Fan, W. Cong, and G. Wang, “A new type of neurons for machine learning,” International Journal for Numerical Methods in Biomedical Engineering, vol. 34, no. 2, p. e2920, 2018.
  • [23] F. Fan, H. Shan, M. K. Kalra, R. Singh, G. Qian, M. Getzin, Y. Teng, J. Hahn, and G. Wang, “Quadratic autoencoder (q-ae) for low-dose ct denoising,” IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 2035–2050, 2019.
  • [24] M. U. Demirezen, “Quadratic residual multiplicative filter neural networks for efficient approximation of complex sensor signals,” IEEE Access, 2023.
  • [25] S. K. Karmaker, M. M. Hassan, M. J. Smith, L. Xu, C. Zhai, and K. Veeramachaneni, “Automl to date and beyond: Challenges and opportunities,” ACM Computing Surveys (CSUR), vol. 54, no. 8, pp. 1–36, 2021.
  • [26] S. Hassantabar, X. Dai, and N. K. Jha, “Curious: Efficient neural architecture search based on a performance predictor and evolutionary search,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 11, pp. 4975–4990, 2022.
  • [27] Y. Jaafra, J. L. Laurent, A. Deruyver, and M. S. Naceur, “Reinforcement learning for neural architecture search: A review,” Image and Vision Computing, vol. 89, pp. 57–66, 2019.
  • [28] E. Kiyani, K. Shukla, G. E. Karniadakis, and M. Karttunen, “A framework based on symbolic regression coupled with extended physics-informed neural networks for gray-box learning of equations of motion from data,” Computer Methods in Applied Mechanics and Engineering, vol. 415, p. 116258, 2023.
  • [29] P. Zeng, X. Song, A. Lensen, Y. Ou, Y. Sun, M. Zhang, and J. Lv, “Differentiable genetic programming for high-dimensional symbolic regression,” arXiv preprint arXiv:2304.08915, 2023.
  • [30] S. Kim et al., “Integration of neural network-based symbolic regression in deep learning for scientific discovery,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 9, pp. 4166–4177, 2020.
  • [31] M. Zhang, S. Kim, P. Y. Lu, and M. Soljacic, “Deep learning and symbolic regression for discovering parametric equations,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
  • [32] J. Liu, W. Li, L. Yu, M. Wu, L. Sun, W. Li, and Y. Li, “Snr: Symbolic network-based rectifiable learning framework for symbolic regression,” Neural Networks, vol. 165, pp. 1021–1034, 2023.
  • [33] N. L. Cramer, “A representation for the adaptive generation of simple sequential programs,” in Proceedings of the First International Conference on Genetic Algorithms and Their Applications, pp. 183–187, Psychology Press, 2014.
  • [34] D. Yarotsky, “Error bounds for approximations with deep relu networks,” Neural Networks, vol. 94, pp. 103–114, 2017.
  • [35] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in KDD, pp. 785–794, 2016.
  • [36] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  • [37] J. T. Hancock and T. M. Khoshgoftaar, “Catboost for big data: an interdisciplinary review,” Journal of Big Data, vol. 7, no. 1, pp. 1–45, 2020.
  • [38] S. Ö. Arik and T. Pfister, “Tabnet: Attentive interpretable tabular learning,” in AAAI, vol. 35, pp. 6679–6687, 2021.
  • [39] X. Huang, A. Khetan, M. Cvitkovic, and Z. Karnin, “Tabtransformer: Tabular data modeling using contextual embeddings,” arXiv preprint arXiv:2012.06678, 2020.
  • [40] J. Chen, K. Liao, Y. Wan, D. Z. Chen, and J. Wu, “Danets: Deep abstract networks for tabular data classification and regression,” in AAAI, vol. 36, pp. 3930–3938, 2022.