Discovering Nuclear Models from Symbolic Machine Learning

Jose M. Munoz, Silviu M. Udrescu, Ronald F. Garcia Ruiz
Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Abstract

Numerous phenomenological nuclear models have been proposed to describe specific observables within different regions of the nuclear chart. However, developing a unified model that describes the complex behavior of all nuclei remains an open challenge. Here, we explore whether novel symbolic Machine Learning (ML) can rediscover traditional nuclear physics models or identify alternatives with improved simplicity, fidelity, and predictive power. To address this challenge, we developed a Multi-objective Iterated Symbolic Regression approach that handles symbolic regressions over multiple target observables, accounts for experimental uncertainties, and is robust against high-dimensional problems. As a proof of principle, we applied this method to describe the nuclear binding energies and charge radii of light and medium mass nuclei. Our approach identified simple analytical relationships based on the number of protons and neutrons, providing interpretable models with precision comparable to state-of-the-art nuclear models. Additionally, we integrated this ML-discovered model with an existing complementary model to estimate the limits of nuclear stability. These results highlight the potential of symbolic ML to develop accurate nuclear models and guide our description of complex many-body problems.


Introduction

The atomic nucleus, a strongly correlated quantum many-body system, exhibits a vast array of phenomena that cannot be explained by a single, unified theory. This is mainly because the underlying theory of the nuclear force, quantum chromodynamics (QCD), is non-perturbative at low energies [1]. Despite the exciting progress that has been made in developing a comprehensive characterization of the nuclear force and powerful many-body methods, a theoretical description of all the nuclei, with a direct link to QCD, remains a major unsolved challenge [2]. Consequently, physicists have developed a myriad of models, each designed to address specific aspects of nuclear structure and reactions. These models range from the simple liquid drop model to sophisticated many-body calculations, which are continuously refined to reproduce available nuclear data, such as binding energies, nuclear radii, electromagnetic moments, excitation energies, and decay properties [3, 4, 5, 6, 7, 8, 9]. Thus, one hopes that improved models can provide an interpretation of the observed phenomena and predict the behavior of nuclei at the extremes of the nuclear chart, where data are not yet available.

Recent advancements in Machine Learning (ML) have emerged as a pragmatic tool to tackle this challenge, with the expectation of fitting models that could reproduce the vast amount of available nuclear properties [10, 11, 12, 13, 14, 7, 15]. Nonetheless, this approach results in black boxes with obscure physical interpretability [16, 17, 18], and its use is often limited to specialized users who can access and run specific software. Moreover, conventional ML models rely on extensive experimental data or high-fidelity emulators, which may not always be available, and they are prone to over-fitting, which drastically limits their extrapolation power [19, 20].

An alternative approach is given by symbolic ML, which aims to provide analytical expressions to describe a given observable [21]. Intriguingly, this methodology is remarkably similar to how nuclear physicists have traditionally tackled the problem of describing complex nuclear phenomena. Unlike traditional regression techniques that fit data to a pre-specified model, such as linear or polynomial functions, modern symbolic regression approaches do not assume any particular prior model. Instead, they modify candidate functional forms in an evolutionary manner [22, 23, 24]. Because of this, discovering complex equations is now conceivable, something that could not be accomplished by evaluating every possible mathematical expression by brute force [25, 26]. These methods have been applied widely and successfully in scientific research, including cosmology [27, 28], astronomy [29, 30], materials science [31], chemistry [32], dynamics [33], and particle physics [34, 35]. However, applying existing methods to describe nuclear physics observables poses several challenges, as most of them have not been developed to solve multi-objective problems and/or do not consider experimental uncertainties.

In this work, we explore the natural question of whether symbolic ML approaches, with minimal human bias, can rediscover traditional nuclear physics models or propose alternative ones with similar or better simplicity, fidelity, and predictive power. To achieve these goals, we developed a Symbolic Regression technique that can systematically handle regressions over a set of targets, considers experimental uncertainties, and is robust to high-dimensional problems.

Our approach achieves three key results. First, we provide an interpretable model that discovers relatively simple analytical relationships between the number of nucleons and nuclear properties such as the charge radius and binding energy. Second, we use ML techniques to estimate these nuclear observables beyond our current experimental knowledge. Finally, an improved model, with quantifiable uncertainties, is used to provide predictions of the limits of stability of both neutron-deficient and neutron-rich nuclei.

Symbolic Regression for Nuclear Observables

We developed a Multi-objective Iterated Symbolic Regression (MISR) to iteratively search for analytical models that best describe a set of target variables up to arbitrary accuracy, leading to an expansion-like expression. We build on top of symbolic regression methods [36] to adopt multi-objective capabilities, which allows us to fit the description of a set of diverse target variables, considering the associated experimental uncertainty for each variable. A diagrammatic representation of the MISR approach is shown in Fig. 1. It is worth noting that MISR is in principle agnostic to the symbolic regression (SR) tool employed, as the tool constitutes only one block of the pipeline. As illustrated in the figure, the main variable, regressed via the standard symbolic regression, is then used to trace over the Pareto-optimal expressions for minimizing the error over the auxiliary variables in the $k$ random sub-samples of the data (known as $k$-folds). This process is then repeated using boosting to fit the residuals of the current analytical model until we reach a stopping criterion [37, 38]. Part of these advances was motivated by the application of multi-objective approaches for dynamic modeling [39].

Figure 1: Diagrammatic representation of Multi-objective Iterated Symbolic Regression (MISR) inner pipeline. The process iteratively refines the symbolic regressed models. See the text for more details.
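
The residual-fitting loop at the core of this pipeline can be illustrated with a toy sketch. This is not the MISR implementation: the symbolic regression step is replaced here by a scan over a small, hand-written library of candidate terms (`LIBRARY`, a hypothetical name), and the multi-objective, $k$-fold, and feature-importance steps are omitted. It only shows how each stage fits what the previous stages left unexplained.

```python
# Toy stand-in for a boosting loop over residuals. At each stage the single
# candidate term that best fits the current residuals is selected, scaled by
# least squares, and subtracted. A real run would call a symbolic regression
# tool at this point instead of scanning a fixed library (assumption).

def fit_scale(term, xs, residuals):
    """Least-squares coefficient c minimizing sum (r - c * term(x))^2."""
    num = sum(term(x) * r for x, r in zip(xs, residuals))
    den = sum(term(x) ** 2 for x in xs)
    return num / den if den > 0 else 0.0


def sse(term, c, xs, residuals):
    """Sum of squared errors left after subtracting c * term(x)."""
    return sum((r - c * term(x)) ** 2 for x, r in zip(xs, residuals))


def stagewise_fit(xs, ys, library, n_stages):
    """Return the chosen (name, coefficient) stages and the final residuals."""
    residuals = list(ys)
    stages = []
    for _ in range(n_stages):
        best = None
        for name, term in library.items():
            c = fit_scale(term, xs, residuals)
            err = sse(term, c, xs, residuals)
            if best is None or err < best[0]:
                best = (err, name, c)
        _, name, c = best
        stages.append((name, c))
        term = library[name]
        residuals = [r - c * term(x) for x, r in zip(xs, residuals)]
    return stages, residuals


# Hypothetical library of candidate terms in a single variable x.
LIBRARY = {
    "x": lambda x: x,
    "x^(2/3)": lambda x: x ** (2.0 / 3.0),
    "1": lambda x: 1.0,
}

xs = [float(a) for a in range(20, 121, 10)]
ys = [15.0 * x - 17.0 * x ** (2.0 / 3.0) for x in xs]  # synthetic target
stages, residuals = stagewise_fit(xs, ys, LIBRARY, n_stages=6)
```

Because each stage uses the least-squares optimal coefficient, the squared residual never grows from one stage to the next, which mirrors the expansion-like behavior described in the text.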

Unlike traditional regression techniques that offer a static view of feature relevance, MISR continuously re-evaluates in each boosting step the feature importance via both Boosted Decision trees and Mutual Information Regression [37, 40]. This allows each one of the obtained terms to represent a particular behavior within the modeled phenomena. This process not only enhances the predictive power but also streamlines the model by eliminating redundant or less significant features.

One of the critical challenges in ML approaches is the quantification of uncertainty in predictive modeling, especially given the often limited and low-precision experimental data that exists for certain nuclear properties. To achieve an estimation of uncertainty for the MISR model, we resample different training datasets using jackknife bootstrapping [41] to link each one of the model’s predictions to a probability distribution, from which we sample the posteriors. In addition, we include a truncation uncertainty, for a specific cutoff, given by the absolute value of the subsequent expansion term. This methodology was inspired by an analogy to how effective theories are developed [42].

This iterative refinement allows us to express the target data as a sum of the resultant $n$ analytical forms $f^{(i)}(N,Z)$, and to estimate the uncertainty of the model. The MISR can thus be expressed as

$$\text{MISR}n\,(N,Z)=\sum_{i=1}^{n}\mathcal{N}(1,\sigma_{i})\cdot f^{(i)}(N,Z), \tag{1}$$

where $N$ and $Z$ are the input variables, representing the number of neutrons and protons in the nuclei, respectively. $\mathcal{N}(1,\sigma_{i})$ is a Gaussian distribution numerically fitted after the training process using the discriminative jackknife technique [41].
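
As a minimal sketch of how Eq. (1) turns into a prediction with an uncertainty, the snippet below draws each multiplicative weight from a Gaussian centered at 1 and sums the weighted terms. The two terms and their widths are hypothetical placeholders, not fitted MISR terms, and treating the weights as independent is an assumption made here for illustration.

```python
import random
import statistics


def sample_misr(terms, sigmas, n, z, n_draws=2000, seed=0):
    """Monte Carlo estimate of sum_i w_i * f_i(N, Z), with w_i ~ Normal(1, sigma_i).

    The weights are drawn independently (an assumption of this sketch).
    Returns the sample mean and sample standard deviation of the sum.
    """
    rng = random.Random(seed)
    values = [f(n, z) for f in terms]
    draws = []
    for _ in range(n_draws):
        total = sum(rng.gauss(1.0, s) * v for s, v in zip(sigmas, values))
        draws.append(total)
    return statistics.mean(draws), statistics.stdev(draws)


# Hypothetical two-term expansion and widths, for illustration only.
terms = [
    lambda n, z: 8.0 * (n + z),
    lambda n, z: -0.5 * (n - z) ** 2 / (n + z),
]
sigmas = [0.01, 0.10]

mean, std = sample_misr(terms, sigmas, n=28, z=20)
```

Under the independence assumption the spread is also available in closed form, $\sigma^2=\sum_i \sigma_i^2\,[f^{(i)}(N,Z)]^2$, which the sampled value should approach as the number of draws grows.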

We define the mass number as $A=N+Z$, the isospin asymmetry as $I=(N-Z)/A$, and the Casten factor as $P=(N_{n}N_{p})/(N_{n}+N_{p})$, where $N_{p}$ and $N_{n}$ represent the valence protons and neutrons filling pre-defined shell-model orbits, i.e. the difference between the number of nucleons and the closest magic number $\{8,20,28,50,82\}$ [43]. The model was trained using the input variables $\{(N,Z,A,I,P,N_{n},N_{p})\}$. More details of the methodology can be found in the section Methods I.1.
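
The input variables defined above can be computed directly from $N$ and $Z$. The following is a minimal sketch; the function names are illustrative, and the choice $P=0$ for doubly magic nuclei (where the definition gives $0/0$) is an assumption, since the text does not specify that case.

```python
MAGIC = (8, 20, 28, 50, 82)


def valence(count):
    """Distance from a nucleon number to the closest magic number."""
    return min(abs(count - m) for m in MAGIC)


def features(n, z):
    """Return the seven input variables (N, Z, A, I, P, Nn, Np) as a dict."""
    a = n + z
    i = (n - z) / a
    n_n, n_p = valence(n), valence(z)
    # P is 0/0 for doubly magic nuclei; setting it to 0 is an assumption.
    p = n_n * n_p / (n_n + n_p) if (n_n + n_p) > 0 else 0.0
    return {"N": n, "Z": z, "A": a, "I": i, "P": p, "Nn": n_n, "Np": n_p}


iron56 = features(n=30, z=26)      # 2 valence neutrons, 2 valence protons
calcium40 = features(n=20, z=20)   # doubly magic
```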

As a proof of principle of our approach, we focus on the study of nuclear binding energies and charge radii. However, our proposed method can be generally applied to describe other nuclear observables, with an arbitrary set of input variables. Binding energy, the energy required to disassemble a nucleus into its constituent protons and neutrons, provides important information to guide our understanding of atomic nuclei. This observable is arguably one of the most studied nuclear observables, with a relatively large collection of experimental data available. To train our algorithm, we used a subset of the experimental data of binding energies reported in the AME2020 mass evaluation dataset [44, 45, 46], and experimental charge radii reported in [47]. For the charge radii, which characterize the charge distributions inside the nuclei, the set of experimental measurements is scarcer [48]. This is particularly challenging for light isotopes, which, combined with the large variations in their radii, motivated us to focus on nuclei with $12\leq Z\leq 50$, for which reliable experimental data exist. From these data sets, we uniformly sampled $20\%$ of the nuclei for testing purposes. The cutoff on light and medium mass nuclei allows us to study the extrapolation capabilities of our algorithm and provides a test of the model’s ability to deduce complex nuclear properties from relatively small data sets. This demonstrates the broader applicability of our MISR method, which can describe physical observables where data scarcity is common.

Results for nuclear binding energies

The study of nuclear binding energies has led to the proposal of several analytical formulas. One of the simplest, the Liquid Drop Model (LDM) [49], represents the nucleus as a liquid drop to explain its global properties as a function of $Z$ and $A$:

$$BE_{LDM}=\alpha_{v}A-\alpha_{s}A^{2/3}-\alpha_{c}\frac{Z^{2}}{A^{1/3}}-\alpha_{a}\frac{(A-2Z)^{2}}{A}+\delta, \tag{2}$$

where $\alpha_{v}$, $\alpha_{s}$, $\alpha_{c}$, and $\alpha_{a}$ are the volume, surface, Coulomb, and asymmetry coefficients, respectively, and $\delta(A,Z)$ represents the pairing term between neutrons and protons, computed as

$$\delta(A,Z)=\begin{cases}+\delta_{0}A^{-1/2}&\text{if $Z$ and $N$ are even},\\ 0&\text{if $A$ is odd},\\ -\delta_{0}A^{-1/2}&\text{if $Z$ and $N$ are odd},\end{cases} \tag{3}$$

where $\delta_{0}$ is another fitted constant [49]. More complex empirical models, such as the Finite Range Droplet Model (FRDM) [50] and the Duflo-Zuker (DZ) model [51], attempt to incorporate microscopic corrections to these macroscopic trends, refining the reproduction of nuclear binding energies across the chart of nuclides. The DZ model integrates both macroscopic and microscopic aspects of nuclear structure, featuring a wide array of terms to capture various models of nuclear interactions and nucleon configurations. The DZ model is not represented by a concise formula, but rather as a computational tool. This model is relatively complex and includes several terms to numerically account for nuclear shell effects, monopole Hamiltonian contributions, deformation energies, symmetry, and Coulomb energies.
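
For reference, Eqs. (2) and (3) can be evaluated in a few lines. The sketch below uses typical textbook coefficient values in MeV; these numbers are assumptions for illustration and are not the fitted values behind the LDM entries reported in this work.

```python
def pairing(a, z, delta0):
    """Pairing term of Eq. (3): sign depends on the parity of Z and N."""
    n = a - z
    if z % 2 == 0 and n % 2 == 0:
        return delta0 / a ** 0.5
    if z % 2 == 1 and n % 2 == 1:
        return -delta0 / a ** 0.5
    return 0.0  # odd-A nuclei


# Typical textbook values in MeV (assumed, not taken from the paper).
COEFFS = {"v": 15.8, "s": 18.3, "c": 0.714, "a": 23.2, "delta0": 12.0}


def be_ldm(a, z, k=COEFFS):
    """Liquid Drop Model binding energy of Eq. (2), in MeV."""
    return (
        k["v"] * a
        - k["s"] * a ** (2.0 / 3.0)
        - k["c"] * z ** 2 / a ** (1.0 / 3.0)
        - k["a"] * (a - 2 * z) ** 2 / a
        + pairing(a, z, k["delta0"])
    )


be_fe56 = be_ldm(a=56, z=26)
```

With these assumed coefficients the result for $A=56$, $Z=26$ is a few hundred MeV, of the order expected for a medium-mass nucleus; the exact value depends on the coefficient set chosen.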

State-of-the-art nuclear models, such as Density Functional Theory (DFT) [52] and ab initio calculations [53], follow a very different approach. These models aim to provide a microscopic description of nuclei, starting from inter-nucleon interactions and solving numerically the nuclear quantum many-body problem. DFT, in particular, has been very successful in describing the binding energy and radii across the nuclear chart [4, 54]. For ab initio methods, the simultaneous reproduction of both binding energy and nuclear charge radii remains an open challenge [6, 55].

To explore the multi-objective capabilities of our model, we apply the MISR algorithm to predict the binding energy, using as auxiliary observables the binding energy per nucleon and the one- and two-nucleon separation energies across the same $Z$ region, defined as $S_{n}=BE(Z,N+1)-BE(Z,N)$, $S_{2n}=BE(Z,N+2)-BE(Z,N)$, $S_{p}=BE(Z+1,N)-BE(Z,N)$, $S_{2p}=BE(Z+2,N)-BE(Z,N)$.
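
The auxiliary observables follow from any binding-energy function by finite differences. The sketch below implements the definitions exactly as written above; note that other references index the separation energies differently, e.g. $BE(Z,N)-BE(Z,N-1)$. The `toy_be` function is a hypothetical placeholder used only to make the example runnable.

```python
def separation_energies(be, z, n):
    """One- and two-nucleon separation energies from a callable be(z, n).

    Follows the convention of the text: differences are taken between the
    nucleus with added nucleons and the nucleus (Z, N).
    """
    base = be(z, n)
    return {
        "Sn": be(z, n + 1) - base,
        "S2n": be(z, n + 2) - base,
        "Sp": be(z + 1, n) - base,
        "S2p": be(z + 2, n) - base,
    }


def toy_be(z, n):
    """Hypothetical binding energy: a flat 8 MeV per nucleon."""
    return 8.0 * (z + n)


seps = separation_energies(toy_be, z=20, n=20)
```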

At leading order, in the first iteration of MISR, we obtained a relatively simple expression:

$$\text{BE}_{MISR}^{(1)}=\eta_{0}Z\left(1+\frac{1}{N}-\frac{aN}{Z^{2}}\right)\left[I\left(b-\frac{A^{1/3}N}{Z}\right)+c\right], \tag{4}$$

with $a=-1.10$, $b=32.43$, $c=16.70$, and $\eta_{0}=1.0$ MeV. Additional MISR iterations, up to $n=10$, are presented in the section Methods, Appendix 4.
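
Equation (4) is simple enough to transcribe directly. The sketch below does so with the rounded constants as printed; because of that rounding, values computed this way should not be expected to reproduce the quoted errors of the full fit. The function name `be_misr1` is illustrative.

```python
def be_misr1(n, z, a=-1.10, b=32.43, c=16.70, eta0=1.0):
    """Leading-order MISR binding energy of Eq. (4), in MeV.

    n, z: neutron and proton numbers. The constants are the rounded
    values quoted in the text.
    """
    mass = n + z                      # A = N + Z
    iso = (n - z) / mass              # I = (N - Z) / A
    prefactor = eta0 * z * (1.0 + 1.0 / n - a * n / z ** 2)
    bracket = iso * (b - mass ** (1.0 / 3.0) * n / z) + c
    return prefactor * bracket


# For N = Z the isospin term vanishes and only the constant c survives
# inside the bracket: 20 * (1 + 1/20 + 1.10 * 20 / 400) * 16.70.
be_n20_z20 = be_misr1(n=20, z=20)
```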

The differences between the experimental values and model predictions for various observables are shown in Fig. 2. These differences decrease as one includes more orders of the MISR expansion, illustrating how the regression converges without a sign of overfitting to the trained region. In principle, one can keep an arbitrary number of terms in the expansion. However, the Pareto Frontier hints it is optimal to keep up to 10 terms.

Figure 2: Convergence of different observables fitting the MISR on the nuclear binding energies. The colored area illustrates the standard deviation of the residuals among the training set and the dashed line shows the mean over the test nuclei. The LDM results are illustrated as horizontal lines for reference.

As illustrated in Fig. 2, our MISR results for $n=2$ exhibit a significant improvement in the prediction of nuclear binding energies with respect to the LDM. This is also true for the neutron and proton separation energies.

Figure 3: a) Predicted binding energy per nucleon ($BE/A$) as a function of neutron number, with associated model uncertainties denoted by the color bar. b) Absolute error on $\|BE/A\|$ predictions (Experiment - $BE_{MISR}$), showing the distribution of discrepancies for different neutron numbers. c) Residuals of the binding energy (Experiment - Model) obtained for our MISR10 results and the Duflo-Zuker model. Vertical dashed lines show the traditional nuclear magic numbers.

The results for the binding energy per nucleon, $BE/A$, together with the distribution of their predicted uncertainties are shown in Fig. 3. The ability to provide uncertainty estimates allows for a more robust analysis of the predictive power of the model. This feature of MISR helps identify where additional experimental data or refinement of the model might be necessary. The standard deviation $\sigma(BE)$ of the binding energy predictions, as indicated by the color gradient, tends to correlate with the residual distribution. We remark that the uncertainty is not uniform across the nuclear chart. Instead, the highest uncertainties coincide with the transitional regions between different nuclear shells. These regions are known for increased structural changes that impact nuclear stability and are more challenging to predict accurately. This shell structure effect is illustrated in Fig. 3 c). The MISR expansion is well-behaved around closed-shell structures, but its accuracy decreases for open-shell nuclei. The contrary is true for the Duflo-Zuker model, for which the discrepancy with respect to the experiment is larger around neutron magic numbers.

In Table 1 we compare the performance of different nuclear models in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Note that these results represent only a subset of DFT calculations and do not include the latest DFT developments. MAE is calculated as the average absolute difference between the predicted and experimental values, while RMSE is derived from the square root of the average of the squared differences between the predicted and experimental binding energy values. For comparison, the table includes the results from DFT calculations using different density functionals such as DD-ME2 [56], NL3* [57], HFB24 [58], and UNEDF1 [59]. Remarkably, the MISR10 model shows competitive performance, particularly when considering the fact that it is built on a small number of input variables.
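
The two metrics described above can be written out as follows; the numbers in the example are made up for illustration and are not nuclear data.

```python
import math


def mae(predicted, experimental):
    """Mean Absolute Error: average of |prediction - experiment|."""
    pairs = list(zip(predicted, experimental))
    return sum(abs(p - e) for p, e in pairs) / len(pairs)


def rmse(predicted, experimental):
    """Root Mean Squared Error: sqrt of the average squared difference."""
    pairs = list(zip(predicted, experimental))
    return math.sqrt(sum((p - e) ** 2 for p, e in pairs) / len(pairs))


# Made-up values (MeV) for illustration only.
experimental = [100.0, 200.0, 300.0, 400.0]
predicted = [101.0, 198.0, 300.0, 403.0]

example_mae = mae(predicted, experimental)    # (1 + 2 + 0 + 3) / 4
example_rmse = rmse(predicted, experimental)  # sqrt((1 + 4 + 0 + 9) / 4)
```

RMSE weights large deviations more heavily than MAE, so it is never smaller than MAE for the same residuals, which is consistent with every row of Table 1.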

Table 1: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) obtained for the $BE$ of nuclei with $12\leq Z\leq 50$. The results from MISR1 and MISR10 are compared with different nuclear models. See text for more details.
Model Ref. MAE [MeV] RMSE [MeV]
MISR1 - 5.11 6.17
MISR10 - 0.78 0.99
DD-ME2 [56] 2.48 2.83
NL3* [57] 1.95 2.45
Duflo-Zuker [51] 0.62 0.87
FRDM [50] 0.67 0.90
HFB24 [58] 0.55 0.73
UNEDF1 [59] 1.88 2.25
LDM [49] 3.56 5.08

Results for the nuclear charge radii

Now we shift our focus to the study of the nuclear charge radius, $r_{c}$. This nuclear property is highly sensitive to the details of the inter-nucleon interactions, which are not yet fully understood. The reproduction of the magnitude of $r_{c}$ is an unresolved challenge for ab initio nuclear theory [60].

At leading order, we obtain the first MISR expression to be:

$$r_{MISR}^{(1)}=A^{1/3}\left(a+\frac{b}{Z}\right)+c\left(P-\frac{Z}{N}\right)e^{I}-I, \tag{5}$$

with $a=0.95$ fm, $b=1.48$ fm, $c=0.017$ fm. Additional terms obtained for higher order corrections, up to $n=10$, can be found in the section Methods, Appendix 5. The first term is similar to the well-known droplet model for the isospin symmetric case, $r_{c}\sim A^{1/3}$. Interestingly, the second term in this equation shows an exponential dependence on the isospin asymmetry, $I$, which, to our knowledge, does not appear in other phenomenological models for the charge radii. Similar exponential terms have been proposed in nuclear mass models [61].
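
Equation (5) can likewise be evaluated directly. The sketch below transcribes it as printed, with the rounded constants from the text. The Casten factor needs the valence nucleon numbers, so they are recomputed here; setting $P=0$ for doubly magic nuclei, where the definition gives $0/0$, is an assumption of this sketch.

```python
import math

MAGIC = (8, 20, 28, 50, 82)


def valence(count):
    """Distance from a nucleon number to the closest magic number."""
    return min(abs(count - m) for m in MAGIC)


def casten_p(n, z):
    """Casten factor P = Nn * Np / (Nn + Np); taken as 0 when Nn + Np = 0."""
    n_n, n_p = valence(n), valence(z)
    return n_n * n_p / (n_n + n_p) if (n_n + n_p) > 0 else 0.0


def r_misr1(n, z, a=0.95, b=1.48, c=0.017):
    """Leading-order MISR charge radius of Eq. (5), in fm, as printed."""
    mass = n + z
    iso = (n - z) / mass
    p = casten_p(n, z)
    return (
        mass ** (1.0 / 3.0) * (a + b / z)
        + c * (p - z / n) * math.exp(iso)
        - iso
    )


# N = Z = 20: I = 0 and P = 0, so r = 40^(1/3) * (0.95 + 1.48/20) - 0.017.
r_n20_z20 = r_misr1(n=20, z=20)
```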

The difference between the experiment and our MISR results is shown in Fig. 4. The MISR expansion captures the overall trend of the charge radii. We do find a higher residual outlier in the distribution for $Z=34$, $N=40$ ($^{74}$Se), which is mainly due to its relatively large experimental uncertainty. As with the results obtained for binding energies, the residuals are larger between magic numbers.

Figure 4: Charge radii differences between the experimental value and prediction obtained by the MISR10 model. The magnitude of these differences is shown with different colors as a function of the neutron and proton numbers.

The performance obtained on the training and testing sets is shown in Table 2. Our MISR results are contrasted with different DFT calculations in the same table. For comparison, we include an analytical model proposed by Nerlo-Pomorska with Casten modifications [62]:

$$r_{c}^{NP}=r_{0}A^{1/3}\left[1-\alpha_{1}\frac{N-Z}{A}+\left(\frac{\alpha_{2}}{A}\right)^{1/3}+\frac{\alpha_{3}P}{A}+\frac{\alpha_{4}\delta}{A}\right], \tag{6}$$

where the constants $\alpha_{i}$ are fitted to the experimental data.

Table 2: Mean Absolute Error (MAE) between experiment and model for the charge radii of nuclei with $12\leq Z\leq 50$. We label MISR1 as the first iteration’s result of the MISR. Our results are compared with different DFT calculations and the NP formula. See text for more details.
Model Ref. Train [fm] Test [fm]
MISR1 - 0.017 0.014
MISR10 - 0.009 0.009
DD-ME2 [56] 0.019 0.016
NL3* [57] 0.028 0.026
SKMS [63] 0.019 0.016
UNEDF1 [59] 0.026 0.020
NP [62] 0.023 0.018

Our MISR for the nuclear charge radii exhibits an overall good agreement with the experimental data, which is better than all models presented in Table 2. This is the case even at leading order, Eq. 5, with an expression of remarkable simplicity.

While Equation 5 is obtained by training only on light nuclei, it extrapolates well to heavy nuclei. This is illustrated in Fig. 5. This implies that our multi-objective optimization yields physical information valid across the entire nuclear chart. A similar extrapolation is observed for the binding energy predicted by Equation 4, which shows a smooth linear increase with the proton number. This suggests that adding a simple linear term in $Z$ can provide a major improvement for heavy nuclei. Such a term would likely appear if large nuclei were included in the training process. We choose not to add this term manually, as our goal is to allow the ML algorithm to discover analytical expressions with minimal human intervention.

Figure 5: Top: Distribution of the residual on the charge radii as a function of $Z$ for the first term in MISR (Experiment - $r_{MISR1}$). Bottom: Same but for binding energy (Experiment - $BE_{MISR1}$). MISR was trained only using the blue points.

In contrast to nuclear binding energies, the nuclear charge radii are known to exhibit more complex and distinct behavior across the nuclear chart. In what follows, we explore the performance of our MISR result in describing complex charge radii trends for selected isotopic chains with $Z=18,20,22,25$. Our results are compared with the experimental values in Fig. 6.

Figure 6: Charge radii trends of elements with $Z=18,20,22,25$. Experimental results [64, 65, 66, 67] are contrasted with our MISR10 results.

Overall, the MISR prediction, including uncertainty estimation, provides a good description of the experimental trends. The model captures the expected kinks at magic numbers and part of the odd-even staggering effects, with large discrepancies found for Ca ($Z=20$). These isotopes exhibit a unique trend, which is known to be challenging to describe by nuclear theory [65, 66].

A different, distinct charge radii trend is observed for Krypton (Kr) isotopes ($Z=36$). The MISR results are compared with the experiment in Fig. 7. Here, the MISR model performs remarkably well, capturing all the main features: i. the kink at $N=50$; ii. a parabolic trend for neutron-deficient isotopes; iii. staggering between isotopes with odd and even neutron numbers.

Figure 7: Nuclear charge radii values for the Krypton isotopic chain. Experimental results [64] are compared with different nuclear models. The colored area represents the uncertainty of the MISR model.

As illustrated in Fig. 7, the analytical formula (NP) from [62] also performs relatively well for the particular case of Kr isotopes. The difference between the NP and MISR becomes significant for neutron-rich nuclei.

Limits of the nuclear landscape

Inspired by the fact that our MISR model appears to be highly complementary to the DZ model, i.e. MISR exhibits better performance around closed shells, while the DZ model performs better for open-shell nuclei (see Fig. 3), we combined them as an ensemble model. Both models can be combined using a Bayesian Automatic Relevance Determination (ARD) regression [68], which can determine the weight of each model and provide a combined prediction with uncertainty. The resulting ensemble model, labeled ARD, provides an MAE of 0.389 (0.411) MeV on the train (test) samples, yielding a relative improvement of around $20\%$ with respect to the best-performing HFB24 model [58]. A comparison of different models is presented in Table 3.

Table 3: Mean Absolute Error (MAE) between experiment and model for binding energy and separation energies of nuclei with $12\leq Z\leq 50$ over the complete set of nuclei (testing and training). Our results are compared with different DFT calculations and phenomenological models. See text for more details.
Model Ref. $BE$ [MeV] $S_{n}$ [MeV] $S_{p}$ [MeV]
MISR10 - 0.79 0.46 0.96
ARD - 0.39 0.28 0.77
Duflo-Zuker [51] 0.62 0.30 0.81
FRDM [50] 0.67 0.34 0.84
HFB24 [58] 0.55 0.48 0.91
UNEDF1 [59] 1.88 0.41 0.82
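The idea of weighting two complementary models can be illustrated with a deliberately simplified stand-in. The sketch below is not the Bayesian ARD regression used in this work: it fits two blending weights by ordinary least squares and provides no uncertainty estimate. The function name `fit_blend_weights` and the toy numbers are hypothetical.

```python
def fit_blend_weights(pred_a, pred_b, target):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)**2).

    Simplified stand-in for ARD regression: no priors, no predictive uncertainty.
    """
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, target))
    sby = sum(b * y for b, y in zip(pred_b, target))
    det = saa * sbb - sab * sab  # determinant of the 2x2 normal equations
    if abs(det) < 1e-12:
        raise ValueError("model predictions are collinear")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# Toy predictions from two hypothetical models and a synthetic target.
toy_a = [1.0, 2.0, 3.0, 4.0]
toy_b = [2.0, 1.0, 4.0, 3.0]
toy_y = [0.3 * a + 0.7 * b for a, b in zip(toy_a, toy_b)]
w_a, w_b = fit_blend_weights(toy_a, toy_b, toy_y)
blended = [w_a * a + w_b * b for a, b in zip(toy_a, toy_b)]
```

In an ARD regression, the weights would additionally carry priors whose precisions are learned from the data, which is what allows the combined prediction to come with an uncertainty.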

This motivates us to use the ARD model to study open questions in nuclear science, such as the limits of the existence of nuclear matter. The ARD model provides a robust estimate of nucleon separation energies, which is critical for predicting the limits of nuclear stability. Using our model uncertainties, the probability of having a bound nucleus can be estimated for any number of protons and neutrons. To do so, we compute the one- and two-nucleon separation energies and obtain the probability of a positive value from the cumulative distribution function of a Gaussian distribution [53]. The limits of stability predicted by the ARD model for nuclei with $12\leq Z\leq 50$ are presented in Fig. 8. Where data are available, overall good agreement with experiment was found.
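The probability estimate described above can be sketched as follows, assuming that a predicted separation energy is treated as a Gaussian random variable with the model prediction as its mean and the model uncertainty as its standard deviation. The function name `prob_positive` and the numerical values are hypothetical, and how the one- and two-nucleon probabilities are combined is not specified here.

```python
import math


def prob_positive(mean, sigma):
    """P(X > 0) for X ~ Normal(mean, sigma**2), evaluated with the Gaussian CDF."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    # P(X > 0) = Phi(mean / sigma), with Phi the standard normal CDF.
    return 0.5 * (1.0 + math.erf(mean / (sigma * math.sqrt(2.0))))


# Hypothetical example: predicted S_n = 0.8 MeV with a 1.0 MeV model uncertainty.
p_bound = prob_positive(0.8, 1.0)
is_bound = p_bound >= 0.5  # threshold used for the drip-line contour
```

With this construction, the threshold $P_{bound}\geq 0.5$ coincides with a non-negative predicted separation energy, while the colour scale of a probability map conveys how sharply the boundary is determined.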

Figure 8: Predicted limits of stability obtained with the ARD model. The colors represent the estimated probability of a bound nucleus for a particular combination of $N$ and $Z$. The green lines represent the threshold of having a bound nucleus, defined as $P_{bound}\geq 0.5$. Experimental limits are shown with black symbols.

Conclusions and outlook

We developed a Multi-objective Iterated Symbolic Regression framework, labeled MISR, to enable the discovery of analytical models of nuclear properties. As a proof of principle, the MISR method was employed to describe the binding energies and charge radii of light and medium mass nuclei. Remarkably, simple expressions were found as functions of the neutron and proton numbers. The resulting models provide a good description of the available data, with precision comparable to that of state-of-the-art nuclear models.

The MISR model was combined with the well-known DZ model to enable a powerful, complementary description of binding energies and nucleon separation energies. The combined model, ARD, showed overall good agreement with experiment. The uncertainty estimation provided by the ARD model was used to estimate the limits of stability of nuclei in the region $12\leq Z\leq 50$. These results highlight the potential of integrating physics-informed ML approaches with established complementary theoretical models to improve predictions of yet-to-be-explored regions of the nuclear chart.

By combining the predictive power of machine learning with the interpretability of symbolic expressions, MISR offers a promising avenue for advancing our understanding of complex nuclear phenomena and, hopefully, guiding the development of more accurate and insightful nuclear models.

Future work will focus on extending the application of MISR to describe complementary nuclear observables using different sets of input variables. Of particular interest would be the development of the MISR method to discover the analytical forms of nuclear interactions and, more generally, interaction Hamiltonians of quantum many-body systems. This could be combined with powerful many-body methods to establish a direct link between microscopic interactions and observables. Work is ongoing to generalize the MISR approach for this purpose, including support for vector and tensor operators.

Acknowledgements

This work was supported by the Office of Nuclear Physics, U.S. Department of Energy, under grants DE-SC0021176 and DE-SC0021179. We are grateful to S. Wilkins, J. Holt, A. Belley, A. Galindo, and K. Matchev for useful discussions, suggestions, and insights.

References

  • Tews et al. [2020a] I. Tews, Z. Davoudi, A. Ekström, J. D. Holt,  and J. E. Lynn, Journal of Physics G: Nuclear and Particle Physics 47, 103001 (2020a).
  • Epelbaum et al. [2009] E. Epelbaum, H.-W. Hammer,  and U.-G. Meißner, Rev. Mod. Phys. 81, 1773 (2009).
  • Navratil et al. [2016] P. Navratil, S. Quaglioni, G. Hupin, C. Romero-Redondo,  and A. Calci, Physica Scripta 91, 053002 (2016).
  • Reinhard and Nazarewicz [2017] P.-G. Reinhard and W. Nazarewicz, Phys. Rev. C 95, 064328 (2017).
  • Sassarini et al. [2022] P. L. Sassarini, J. Dobaczewski, J. Bonnard, and R. F. Garcia Ruiz, J. Phys. G 49, 11LT01 (2022), arXiv:2111.04675 [nucl-th].
  • Tews et al. [2020b] I. Tews, Z. Davoudi, A. Ekström, J. D. Holt, and J. E. Lynn, J. Phys. G 47, 103001 (2020b), arXiv:2001.03334 [nucl-th].
  • Lovato et al. [2022] A. Lovato, C. Adams, G. Carleo, and N. Rocco, Physical Review Research 4, 043178 (2022).
  • Vernon et al. [2022] A. R. Vernon et al., Nature 607, 260 (2022).
  • Karthein et al. [2023] J. Karthein et al., arXiv:2310.15093 (2023).
  • Gao et al. [2021] Z.-P. Gao, Y.-J. Wang, H.-L. Lü, Q.-F. Li, C.-W. Shen,  and L. Liu, Nuclear Science and Techniques 32, 109 (2021).
  • Su et al. [2023] P. Su, W.-B. He,  and D.-Q. Fang, Symmetry 15, 1040 (2023).
  • Neufcourt et al. [2018a] L. Neufcourt, Y. Cao, W. Nazarewicz, F. Viens, et al., Physical Review C 98, 034318 (2018a).
  • Akkoyun and Yakhelef [2022] S. Akkoyun and A. Yakhelef, Phys. Rev. C 105, 044309 (2022), arXiv:2112.12562 [nucl-th].
  • Mumpower et al. [2023] M. Mumpower, M. Li, T. M. Sprouse, B. S. Meyer, A. E. Lovell, and A. T. Mohan, Frontiers in Physics 11, 1198572 (2023).
  • Neufcourt et al. [2018b] L. Neufcourt, Y. Cao, W. Nazarewicz, and F. Viens, Phys. Rev. C 98, 034318 (2018b), arXiv:1806.00552 [nucl-th].
  • Munoz et al. [2023] J. M. Munoz, S. Akkoyun, Z. P. Reyes,  and L. A. Pachon, Phys. Rev. C 107, 034308 (2023).
  • Nolte et al. [2023] N. Nolte, O. Kitouni, S. Trifinopoulos, S. Kantamneni,  and M. Williams, in 1st Workshop on the Synergy of Scientific and Machine Learning Modeling @ ICML (2023).
  • Mumpower et al. [2022] M. R. Mumpower, T. M. Sprouse, A. E. Lovell,  and A. T. Mohan, Phys. Rev. C 106, L021301 (2022).
  • Belkin et al. [2018] M. Belkin, D. J. Hsu,  and P. Mitra, Advances in neural information processing systems 31 (2018).
  • Jiang et al. [2019] W. G. Jiang, G. Hagen,  and T. Papenbrock, Phys. Rev. C 100, 054326 (2019).
  • Gerwin [1974] D. Gerwin, Behavioral Science 19, 314 (1974).
  • Schmidt and Lipson [2009] M. Schmidt and H. Lipson, Science 324, 81 (2009), https://www.science.org/doi/pdf/10.1126/science.1165893.
  • Keren et al. [2023] L. S. Keren, A. Liberzon,  and T. Lazebnik, Scientific Reports 13, 1249 (2023).
  • Keijzer [2004] M. Keijzer, Genetic Programming and Evolvable Machines 5, 259 (2004).
  • Udrescu and Tegmark [2020] S.-M. Udrescu and M. Tegmark, Science Advances 6 (2020), 10.1126/sciadv.aay2631.
  • Udrescu et al. [2020] S.-M. Udrescu, A. Tan, J. Feng, O. Neto, T. Wu,  and M. Tegmark, Advances in Neural Information Processing Systems 33, 4860 (2020).
  • Villaescusa-Navarro et al. [2021] F. Villaescusa-Navarro et al. (CAMELS), Astrophys. J. 915, 71 (2021), arXiv:2010.00619 [astro-ph.CO].
  • Tenachi et al. [2023] W. Tenachi, R. Ibata,  and F. I. Diakogiannis, The Astrophysical Journal 959, 99 (2023).
  • Matchev et al. [2022] K. T. Matchev, K. Matcheva,  and A. Roman, The Astrophysical Journal 930, 33 (2022).
  • Lemos et al. [2023] P. Lemos, N. Jeffrey, M. Cranmer, S. Ho,  and P. Battaglia, Machine Learning: Science and Technology 4, 045002 (2023).
  • Wang et al. [2019] Y. Wang, N. Wagner,  and J. M. Rondinelli, MRS Communications 9, 793 (2019).
  • Weng et al. [2020] B. Weng, Z. Song, R. Zhu, Q. Yan, Q. Sun, C. G. Grice, Y. Yan,  and W.-J. Yin, Nature Communications 11, 3513 (2020).
  • Derner et al. [2020] E. Derner, J. Kubalík, N. Ancona,  and R. Babuška, Applied Soft Computing 94, 106432 (2020).
  • Dong et al. [2023] Z. Dong, K. Kong, K. T. Matchev,  and K. Matcheva, Physical Review D 107, 055018 (2023).
  • Tsoi et al. [2024] H. F. Tsoi, V. Loncar, S. Dasu, and P. Harris, arXiv:2401.09949 [cs.LG] (2024).
  • Cranmer et al. [2020] M. Cranmer, A. Sanchez Gonzalez, P. Battaglia, R. Xu, K. Cranmer, D. Spergel,  and S. Ho, Advances in Neural Information Processing Systems 33, 17429 (2020).
  • Schapire [2003] R. E. Schapire, Nonlinear estimation and classification , 149 (2003).
  • Ngatchou et al. [2005] P. Ngatchou, A. Zarei,  and A. El-Sharkawi, in Proceedings of the 13th international conference on, intelligent systems application to power systems (IEEE, 2005) pp. 84–91.
  • Kubalík et al. [2021] J. Kubalík, E. Derner,  and R. Babuška, Expert Systems with Applications 182, 115210 (2021).
  • Kozachenko and Leonenko [1987] L. F. Kozachenko and N. N. Leonenko, Problemy Peredachi Informatsii 23, 9 (1987).
  • Alaa and Van Der Schaar [2020] A. Alaa and M. Van Der Schaar, in International Conference on Machine Learning (PMLR, 2020) pp. 165–174.
  • Furnstahl et al. [2015] R. Furnstahl, N. Klco, D. Phillips,  and S. Wesolowski, Physical Review C 92, 024005 (2015).
  • Casten et al. [1987] R. F. Casten, D. S. Brenner,  and P. E. Haustein, Phys. Rev. Lett. 58, 658 (1987).
  • Wang et al. [2021] M. Wang, W. Huang, F. G. Kondev, G. Audi,  and S. Naimi, Chinese Physics C 45, 030003 (2021).
  • Buskirk et al. [2023] L. Buskirk, K. Godbey, P. Giuliani, W. Nazarewicz,  and Y. Yamauchi, in APS Meeting Abstracts, APS Meeting Abstracts (2023) p. M06.002.
  • [46] F. L. for Nuclear Sciences and N. S. C. Laboratory, “Dft mass tables,” https://massexplorer.frib.msu.edu/content/DFTMassTables.html, accessed: 2024-06-27.
  • Marinova and Angeli [2013] K. Marinova and I. Angeli, “Nuclear charge radii,” International Atomic Energy Agency (IAEA) (2013).
  • Yang et al. [2023] X. Yang, S. Wang, S. Wilkins,  and R. G. Ruiz, Progress in Particle and Nuclear Physics 129, 104005 (2023).
  • Benzaid et al. [2020] D. Benzaid, S. Bentridi, A. Kerraci,  and N. Amrani, Nuclear Science and Techniques 31, 9 (2020).
  • Möller et al. [2012] P. Möller, W. D. Myers, H. Sagawa,  and S. Yoshida, Physical Review Letters 108, 052501 (2012).
  • Mendoza-Temis et al. [2010] J. Mendoza-Temis, J. G. Hirsch, and A. P. Zuker, Nucl. Phys. A 843, 14 (2010), arXiv:0912.0882 [nucl-th].
  • Erler et al. [2012] J. Erler, N. Birge, M. Kortelainen, W. Nazarewicz, E. Olsen, A. M. Perhac,  and M. Stoitsov, Nature (London) 486, 509 (2012).
  • Stroberg et al. [2021] S. R. Stroberg, J. D. Holt, A. Schwenk,  and J. Simonis, Phys. Rev. Lett. 126, 022501 (2021).
  • Perera et al. [2021] U. C. Perera, A. V. Afanasjev, and P. Ring, Phys. Rev. C 104, 064313 (2021), arXiv:2108.02245 [nucl-th].
  • Malbrunot-Ettenauer and Kaufmann [2022] S. Malbrunot-Ettenauer and e. a. Kaufmann, Phys. Rev. Lett. 128, 022502 (2022).
  • Day Goodacre et al. [2021] T. Day Goodacre, A. V. Afanasjev, et al., Phys. Rev. Lett. 126, 032502 (2021).
  • Agbemava et al. [2014] S. E. Agbemava, A. V. Afanasjev, D. Ray,  and P. Ring, Phys. Rev. C 89, 054320 (2014).
  • Goriely et al. [2016] S. Goriely, N. Chamel,  and J. Pearson, in Journal of Physics: Conference Series, Vol. 665 (2016) p. 012038.
  • Kortelainen et al. [2012] M. Kortelainen, J. McDonnell, W. Nazarewicz, P.-G. Reinhard, J. Sarich, N. Schunck, M. V. Stoitsov,  and S. M. Wild, Physical Review C 85 (2012), 10.1103/physrevc.85.024304.
  • Rossi [2021] M. Rossi, The Journal of Chemical Physics 154 (2021).
  • Seeger [1961] P. A. Seeger, Nuclear Physics 25, 1 (1961).
  • Sheng et al. [2015] Z. Sheng, G. Fan, J. Qian,  and J. Hu, The European Physical Journal A 51, 40 (2015).
  • Buskirk et al. [2023] L. Buskirk, K. Godbey, W. Nazarewicz, and W. Satula, “Nucleonic shells and nuclear masses,” arXiv:2309.16871 [nucl-th] (2023).
  • Angeli and Marinova [2013] I. Angeli and K. Marinova, Atomic Data and Nuclear Data Tables 99, 69 (2013).
  • Garcia Ruiz et al. [2016] R. F. Garcia Ruiz, M. Bissell, K. Blaum, A. Ekström, N. Frömmgen, G. Hagen, M. Hammen, K. Hebeler, J. Holt, G. Jansen, et al., Nature Physics 12, 594 (2016).
  • Koszorus et al. [2021] A. Koszorus, X. F. Yang, W. G. Jiang, S. J. Novario, S. W. Bai, J. Billowes, C. L. Binnersley, M. L. Bissell, T. E. Cocolios, B. S. Cooper, R. P. de Groote, A. Ekstrom, K. T. Flanagan, C. Forssen, S. Franchoo, R. F. G. Ruiz, F. P. Gustafsson, G. Hagen, G. R. Jansen, A. Kanellakopoulos, M. Kortelainen, W. Nazarewicz, G. Neyens, T. Papenbrock, P.-G. Reinhard, C. M. Ricketts, B. K. Sahoo, A. R. Vernon,  and S. G. Wilkins, Nature Physics 17, 439 (2021).
  • Heylen et al. [2016] H. Heylen, C. Babcock, R. Beerwerth, J. Billowes, M. L. Bissell, K. Blaum, J. Bonnard, P. Campbell, B. Cheal, T. Day Goodacre, D. Fedorov, S. Fritzsche, R. F. Garcia Ruiz, W. Geithner, C. Geppert, W. Gins, L. K. Grob, M. Kowalska, K. Kreim, S. M. Lenzi, I. D. Moore, B. Maass, S. Malbrunot-Ettenauer, B. Marsh, R. Neugart, G. Neyens, W. Nörtershäuser, T. Otsuka, J. Papuga, R. Rossel, S. Rothe, R. Sánchez, Y. Tsunoda, C. Wraith, L. Xie, X. F. Yang,  and D. T. Yordanov, Phys. Rev. C 94, 054321 (2016).
  • MacKay et al. [1994] D. J. MacKay et al., ASHRAE transactions 100, 1053 (1994).
  • Wilstrup and Kasak [2021] C. Wilstrup and J. Kasak, arXiv preprint arXiv:2103.15147  (2021).
  • Gregorutti et al. [2017] B. Gregorutti, B. Michel,  and P. Saint-Pierre, Statistics and Computing 27, 659 (2017).
  • Peng et al. [2005] H. Peng, F. Long,  and C. Ding, IEEE Transactions on pattern analysis and machine intelligence 27, 1226 (2005).

I METHODS

I.1 Multi-objective Iterated Symbolic Regression

Symbolic Regression (SR) methods optimize both the parameters and the structure of analytical models, and have been increasingly applied across fields such as physics, biology, and finance due to their interpretability and ability to produce simple mathematical expressions [36]. SR evolves a population of candidate expressions through crossover and elimination, guided by fitness criteria, so that successive generations fit the data better. It differs from traditional regression methods in that it does not assume a predefined model structure. Instead, it evolves the structure of the model itself, allowing for the discovery of novel relationships between variables while requiring significantly less training data [69]. In the following, we present Multi-objective Iterated Symbolic Regression (MISR), a novel framework for performing SR.

Overview of the Core Algorithm

The core algorithm of MISR is rooted in the principles of SR but extends beyond traditional methodologies to address complex challenges in physics. It does so by leveraging a framework that iteratively refines feature selection, equation generation, and model optimization. This iterative process aims to construct a comprehensive model by aggregating simpler sub-models, each capturing unique aspects of the underlying physical processes.

Input: Dataset $D$ with features $X$ and target variable $Y$.
Hyperparameters: number of folds $k$, number of terms in symbolic regression $n_{t}$, feature subset size $s$, maximum iterations $max_{iter}$, improvement ratio threshold $\theta$.

Feature Importance Assessment:
    Use Random Forest Regressor and Mutual Information regression on training data to evaluate feature importances.
Iterative Model Building:
    while iteration $\leq max_{iter}$ and improvement ratio $\geq\theta$ do
        Perform k-fold cross-validation on the training set.
        for each fold do
            Select a subset of features $X_{sub}$ of size $s$ based on importance scores.
            Conduct SR with $n_{t}$ terms on $X_{sub}$ to predict $Y$.
            Store top-performing equations based on the multi-objective evaluation.
        Select the best-performing equation across all folds.
        Update $Y$ to be the residuals of the current best model.
        Increment iteration.
Hyperparameter optimization:
    Utilize the validation dataset to fine-tune $k$, $n_{t}$, $s$, $max_{iter}$, and $\theta$.
Model Finalization:
    Aggregate the equations from each iteration to form the final model.

Algorithm 1: MISR

A key aspect of our methodology is the dynamic assessment of feature importance at each step. For this, we fit a Boosted Decision Tree and average its feature importances with the Mutual Information score of each input feature [70, 71]. Each iteration involves k-fold cross-validation, in which we downsample the feature space according to these importance scores.
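The feature down-selection step can be sketched as follows, assuming that both sets of scores are normalized to unit sum before averaging (the normalization is an assumption, as are the function name `top_features` and the toy scores). In practice the two inputs would come from a fitted tree ensemble and a mutual information estimator.

```python
def top_features(tree_importance, mutual_info, s):
    """Average two normalized importance scores per feature and keep the top s."""

    def normalize(scores):
        total = sum(scores.values())
        return {name: value / total for name, value in scores.items()}

    tree = normalize(tree_importance)
    info = normalize(mutual_info)
    combined = {name: 0.5 * (tree[name] + info[name]) for name in tree}
    # Sort feature names by decreasing combined score.
    return sorted(combined, key=combined.get, reverse=True)[:s]


# Toy scores for three hypothetical input features.
toy_tree = {"Z": 0.5, "N": 0.3, "P": 0.2}
toy_info = {"Z": 0.2, "N": 0.2, "P": 0.6}
selected = top_features(toy_tree, toy_info, s=2)
```

Averaging two different importance measures reduces the chance that a feature is discarded only because one of the estimators is insensitive to the way it enters the target.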

SR framework

Within each fold and iteration, we perform a symbolic regression constrained by the number of terms, a significant hyperparameter influencing equation simplicity. Each expression acts as an individual within the evolutionary algorithm, with its fitness measured by the mean error description length (MEDL) evaluated at the optimized parameters [25, 26]. These regressions are executed in parallel across folds for computational efficiency. The top-performing equations are identified by their performance score across several target observables that are related through our physical models; for robustness, each data point is weighted inversely to its experimental uncertainty.
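One possible form of an uncertainty-weighted, multi-target score is sketched below. It is an assumed illustration, not the scoring function used in this work: each absolute residual is divided by its experimental uncertainty, averaged per target, and the targets are then averaged with equal weight. The name `weighted_score` and all numbers are hypothetical.

```python
def weighted_score(predictions, observations, uncertainties):
    """Mean of |prediction - observation| / sigma, averaged over all targets.

    Each argument maps a target name to a list of values; lower is better.
    """
    per_target = []
    for target, observed in observations.items():
        predicted = predictions[target]
        sigma = uncertainties[target]
        scaled = [abs(p - o) / s for p, o, s in zip(predicted, observed, sigma)]
        per_target.append(sum(scaled) / len(scaled))
    return sum(per_target) / len(per_target)


# Toy values for two related targets (e.g. a binding energy and a separation energy).
toy_pred = {"BE": [8.0, 9.0], "Sn": [1.0, 2.0]}
toy_obs = {"BE": [8.5, 9.0], "Sn": [1.2, 1.8]}
toy_sigma = {"BE": [0.5, 0.5], "Sn": [0.1, 0.2]}
score = weighted_score(toy_pred, toy_obs, toy_sigma)
```

Dividing by the experimental uncertainty prevents poorly measured nuclei from dominating the selection of equations, and scoring derived quantities alongside the primary target penalizes expressions that fit one observable at the expense of another.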

Iterative refinement and residual modeling

At each iteration, the target variable is replaced by the residuals of the best-performing model from the preceding iteration, and the fitting process is repeated on these residuals. The iterative process continues until a predefined maximum number of steps is reached or the improvement ratio falls below a certain threshold.
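The residual-refinement loop and its two stopping criteria can be illustrated with a toy stand-in. In the sketch below, the symbolic regression step is replaced by a search over a small, fixed set of candidate functions with a single fitted coefficient, and the improvement ratio is assumed to be the fractional reduction of the squared error. The function name, the candidates, and the data are hypothetical.

```python
def fit_residual_expansion(xs, ys, candidates, max_iter=5, theta=0.01):
    """Greedy stand-in for the iterative loop: each step fits one term to the residuals."""
    residuals = list(ys)
    terms = []
    prev_err = sum(r * r for r in residuals)
    for _ in range(max_iter):
        best = None
        for name, func in candidates.items():
            fx = [func(x) for x in xs]
            norm = sum(v * v for v in fx)
            if norm == 0.0:
                continue
            # Optimal single coefficient for this candidate (least squares).
            coef = sum(v * r for v, r in zip(fx, residuals)) / norm
            err = sum((r - coef * v) ** 2 for v, r in zip(fx, residuals))
            if best is None or err < best[0]:
                best = (err, name, coef, fx)
        if best is None:
            break
        err, name, coef, fx = best
        # Stop when the fractional improvement drops below the threshold.
        if prev_err == 0.0 or (prev_err - err) / prev_err < theta:
            break
        terms.append((name, coef))
        residuals = [r - coef * v for v, r in zip(fx, residuals)]
        prev_err = err
    return terms, residuals


toy_candidates = {"x": lambda x: x, "x^2": lambda x: x * x}
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0 * x for x in toy_x]
terms, residuals = fit_residual_expansion(toy_x, toy_y, toy_candidates)
```

The final model is the sum of the stored terms, which mirrors the aggregation of the equations from each iteration in Algorithm 1.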

Estimation of weighting distribution

After all the terms of the expansion have been modeled, we estimate the errors on the test set using the Discriminative Jackknife technique [41]. The inference uncertainty is then obtained by resampling from the distributions of the weights, modeled as Gaussians, and adding the value of the next term beyond the expansion cutoff. Fig. 9 illustrates the distribution of sampled weights for the charge radii regressor. The figure demonstrates how the Jackknife resampling allows a characteristic Gaussian distribution to be assigned to each term in the MISR expansion.

Figure 9: Distribution of sampled weights for the charge radii regressor after performing bootstrapping on the Discriminative Jackknife. See the text for more details.

The total uncertainty is computed by approximating each set of sampled weights by a Gaussian distribution and bootstrapping to one $\sigma$, assuming that the distributions are independent. We then add this uncertainty in quadrature with the truncation estimate coming from the cutoff in the expansion (i.e. the next expansion term).
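A minimal sketch of this uncertainty combination is given below. It assumes that each term of the expansion is multiplied by an independent Gaussian weight and that the truncation error is taken as the magnitude of the next term; the function name and all numerical values are hypothetical.

```python
import math
import random
import statistics


def predict_with_uncertainty(term_values, weight_means, weight_sigmas,
                             next_term_value, n_samples=2000, seed=0):
    """Central value and total one-sigma uncertainty of a weighted expansion."""
    rng = random.Random(seed)
    central = sum(m * t for m, t in zip(weight_means, term_values))
    # Resample independent Gaussian weights and re-evaluate the expansion.
    samples = [
        sum(rng.gauss(m, s) * t
            for m, s, t in zip(weight_means, weight_sigmas, term_values))
        for _ in range(n_samples)
    ]
    sigma_weights = statistics.stdev(samples)
    # Quadrature sum with the truncation estimate (magnitude of the next term).
    sigma_total = math.hypot(sigma_weights, abs(next_term_value))
    return central, sigma_total


# Toy expansion with three terms and a hypothetical next-order term.
toy_terms = [3.40, 0.05, 0.01]
toy_means = [1.0, 1.0, 1.0]
toy_sigmas = [0.002, 0.1, 0.2]
central, sigma_total = predict_with_uncertainty(toy_terms, toy_means, toy_sigmas, 0.004)
```

For independent Gaussian weights the sampled spread approaches $\sqrt{\sum_i (\sigma_i t_i)^2}$, where $t_i$ is the value of term $i$ and $\sigma_i$ the width of its weight distribution, so the resampling mainly becomes necessary once correlations or non-Gaussian weights are introduced.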

Nucleon separation energies

Fig. 10 presents the trend of neutron separation energies for different neutron-rich isotopes of Argon (Ar), Calcium (Ca), Titanium (Ti), and Manganese (Mn). Experimental results are compared with the ARD model.

Figure 10: Experimental results for the neutron separation energy, $S_{n}$, are compared with the results from the ARD model for neutron-rich isotopes of Argon (Ar), Calcium (Ca), Titanium (Ti), and Manganese (Mn). The uncertainties of the ARD model are shown as shaded areas.

As a complementary example, the proton separation energies of Krypton isotopes are shown in Fig. 11.

Figure 11: Proton separation energy of Krypton isotopes ($Z=36$). Experimental results are compared with our MISR and ARD results. Results from the DZ model and DFT calculations are included for comparison.
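The separation energies shown in these figures follow from differences of binding energies, e.g. $S_{n}(Z,N)=BE(Z,N)-BE(Z,N-1)$ and $S_{p}(Z,N)=BE(Z,N)-BE(Z-1,N)$ with $BE$ taken positive for bound nuclei. The sketch below evaluates them from a lookup table; the function name and the toy table are hypothetical placeholders, not evaluated nuclear data.

```python
def separation_energies(be, z, n):
    """One- and two-nucleon separation energies, in the units of `be`.

    `be` maps (Z, N) to the binding energy, taken positive for bound nuclei.
    Quantities whose neighbouring nucleus is missing from `be` are skipped.
    """
    neighbours = {
        "Sn": (z, n - 1),
        "Sp": (z - 1, n),
        "S2n": (z, n - 2),
        "S2p": (z - 2, n),
    }
    parent = be[(z, n)]
    return {name: parent - be[key] for name, key in neighbours.items() if key in be}


# Toy binding energies in MeV (synthetic placeholder values).
toy_be = {(10, 10): 160.0, (10, 9): 144.0, (9, 10): 148.0, (10, 8): 130.0}
toy_s = separation_energies(toy_be, 10, 10)
```

Because each separation energy is a difference of two model predictions, systematic errors that are smooth in $N$ and $Z$ partially cancel, which is consistent with the smaller MAE values obtained for $S_{n}$ and $S_{p}$ than for $BE$ in Table 3.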

I.2 MISR for Binding Energy

In Table 4 we present the obtained $BE_{MISR}$ models term by term. We label the functions $\mathrm{mod}_{2}(Z)\equiv\gamma_{Z}$ and $\mathrm{mod}_{2}(N)\equiv\gamma_{N}$, respectively. In addition, we denote $\delta\equiv\gamma_{Z}-\gamma_{N}$, as is done traditionally in the literature.

| Model Order | Obtained Function |
|---|---|
| 1 | $\left(\frac{Z}{N}+Z-\frac{1.06N}{Z}\right)\left(I\left(32.4-\frac{\sqrt[3]{A}N}{Z}\right)+16.7\right)$ |
| 2 | $3.42(Z-14.6)\left(\sqrt[3]{A}-2.19I-4.38\right)(I-0.110\log(A))+\delta-P+0.301$ |
| 3 | $-2.02e^{-0.40\gamma_{Z}\gamma_{N}P-(0.040Z)^{Z}}+2.99\cdot 0.867^{(N-Z)^{2}}-0.426P(\log(Z)-3.30)+I$ |
| 4 | $A^{2/3}e^{-A^{2/3}+Z^{1.10}-Z}I\log\left(\frac{Z}{N}\right)+0.634e^{A^{2/3}+\sqrt[3]{A}-N}+0.290\gamma_{N}\gamma_{Z}+0.246$ |
| 5 | $(0.0000154)^{P}A^{\frac{2P}{3}}\left(P(N-Z)^{2}+N\right)\left(0.0000154(N-1)^{2}+P\right)$ |
| 6 | $\frac{\gamma_{N}(1-\gamma_{Z})}{N}-\exp\left(\left(\sqrt[3]{A}-\frac{Z}{N}-1.21\right)\left(2(P+0.108)\left(P^{\sqrt[3]{A}}-P^{N}\right)-\gamma_{Z}(1-\gamma_{N})-\frac{Z}{N}-0.426\right)\right)$ |
| 7 | $1.35I\left((0.324-I)\left(-\frac{Z}{N}-1.78\right)\left(\frac{4.30N}{Z}-0.111\left(A+e^{P}\right)\right)-P+1.35I\right)$ |
| 8 | $(-0.801^{\gamma_{N}(1-\gamma_{Z})}+0.570P-2I)(-0.112+(A-(N-Z)^{2}+\frac{AN}{Z})0.801^{N})$ |
| 9 | $9.20\cdot 10^{-23}\cdot 1330^{A^{\frac{1}{3}}}(-1.97+\gamma_{N}(-1+\gamma_{Z})-\gamma_{Z}+P)(-1330+N^{2}-2NZ+Z^{2})(-670+N^{2}-2NZ+Z^{2})$ |
| 10 | $3.02\exp\left(-1.91P^{1.94}\left(\frac{Z}{N}\right)^{A^{2/3}}-0.895e^{-0.227N}N^{2}-0.0268(N-1)^{2}\right)$ |

Table 4: Analytical expression for the binding energy, $BE_{MISR}$, discovered by the MISR model for the first ten iterations. Each row represents a distinct iteration of the MISR process, showing the mathematical expression derived for each one. The expressions detail how the nuclear binding energy ($BE$) is modeled as a function of nuclear properties such as neutron number ($N$), proton number ($Z$), mass number ($A$), isospin asymmetry ($I$), and the Casten factor ($P$).
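As a quick numerical check of the leading term, the first-order expression of Table 4 can be evaluated directly. The sketch below assumes $A=Z+N$ and the isospin asymmetry $I=(N-Z)/A$; the latter definition and the function name `be_first_order` are assumptions of this example, and the result is presumably in MeV, as in Table 3.

```python
def be_first_order(z, n):
    """First-order MISR binding-energy term of Table 4.

    Assumes A = Z + N and isospin asymmetry I = (N - Z) / A.
    """
    a = z + n
    i = (n - z) / a
    shape = z / n + z - 1.06 * n / z
    strength = i * (32.4 - a ** (1.0 / 3.0) * n / z) + 16.7
    return shape * strength


# Example: Z = N = 20, where I = 0 and the expression reduces to 19.94 * 16.7.
be_ca40 = be_first_order(20, 20)
```

For $Z=N=20$ the asymmetry vanishes and the term evaluates to about 333, so the leading order alone already sets the overall scale of the binding energy, while the higher-order rows contribute progressively smaller corrections.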

I.3 MISR for Charge Radii

Table 5 presents the obtained $rc_{MISR}$ models term by term. We label the functions $\mathrm{mod}_{2}(Z)\equiv\gamma_{Z}$ and $\mathrm{mod}_{2}(N)\equiv\gamma_{N}$, respectively. In addition, we denote $\delta\equiv\gamma_{Z}-\gamma_{N}$, as is done traditionally in the literature.

Model Order Obtained Function
1 $A^{1/3}\left(0.950+\frac{1.48}{Z}\right)+0.0170\left(P-\frac{Z}{N}\right)e^{I}-I$
2 $0.00400\left(-\frac{\left(26.2-P^{2}\right)\left(A+26.2\left(P^{2}-\frac{2.35N}{Z}\right)\right)}{Z^{2}}+\delta-1.70\right)$
3 $0.0000762\,(N_{n}-1.81N_{p})(N_{n}+N_{p}-3.26)$
4 $-0.00105\left(\left(\frac{Z}{N}\right)^{N}-P+1.61\right)$
5 $\frac{0.00200P^{2}}{(N_{p}-0.463)(P-0.486)}$
6 $\frac{1}{\frac{19200Z}{N}-18100}$
7 $\frac{0.000588N}{(A-57.4)(N_{n}+0.132)}$
8 $7860\cdot 0.000118^{N}\left(\gamma_{N}(1-\gamma_{Z})+0.508\right)\left(\frac{1}{N_{n}-0.000220}\right)^{N}$
9 $1.64\,(1-\gamma_{N})(1-\gamma_{Z})\left(-\frac{0.0195Np}{Z}-0.597\right)^{N}$
10 $\frac{N}{Z}\left(\frac{0.00520N_{n}^{(\gamma_{N}(\gamma_{Z}-1)-\gamma_{Z}+2)}}{A}-0.000400\right)$
Table 5: Analytical expressions for the charge radius, $r_C$, discovered by the MISR model in the first ten iterations. The expressions involve fitted coefficients and terms in the mass number ($A$), proton number ($Z$), neutron number ($N$), the Casten factor ($P$), valence protons ($N_p$), and valence neutrons ($N_n$). Each iteration incrementally captures finer details of the nuclear charge radius, reflecting the refinement of the model.
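As a rough illustration of how the tabulated expressions can be evaluated, the following Python sketch implements only the iteration-1 expression of Table 5. The function name `rc_first_iteration` is hypothetical, and $P$ and $I$ are passed in by the caller rather than computed, since their definitions are given elsewhere in the paper. The example inputs ($P=0$, $I=0$ for a nucleus with $Z=N=20$) are assumptions made purely for illustration.

```python
import math


def rc_first_iteration(A, Z, N, P, I):
    """Evaluate the iteration-1 expression for r_C from Table 5.

    A, Z, N: mass, proton, and neutron numbers.
    P, I: Casten factor and the quantity denoted I in the table;
          both are supplied by the caller (their definitions are
          not reproduced here).
    """
    # Leading A^{1/3} term with a 1/Z correction.
    bulk = A ** (1.0 / 3.0) * (0.950 + 1.48 / Z)
    # Term coupling P and Z/N, scaled by exp(I).
    correction = 0.0170 * (P - Z / N) * math.exp(I)
    return bulk + correction - I


# Illustrative inputs only: P = 0 and I = 0 are assumed values.
r = rc_first_iteration(A=40, Z=20, N=20, P=0.0, I=0.0)
print(f"iteration-1 value: {r:.3f}")
```

Later rows of the table could be coded in the same way; how they combine with the first row follows the MISR procedure described in the main text and is not assumed here.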