Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition
Authors:
F. O. de Franca,
M. Virgolin,
M. Kommenda,
M. S. Majumder,
M. Cranmer,
G. Espada,
L. Ingelse,
A. Fonseca,
M. Landajuela,
B. Petersen,
R. Glatt,
N. Mundhenk,
C. S. Lee,
J. D. Hochhalter,
D. L. Randall,
P. Kamienny,
H. Zhang,
G. Dick,
A. Simon,
B. Burlacu,
Jaan Kasak,
Meera Machado,
Casper Wilstrup,
W. G. La Cava
Abstract:
Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed linear integer programming, neural networks, and Bayesian optimization. In order to assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets which were blind to entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms and highlight possible improvements for future competitions.
Submitted 3 July, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
Initialisation and Grammar Design in Grammar-Guided Evolutionary Computation
Authors:
Grant Dick,
Peter A. Whigham
Abstract:
Grammars provide a convenient and powerful mechanism to define the space of possible solutions for a range of problems. However, when used in grammatical evolution (GE), great care must be taken in the design of a grammar to ensure that the polymorphic nature of the genotype-to-phenotype mapping does not impede search. Additionally, recent work has highlighted the importance of the initialisation method on GE's performance. While recent work has shed light on the matters of initialisation and grammar design with respect to GE, their impact on other methods, such as random search and context-free grammar genetic programming (CFG-GP), is largely unknown. This paper examines GE, random search and CFG-GP under a range of benchmark problems using several different initialisation routines and grammar designs. The results suggest that CFG-GP is less sensitive to initialisation and grammar design than both GE and random search: we also demonstrate that observed cases of poor performance by CFG-GP are managed through simple adjustment of tuning parameters. We conclude that CFG-GP is a strong base from which to conduct grammar-guided evolutionary search, and that future work should focus on understanding the parameter space of CFG-GP for better application.
Submitted 15 April, 2022;
originally announced April 2022.
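Illustrative note on the abstract above: the "polymorphic" genotype-to-phenotype mapping in GE can be sketched with the commonly described modulo rule, in which each integer codon selects a production for the leftmost non-terminal. The toy grammar, the function name ge_map, and the wrapping limit below are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy grammar (an assumption for illustration, not from the paper).
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}


def ge_map(codons: List[int], start: str = "<expr>", max_wraps: int = 2) -> Optional[str]:
    """Map integer codons to a sentence by leftmost derivation.

    Each non-terminal consumes one codon and picks production
    `codon % number_of_productions`. Codons are reused (wrapped) up to
    `max_wraps` times; None is returned if the derivation does not finish.
    """
    symbols = [start]          # pending symbols, leftmost first
    output: List[str] = []     # terminals emitted so far
    used = 0                   # number of codons consumed
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:  # terminal symbol: emit it
            output.append(sym)
            continue
        if used >= limit:       # ran out of codons, mapping fails
            return None
        rules = GRAMMAR[sym]
        choice = codons[used % len(codons)] % len(rules)
        used += 1
        symbols = list(rules[choice]) + symbols
    return " ".join(output)


if __name__ == "__main__":
    print(ge_map([0, 1, 0, 1, 1, 1]))
```

In this sketch the codon value 1 selects "<var>" when the pending non-terminal is <expr>, "*" when it is <op>, and "y" when it is <var>, so the meaning of a codon depends on its derivation context.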
Interval Arithmetic and Interval-Aware Operators for Genetic Programming
Authors:
Grant Dick
Abstract:
Symbolic regression via genetic programming is a flexible approach to machine learning that does not require up-front specification of model structure. However, traditional approaches to symbolic regression require the use of protected operators, which can lead to perverse model characteristics and poor generalisation. In this paper, we revisit interval arithmetic as one possible solution to allow genetic programming to perform regression using unprotected operators. Using standard benchmarks, we show that using interval arithmetic within model evaluation does not prevent invalid solutions from entering the population, meaning that search performance remains compromised. We extend the basic interval arithmetic concept with 'safe' search operators that integrate interval information into their process, thereby greatly reducing the number of invalid solutions produced during search. The resulting algorithms are able to more effectively identify good models that generalise well to unseen data. We conclude with an analysis of the sensitivity of interval arithmetic-based operators with respect to the accuracy of the supplied input feature intervals.
Submitted 17 April, 2017;
originally announced April 2017.
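Illustrative note on the abstract above: a validity check based on interval arithmetic can be sketched by propagating the known range of each input feature through an expression and rejecting any unprotected division whose denominator interval contains zero. The tuple-based expression encoding and the function name interval_of are assumptions made for illustration; this is not the paper's implementation, and the 'safe' search operators it proposes are not shown.

```python
from typing import Dict, Optional, Tuple, Union

Interval = Tuple[float, float]
# Hypothetical encoding: a feature name, a numeric constant, or (op, left, right).
Expr = Union[str, float, tuple]


def interval_of(expr: Expr, bounds: Dict[str, Interval]) -> Optional[Interval]:
    """Return the output interval of `expr`, or None if it may be undefined."""
    if isinstance(expr, str):              # input feature: look up its range
        return bounds[expr]
    if isinstance(expr, (int, float)):     # constant: degenerate interval
        return (float(expr), float(expr))
    op, left, right = expr
    a = interval_of(left, bounds)
    b = interval_of(right, bounds)
    if a is None or b is None:             # invalidity propagates upwards
        return None
    if op == "+":
        return (a[0] + b[0], a[1] + b[1])
    if op == "-":
        return (a[0] - b[1], a[1] - b[0])
    if op == "*":
        p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
        return (min(p), max(p))
    if op == "/":
        if b[0] <= 0.0 <= b[1]:            # denominator may be zero: reject
            return None
        q = [a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1]]
        return (min(q), max(q))
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    feature_bounds = {"x": (-1.0, 1.0), "y": (2.0, 3.0)}
    print(interval_of(("/", "x", "y"), feature_bounds))  # valid division
    print(interval_of(("/", "y", "x"), feature_bounds))  # x may be zero
```

The check is conservative: each occurrence of a feature is treated independently, so 1 / (x * x + 1) is rejected for x in [-1, 1] even though its denominator is never zero. The result also depends on the supplied feature bounds, which is the sensitivity the abstract refers to.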