SRBench: A Living Benchmark for Symbolic Regression

The methods for symbolic regression (SR) have come a long way since the days of Koza-style genetic programming (GP).

Our goal with this project is to keep a living benchmark of modern symbolic regression, in the context of state-of-the-art ML methods.

As we see it, these are the current challenges:

Lack of cross-pollination between the GP community and the ML community (different conferences, journals, societies, etc.)
Lack of strong benchmarks in the SR literature (small problems, toy datasets, weak comparator methods)
Lack of a unified framework for SR or GP

We are addressing the lack of cross-pollination by making these comparisons open source, reproducible, and public, and we hope to share them widely with the entire ML research community. We are addressing the lack of strong benchmarks by providing open-source benchmarking of many SR methods on large sets of problems, with strong baselines for comparison. To handle the lack of a unified framework, we have specified minimal requirements for contributing a method to this benchmark: a scikit-learn compatible API.
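To illustrate what a scikit-learn compatible API looks like in practice, here is a minimal, hypothetical toy estimator in plain Python. It follows the usual scikit-learn conventions (hyperparameters stored unchanged in __init__, fit(X, y) returning self, fitted attributes ending in an underscore, get_params/set_params), but it is not part of SRBench and does not cover the exact requirements listed in the Contribution Guide. A real submission would normally inherit from sklearn.base.BaseEstimator and RegressorMixin rather than hand-writing get_params/set_params. The "search" is deliberately trivial: it fits y = a*f(x) + b by least squares for a few fixed candidate functions f and keeps the best one. The class name, the CANDIDATES table, and the expression() method are illustrative assumptions.

```python
import math


class ToySymbolicRegressor:
    """Hypothetical toy regressor following scikit-learn estimator conventions.

    For each candidate f it fits y = a * f(x) + b by ordinary least squares
    and keeps the candidate with the lowest training mean squared error.
    Only the first input column is used.
    """

    # Illustrative candidate transforms of the input feature (an assumption).
    CANDIDATES = {
        "x": lambda v: v,
        "x**2": lambda v: v * v,
        "sin(x)": math.sin,
        "cos(x)": math.cos,
    }

    def __init__(self, candidates=None):
        # Hyperparameters are stored unchanged, as scikit-learn expects.
        self.candidates = candidates

    def get_params(self, deep=True):
        return {"candidates": self.candidates}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    @staticmethod
    def _least_squares(zs, ys):
        # Closed-form simple linear regression of ys on zs.
        n = len(zs)
        mean_z = sum(zs) / n
        mean_y = sum(ys) / n
        s_zz = sum((z - mean_z) ** 2 for z in zs)
        if s_zz == 0:
            return 0.0, mean_y  # constant feature: fall back to the mean
        a = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys)) / s_zz
        return a, mean_y - a * mean_z

    def fit(self, X, y):
        xs = [row[0] for row in X]
        ys = list(y)
        names = self.candidates if self.candidates is not None else list(self.CANDIDATES)
        best = None
        for name in names:
            f = self.CANDIDATES[name]
            zs = [f(x) for x in xs]
            a, b = self._least_squares(zs, ys)
            mse = sum((a * z + b - t) ** 2 for z, t in zip(zs, ys)) / len(ys)
            if best is None or mse < best[0]:
                best = (mse, name, a, b)
        # Fitted attributes end with "_" by scikit-learn convention.
        _, self.expr_name_, self.coef_, self.intercept_ = best
        return self

    def predict(self, X):
        f = self.CANDIDATES[self.expr_name_]
        return [self.coef_ * f(row[0]) + self.intercept_ for row in X]

    def expression(self):
        # Human-readable form of the selected model.
        return f"{self.coef_:.3f}*{self.expr_name_} + {self.intercept_:.3f}"


# Usage: recover y = 3*sin(x) + 1 from noiseless samples.
X = [[i / 10] for i in range(50)]
y = [3 * math.sin(row[0]) + 1 for row in X]
est = ToySymbolicRegressor().fit(X, y)
print(est.expression())  # prints 3.000*sin(x) + 1.000
```

Because the estimator only relies on fit/predict and the parameter conventions, any benchmarking harness that treats it as a generic scikit-learn regressor can evaluate it without knowing how the expression was found.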

Benchmarked Methods

This benchmark currently consists of 14 symbolic regression methods, 7 other ML methods, and 252 datasets from PMLB, including real-world and synthetic datasets from processes with and without ground-truth models.

Methods currently benchmarked:

Age-Fitness Pareto Optimization (Schmidt and Lipson 2009) paper, code
Age-Fitness Pareto Optimization with Co-evolved Fitness Predictors (Schmidt and Lipson 2009) paper, code
AIFeynman 2.0 (Udrescu et al. 2020) paper, code
Bayesian Symbolic Regression (Jin et al. 2020) paper, code
Deep Symbolic Regression (Petersen et al. 2020) paper, code
Fast Function Extraction (McConaghy 2011) paper, code
Feature Engineering Automation Tool (La Cava et al. 2017) paper, code
epsilon-Lexicase Selection (La Cava et al. 2016) paper, code
GP-based Gene-pool Optimal Mixing Evolutionary Algorithm (Virgolin et al. 2017) paper, code
gplearn (Stephens) code
Interaction-Transformation Evolutionary Algorithm (de França and Aldeia 2020) paper, code
Multiple Regression GP (Arnaldo et al. 2014) paper, code
Operon (Burlacu et al. 2020) paper, code
Semantic Backpropagation GP (Virgolin et al. 2019) paper, code

Functioning methods staged for benchmarking:

Starting in 2024, we moved to running the different methods in Docker containers. All methods below are fully functional as Docker images and have been benchmarked in an alternative view of SRBench (see our Call for Action paper).

AFP - paper
AFP_fe
AFP_ehc - paper
Bingo - paper
Brush - paper
BSR - paper
E2E - paper
EPLEX - paper
EQL - paper
FEAT - paper
FFX - paper
Genetic Engine - paper
GPGomea - paper
GPlearn - paper
GPZGD - paper
ITEA - paper
NeSymRes - paper
Operon - paper
Ps-Tree - paper
PySR - paper
Qlattice - paper
Rils-rols - paper
TIR - paper
TPSR - paper
uDSR - paper

Contribute

We are actively updating and expanding this benchmark. Want to add your method? See our Contribution Guide.

Benchmark results

All of our experiments' results are available as Feather files in /results/.
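A minimal sketch for loading those files, assuming pandas is installed together with pyarrow (pandas.read_feather needs pyarrow). The helper names are hypothetical, and the search is recursive because the sketch does not assume a particular layout inside results/. If the repository was cloned without Git LFS content, these files are only small pointer stubs and read_feather will fail on them.

```python
from pathlib import Path


def find_result_files(results_dir="results"):
    """Return all .feather files under results_dir, searched recursively."""
    return sorted(Path(results_dir).rglob("*.feather"))


def load_results(results_dir="results"):
    """Load each Feather file into a pandas DataFrame.

    Keys are paths relative to results_dir without the extension,
    so files with the same name in different subfolders do not collide.
    """
    import pandas as pd  # imported lazily; pandas.read_feather also needs pyarrow

    return {
        str(path.relative_to(results_dir).with_suffix("")): pd.read_feather(path)
        for path in find_result_files(results_dir)
    }
```

For example, list(load_results("results")) shows which result tables were found, and each value can then be filtered or grouped like any other DataFrame.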

Contributing and running it locally

Check out the CONTRIBUTING.md file to see how to set up your algorithm. This guide details all the requirements for submitting a pull request with an SRBench-compatible interface.

The analyze script is the main entry point: it parses the flags and generates a separate Python command for each experiment, so that every experiment runs independently. Examples of how to invoke the experiments are available in docs/user_guide.md.
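The pattern described above (parse the flags once, then emit one independent command per experiment) can be sketched as follows. Everything here is hypothetical: the flag names, the evaluate_model.py target script, and its arguments are placeholders for illustration, not the actual interface of the analyze script; docs/user_guide.md documents the real invocations.

```python
import argparse
import itertools
import shlex


def build_commands(datasets, methods, seeds, results_dir):
    """Return one shell command string per (dataset, method, seed) combination."""
    commands = []
    for dataset, method, seed in itertools.product(datasets, methods, seeds):
        # Each command is self-contained, so it can run locally or as its own cluster job.
        commands.append(shlex.join([
            "python", "evaluate_model.py", dataset,  # hypothetical target script
            "-ml", method,
            "-seed", str(seed),
            "-results_path", results_dir,
        ]))
    return commands


def main(argv=None):
    # Hypothetical flags, for illustration only.
    parser = argparse.ArgumentParser(description="Sketch of an experiment launcher")
    parser.add_argument("datasets", nargs="+", help="dataset files to benchmark on")
    parser.add_argument("-ml", dest="methods", nargs="+", required=True, help="methods to run")
    parser.add_argument("-n_trials", type=int, default=1, help="number of random seeds")
    parser.add_argument("-results", dest="results_dir", default="results")
    args = parser.parse_args(argv)
    return build_commands(args.datasets, args.methods, range(args.n_trials), args.results_dir)
```

For instance, main(["a.tsv", "-ml", "gplearn", "-n_trials", "2"]) returns two commands, one per seed. Keeping the commands independent is what lets a launcher run them sequentially on a laptop or submit each one to a job scheduler without any shared state.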

Reproducing the experiments

Note on Git LFS: This repository uses Git Large File Storage (LFS) to store large dataset files. GitHub Free accounts get a free monthly allowance of LFS storage and bandwidth; additional bandwidth requires a paid plan. GitHub Student accounts may have larger quotas. If you don't need the actual results files, you can clone without downloading the LFS content:

GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/cavalab/srbench.git
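With GIT_LFS_SKIP_SMUDGE=1, each LFS-tracked file is checked out as a small text pointer (starting with the line version https://git-lfs.github.com/spec/v1) instead of the actual data, so reading it as a dataset or results table will fail. A minimal sketch for spotting such pointer stubs before loading anything; the helper names are hypothetical. Running git lfs pull later fetches the real files if you decide you need them.

```python
from pathlib import Path

# Pointer files written by current Git LFS versions start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like an un-fetched Git LFS pointer."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root="."):
    """List files under `root` that are still LFS pointers, skipping the .git directory."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )
```

Only the first few bytes of each file are read, so the check stays cheap even when some files have already been fetched and are large.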

A detailed guide to reproducing the experiments yourself is provided in docs/user_guide.md.

Once you have all the results, you need to collate them using the collate scripts in ./postprocessing/scripts.
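The collate scripts in ./postprocessing/scripts define the actual procedure. As a rough, hypothetical illustration of what collating involves, the sketch below assumes each run wrote one JSON object of flat key/value pairs somewhere under a results directory (an assumption, not the documented output format) and merges them into a single CSV table. The function name is illustrative.

```python
import csv
import json
from pathlib import Path


def collate_json_results(results_dir, out_csv):
    """Merge per-run JSON files (assumed flat objects) into one CSV; return the row count."""
    records = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        with open(path) as fh:
            record = json.load(fh)
        record["source_file"] = str(path)  # keep provenance for each row
        records.append(record)

    # Union of all keys in first-seen order, so runs with extra fields are not dropped.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(out_csv, "w", newline="") as fh:
        # Missing fields in a given run are written as empty cells (DictWriter's restval).
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Taking the union of columns matters because different methods may report different extra fields; a fixed header taken from the first file would silently drop them.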

References

Call for action

An alternative approach for benchmarking symbolic regression methods, including all 25 available methods, was reported in our GECCO 2025 SR workshop paper:

Imai Aldeia, G. S., Zhang, H., Bomarito, G., Cranmer, M., Fonseca, A., Burlacu, B., La Cava, W., & de França, F. (2025). Call for Action: towards the next generation of symbolic regression benchmark. Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO 2025).

doi, preprint

v2.0

v2.0 of the benchmark was reported in our NeurIPS 2021 paper, and a preprint is available:

La Cava, W., Orzechowski, P., Burlacu, B., de França, F. O., Virgolin, M., Jin, Y., Kommenda, M., & Moore, J. H. (2021). Contemporary Symbolic Regression Methods and their Relative Performance. NeurIPS Track on Datasets and Benchmarks.

arXiv, neurips.cc

v1.0

v1.0 was reported in our GECCO 2018 paper:

Orzechowski, P., La Cava, W., & Moore, J. H. (2018). Where are we now? A large benchmark study of recent symbolic regression methods. GECCO 2018.

DOI, Preprint

Contact

William La Cava (@lacava), william dot lacava at childrens dot harvard dot edu