SRBench: A Living Benchmark for Symbolic Regression

Methods for symbolic regression (SR) have come a long way since the days of Koza-style genetic programming (GP). Our goal with this project is to maintain a living benchmark of modern symbolic regression, in the context of state-of-the-art ML methods.

Currently, these are the challenges as we see them:

  • Lack of cross-pollination between the GP community and the ML community (different conferences, journals, societies, etc.)
  • Lack of strong benchmarks in SR literature (small problems, toy datasets, weak comparator methods)
  • Lack of a unified framework for SR, or GP

We are addressing the lack of cross-pollination by making these comparisons open source, reproducible, and public, and by sharing them widely with the ML research community. We are addressing the lack of strong benchmarks by providing open-source benchmarking of many SR methods on large sets of problems, with strong baselines for comparison. To handle the lack of a unified framework, we’ve specified a minimal requirement for contributing a method to this benchmark: a scikit-learn compatible API.
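
As a rough illustration of what a scikit-learn compatible API means in practice, the sketch below mirrors the estimator convention (hyperparameters stored in `__init__`, `fit` returning `self`, `predict`, and `get_params`/`set_params`) using only the Python standard library. `ToySymbolicRegressor`, its `forms` parameter, and the `expression_` attribute are hypothetical names invented for this example. It is not one of the benchmarked methods, and the benchmark's Contribution Guide may ask for more than is shown here.

```python
def _least_squares_line(z, y):
    """Closed-form fit of y ~ a * z + b for one transformed feature."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    var = sum((zi - mz) ** 2 for zi in z)
    if var == 0:
        return 0.0, my  # constant feature: fall back to predicting the mean
    a = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / var
    return a, my - a * mz


class ToySymbolicRegressor:
    """Hypothetical toy: searches expressions of the form a * g(x_j) + b."""

    # Candidate transformations g, keyed by a printable template.
    _FORMS = {
        "x{j}": lambda v: v,
        "x{j}**2": lambda v: v * v,
    }

    def __init__(self, forms=("x{j}", "x{j}**2")):
        # Estimator convention: __init__ only stores hyperparameters.
        self.forms = forms

    def get_params(self, deep=True):
        return {"forms": self.forms}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for form in self.forms:
                g = self._FORMS[form]
                z = [g(row[j]) for row in X]
                a, b = _least_squares_line(z, y)
                sse = sum((a * zi + b - yi) ** 2 for zi, yi in zip(z, y))
                if best is None or sse < best[0]:
                    best = (sse, j, form, a, b)
        # Fitted attributes end with an underscore, following the convention.
        _, self.feature_, self.form_, self.coef_, self.intercept_ = best
        term = self.form_.format(j=self.feature_)
        self.expression_ = f"{self.coef_:.3g}*{term} + {self.intercept_:.3g}"
        return self

    def predict(self, X):
        g = self._FORMS[self.form_]
        return [self.coef_ * g(row[self.feature_]) + self.intercept_ for row in X]


# Usage on a small synthetic problem with a known ground-truth model.
X = [[0.0, 1.0], [1.0, 3.0], [2.0, -1.0], [3.0, 2.0]]
y = [2.0 * row[0] ** 2 + 1.0 for row in X]
est = ToySymbolicRegressor().fit(X, y)
print(est.expression_)
```

A real submission would more likely subclass scikit-learn's `BaseEstimator` and `RegressorMixin` and accept NumPy arrays. The plain-Python version is used here only so the interface is visible without extra dependencies.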

Benchmarked Methods

This benchmark currently consists of 14 symbolic regression methods, 7 other ML methods, and 252 datasets from PMLB, including real-world and synthetic datasets from processes with and without ground-truth models.

Methods currently benchmarked:

  • Age-Fitness Pareto Optimization (Schmidt and Lipson 2009) paper, code
  • Age-Fitness Pareto Optimization with Co-evolved Fitness Predictors (Schmidt and Lipson 2009) paper, code
  • AIFeynman 2.0 (Udrescu et al. 2020) paper, code
  • Bayesian Symbolic Regression (Jin et al. 2020) paper, code
  • Deep Symbolic Regression (Petersen et al. 2020) paper, code
  • Fast Function Extraction (McConaghy 2011) paper, code
  • Feature Engineering Automation Tool (La Cava et al. 2017) paper, code
  • epsilon-Lexicase Selection (La Cava et al. 2016) paper, code
  • GP-based Gene-pool Optimal Mixing Evolutionary Algorithm (Virgolin et al. 2017) paper, code
  • gplearn (Stephens) code
  • Interaction-Transformation Evolutionary Algorithm (de França and Aldeia 2020) paper, code
  • Multiple Regression GP (Arnaldo et al. 2014) paper, code
  • Operon (Burlacu et al. 2020) paper, code
  • Semantic Backpropagation GP (Virgolin et al. 2019) paper, code

Methods Staged for Benchmarking:

  • PySR (Cranmer 2020) code
  • PSTree (Zhang 2021) code
  • RILS-ROLS (Kartelj 2023) code


We are actively updating and expanding this benchmark. Want to add your method? See our Contribution Guide.


A preprint of the current version of the benchmark is available. v2.0 was reported in our NeurIPS 2021 paper:

La Cava, W., Orzechowski, P., Burlacu, B., de França, F. O., Virgolin, M., Jin, Y., Kommenda, M., & Moore, J. H. (2021). Contemporary Symbolic Regression Methods and their Relative Performance. NeurIPS Track on Datasets and Benchmarks. arXiv

v1.0 was reported in our GECCO 2018 paper:

Orzechowski, P., La Cava, W., & Moore, J. H. (2018). Where are we now? A large benchmark study of recent symbolic regression methods. GECCO 2018. DOI, Preprint


William La Cava (@lacava), william dot lacava at childrens dot harvard dot edu