Documentation

ellyn is fast because it uses a C++ library to do most of the computation. Once installed, it works like any other scikit-learn estimator, which makes it easy to do cross validation, ensemble learning, or build any other kind of ML pipeline. Follow the installation guide to get it up and running.

Installation

These instructions are written for a default Anaconda3 Python installation, but you can easily modify the paths to point to your own installation.

git clone http://github.com/EpistasisLab/ellyn

cd ellyn

conda env create -f environment.yml

conda activate ellyn-env

python setup.py install

environment.yml lists the package dependencies for ellyn, if you’d like to install them yourself.

Usage

In a Python script, import ellyn:

from ellyn import ellyn

ellyn uses the same nomenclature as sklearn supervised learning modules.

Regression

By default, ellyn does regression. You can initialize a new learner in Python as:

learner = ellyn()

or specify the generations, population size and selection algorithm as:

learner = ellyn(g = 100, popsize = 25, selection = 'lexicase')

Classification

For classification, ellyn implements the M4GP algorithm, which also handles multi-class problems. To use it, pass these parameters:

learner = ellyn(classification=True, 
                class_m4gp=True, 
                prto_arch_on=True,
                selection='lexicase',
                fit_type='F1' # can be 'F1' or 'F1W' (weighted F1)
               )

Given a set of data with variables X and target Y, fit ellyn using the fit() method:

learner.fit(X,Y)

You have now learned a model for your data. Predict the model's response on a new set of variables with the predict() method:

y_pred = learner.predict(X_unseen)
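Because the learner exposes fit() and predict(), evaluation code written against those two methods works with it unchanged. The following is a minimal sketch of k-fold cross validation, not part of ellyn: k_fold_mse and MeanRegressor are hypothetical helpers, and MeanRegressor is only a stand-in so that the snippet runs without ellyn installed. It works on plain Python lists; with NumPy arrays and a real ellyn learner, scikit-learn's cross_val_score would be the usual tool.

```python
def k_fold_mse(make_estimator, X, y, k=3):
    """Mean squared error averaged over k contiguous folds.

    make_estimator is a zero-argument callable returning a fresh,
    unfitted object with fit(X, y) and predict(X) methods.
    X and y are plain Python lists.
    """
    n = len(X)
    fold_scores = []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        # Hold out rows lo..hi-1 and train on the rest.
        X_train, y_train = X[:lo] + X[hi:], y[:lo] + y[hi:]
        X_test, y_test = X[lo:hi], y[lo:hi]
        model = make_estimator()
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_test)) / len(y_test)
        fold_scores.append(mse)
    return sum(fold_scores) / k


class MeanRegressor:
    """Stand-in estimator: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


X = [[float(i)] for i in range(6)]
y = [3.0] * 6
print(k_fold_mse(MeanRegressor, X, y, k=3))
```

Any object with the same two methods can be passed as make_estimator, for example a function that returns a freshly constructed ellyn learner.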

Call ellyn from the terminal as:

python -m ellyn.ellyn data_file_name -g 100 -p 50 -sel lexicase

Try python -m ellyn.ellyn --help to see the available options.

GP options

ellyn uses a stack-based, syntax-free, linear genome for constructing candidate equations.
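To make the idea of a stack-based, syntax-free linear genome concrete, here is a small sketch in pure Python. It is an illustration only and not ellyn's C++ implementation: the instruction set OPS, the token format ("x0", "x1", numbers, operator names) and the rule of skipping an operator when the stack holds too few operands are all assumptions chosen for the example.

```python
import operator

# Hypothetical instruction set: name -> (arity, function).
OPS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def evaluate(genome, x):
    """Evaluate a linear genome on one sample x (a list of feature values).

    Tokens are processed left to right. "x0", "x1", ... push a feature,
    numbers push a constant, and operators pop their arguments and push
    the result. An operator that finds too few values on the stack is
    skipped, so every token sequence is a valid program.
    """
    stack = []
    for token in genome:
        if isinstance(token, (int, float)):
            stack.append(float(token))
        elif token in OPS:
            arity, fn = OPS[token]
            if len(stack) < arity:
                continue  # not enough operands: treat as a no-op
            args = [stack.pop() for _ in range(arity)][::-1]
            stack.append(fn(*args))
        elif token.startswith("x"):
            stack.append(float(x[int(token[1:])]))
    # The top of the stack is the program output (None if nothing was pushed).
    return stack[-1] if stack else None


# (x0 + x1) * 2 written in postfix order
print(evaluate(["x0", "x1", "+", 2, "*"], [1.0, 3.0]))  # 8.0
# The leading "*" has no operands and is skipped, leaving x0 - x1
print(evaluate(["*", "x0", "x1", "-"], [5.0, 2.0]))  # 3.0
```

Because no token sequence is ever invalid, variation operators can edit the genome freely without a repair step.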

Selection/Survival options

tournament
deterministic crowding
lexicase selection
age-fitness pareto optimization
SPEA2
random
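As an illustration of one of the options above, the sketch below implements plain lexicase selection in pure Python. It is not ellyn's code: the function name lexicase_select and the errors matrix layout are assumptions made for the example. For regression, the epsilon-lexicase paper cited below relaxes the exact "best on this case" test to a tolerance, which this sketch does not do.

```python
import random


def lexicase_select(errors, rng=random):
    """Pick one parent index using lexicase selection.

    errors[i][j] is the error of individual i on training case j
    (lower is better).
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # each selection event uses a fresh case ordering
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        # keep only the individuals that are best on this case
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    # ties that survive every case are broken at random
    return rng.choice(candidates)


errors = [
    [0.0, 5.0, 5.0],  # specialist on case 0
    [5.0, 0.0, 5.0],  # specialist on case 1
    [1.0, 1.0, 1.0],  # best on case 2
    [9.0, 9.0, 9.0],  # never best on any case, so never selected
]
parents = [lexicase_select(errors) for _ in range(10)]
print(parents)
```

Unlike tournament selection, no aggregate fitness is computed, so an individual that excels on only a few cases can still be chosen as a parent.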

Variation

subtree, uniform, and point mutation
subtree, uniform, and point crossover
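Two of these operators are easy to sketch on a linear genome. The code below is a hedged illustration in pure Python, not ellyn's implementation: the token pools TERMINALS and OPERATORS, the function names, and the choice to truncate the child to the shorter parent are assumptions. Subtree mutation and crossover are omitted.

```python
import random

# Hypothetical token pools used only for this illustration.
TERMINALS = ["x0", "x1", 1.0]
OPERATORS = ["+", "-", "*"]


def point_mutation(genome, rate, rng=random):
    """Replace each token, with probability `rate`, by a token of the same kind."""
    child = []
    for token in genome:
        if rng.random() < rate:
            pool = OPERATORS if token in OPERATORS else TERMINALS
            token = rng.choice(pool)
        child.append(token)
    return child


def uniform_crossover(mom, dad, rng=random):
    """Build a child by picking each position from either parent with equal odds.

    Positions beyond the shorter parent are dropped.
    """
    return [rng.choice(pair) for pair in zip(mom, dad)]


mom = ["x0", "x1", "+", 1.0, "*"]
dad = ["x1", "x1", "-"]
print(point_mutation(mom, rate=0.2))
print(uniform_crossover(mom, dad))
```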

Parameter learning

stochastic hill climbing
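Stochastic hill climbing tunes the constants in a candidate model by trying random perturbations and keeping only those that reduce the error. The sketch below shows the general technique on a toy line-fitting problem. The function hill_climb, its step count and noise scale, and the squared_error loss are assumptions for illustration and do not reflect ellyn's internal settings.

```python
import random


def hill_climb(constants, loss, steps=100, scale=0.1, rng=random):
    """Tune a list of constants by stochastic hill climbing.

    Each step perturbs every constant with Gaussian noise and keeps the
    perturbed copy only if it lowers the loss.
    """
    best = list(constants)
    best_loss = loss(best)
    for _ in range(steps):
        trial = [c + rng.gauss(0.0, scale) for c in best]
        trial_loss = loss(trial)
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return best, best_loss


# Toy problem: fit a and b in y = a * x + b to data generated with a=2, b=1.
data = [(x, 2.0 * x + 1.0) for x in range(-3, 4)]


def squared_error(params):
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data)


start = [0.0, 0.0]
tuned, tuned_loss = hill_climb(start, squared_error, steps=500, rng=random.Random(1))
print(tuned, tuned_loss)
```

Since a trial is accepted only when it improves the loss, the returned loss can never be worse than the starting one.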

Cite

ellyn has been used in several publications. Cite the one that best represents your use case, or you can cite my dissertation if you’re not sure.

2021

La Cava, W., Orzechowski, P., Burlacu, B., França, F. O. de, Virgolin, M., Jin, Y., Kommenda, M., & Moore, J. H. (2021). Contemporary Symbolic Regression Methods and their Relative Performance. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (Accepted). arXiv, repo

2019

La Cava, W., Silva, S., Danai, K., Spector, L., Vanneschi, L., & Moore, J. H. (2019). Multidimensional genetic programming for multiclass classification. Swarm and Evolutionary Computation. ScienceDirect, PDF

2018

Orzechowski, P., La Cava, W., & Moore, J. H. (2018). Where are we now? A large benchmark study of recent symbolic regression methods. GECCO 2018. DOI, Preprint

2017

La Cava, W., Silva, S., Vanneschi, L., Spector, L., and Moore, J. (2017). “Genetic programming representations for multi-dimensional feature learning in biomedical classification.” Evo Applications, EvoStar 2017, Amsterdam, Netherlands. preprint

2016

La Cava, William G., “Automatic Development and Adaptation of Concise Nonlinear Models for System Identification” (2016). Doctoral Dissertations May 2014 - current. 731. link
La Cava, W., Danai, K., Spector, L., (2016). “Inference of Compact Nonlinear Dynamic Models by Epigenetic Local Search.” Engineering Applications of Artificial Intelligence. doi:10.1016/j.engappai.2016.07.004
La Cava, W., Spector, L., Danai, K. (2016). “epsilon-Lexicase selection for regression.” Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). ACM, Denver, CO. preprint

2015

La Cava, W., Danai, K., Spector, L., Fleming, P., Wright, A., Lackner, M. (2015). “Automatic identification of wind turbine models using evolutionary multi-objective optimization.” Renewable Energy. doi:10.1016/j.renene.2015.09.068