Working with Programs

In Brush, a Program is an executable data structure: you can think of it as a model, or a function mapping feature inputs to data labels. We call them programs because that is what they are called in the genetic algorithm literature, where the term distinguishes them from representations that evolve raw bits or strings.

The Brush Program class operates much like a scikit-learn estimator: it has fit and predict methods that are called during training and inference, respectively.
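To make that contract concrete, here is a toy estimator in plain Python with the same fit/predict shape. This is purely illustrative, not Brush's implementation:

```python
# Toy estimator following the scikit-learn-style fit/predict contract.
# NOT Brush's implementation -- just an illustration of the interface.
class MeanRegressor:
    def fit(self, X, y):
        # "training" here is just remembering the mean of the labels
        self.mean_ = sum(y) / len(y)
        return self               # sklearn convention: fit returns self

    def predict(self, X):
        # inference: one prediction per input row
        return [self.mean_] * len(X)

est = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(est.predict([[5], [6]]))    # [2.0, 2.0]
```

A Brush Program plays the same role, except that fit tunes the weights of an evolved expression rather than a fixed statistic.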

Types of Programs

There are four fundamental “types” of Brush programs:

  • Regressors: map inputs to a continuous endpoint

  • Binary Classifiers: map inputs to a binary endpoint, as well as a continuous value in [0, 1]

  • Multi-class Classifiers: map inputs to a category

    • Under development

  • Representors: map inputs to a lower-dimensional space.

    • Under development

Representation

Internally, programs are represented as syntax trees. We use the tree.hh tree class, which gives trees an STL-like interface.

Generation

We generate random programs using Sean Luke’s PTC2 algorithm.
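PTC2 grows a tree toward a target size by keeping a frontier of open argument slots, filling randomly chosen slots with random functions until the size budget is nearly spent, then closing every remaining slot with a terminal. Below is a simplified Python sketch of that idea; the function set, terminal set, and nested-list representation are made up for illustration, and Brush's C++ implementation differs in detail:

```python
import random

# Hypothetical function and terminal sets, for illustration only.
FUNCTIONS = {"Add": 2, "Mul": 2, "Cos": 1, "Exp": 1}   # name -> arity
TERMINALS = ["x0", "x1", "1.00"]

def ptc2(target_size, rng):
    """Grow a nested-list tree of roughly `target_size` nodes (PTC2-style)."""
    if target_size <= 1:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    root = [name] + [None] * FUNCTIONS[name]
    slots = [(root, i) for i in range(1, len(root))]   # open argument slots
    size = 1
    # fill random slots with functions until the budget is nearly spent
    while slots and size + len(slots) < target_size:
        node, i = slots.pop(rng.randrange(len(slots)))
        fname = rng.choice(list(FUNCTIONS))
        child = [fname] + [None] * FUNCTIONS[fname]
        node[i] = child
        slots.extend((child, j) for j in range(1, len(child)))
        size += 1
    # close every remaining slot with a terminal
    for node, i in slots:
        node[i] = rng.choice(TERMINALS)
        size += 1
    return root

def count_nodes(t):
    return 1 if isinstance(t, str) else 1 + sum(count_nodes(c) for c in t[1:])
```

With a maximum arity of 2, the resulting tree has between `target_size` and `target_size + 1` nodes (for targets larger than the smallest possible tree), which is the point of PTC2: near-uniform control over program size.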

Evaluation

TODO

Visualizing Programs

Programs in Brush are symbolic tree structures, and can be viewed in a few ways:

  1. As a string using get_model()

  2. As a string-like tree using get_model("tree")

  3. As a graph using graphviz and get_model("dot").

Let’s look at a regression example.

import pandas as pd
from pybrush import BrushRegressor

# load data
df = pd.read_csv('../examples/datasets/d_enc.csv')
X = df.drop(columns='label')
y = df['label']

X.info() # we have several float and two integer features
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   x0      768 non-null    float64
 1   x1      768 non-null    float64
 2   x2      768 non-null    float64
 3   x3      768 non-null    float64
 4   x4      768 non-null    float64
 5   x5      768 non-null    int64  
 6   x6      768 non-null    float64
 7   x7      768 non-null    int64  
dtypes: float64(6), int64(2)
memory usage: 48.1 KB
# import and make a regressor
est = BrushRegressor(
    # Uncomment the line below to constrain the search space to fewer functions
    functions=['SplitBest', 'SplitOn', 'Geq', 'Eq', 'Mul', 'Add', 'Cos', 'Exp'],
    max_depth=3,
    verbosity=1 # set verbosity==1 to see a progress bar
)

# use like you would a sklearn regressor
est.fit(X,y)

print("Best model:", est.best_estimator_.get_model("tree"))

y_pred = est.predict(X)
print('score:', est.score(X,y))
Completed 100% [====================]
Best model: If(x0>=0.76)
|- If(x0>=0.82)
|  |- 29.83
|  |- 48.74*x0
|- 1.36*Add
|  |- 0.01*x1
|  |- 8.16*x6
score: 0.8950117552104557

You can see the fitness of the final individual by accessing the fitness attribute. Each fitness value corresponds to the objective at the same index in the objectives defined for the BrushRegressor class. By default, it minimizes "scorer" and "linear_complexity", where "scorer" is the name of a loss function, set with the scorer parameter.

print(est.best_estimator_.fitness)
print(est.objectives)
Fitness(10.703141 34.000000 )
['scorer', 'linear_complexity']

A fitness in Brush is more than a tuple: it is a class with all boolean comparison operators overloaded, for ease of use when prototyping with Brush.

It also infers the weight of each objective, so minimization and maximization objectives are handled automatically.
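The weighting idea can be illustrated with a simplified stand-alone sketch (this is not Brush's actual Fitness class): each objective gets a weight of -1.0 for minimization or +1.0 for maximization, comparisons operate on the weighted values, and "greater is better" then holds uniformly regardless of each objective's direction.

```python
import functools

# Simplified sketch of a weighted multi-objective fitness (not Brush's class).
@functools.total_ordering
class ToyFitness:
    def __init__(self, values, weights):
        self.values = list(values)
        self.weights = list(weights)     # -1.0 = minimize, +1.0 = maximize
        self.wvalues = [v * w for v, w in zip(values, weights)]

    def __eq__(self, other):
        return self.wvalues == other.wvalues

    def __lt__(self, other):
        # lexicographic comparison on weighted values: greater is better
        return self.wvalues < other.wvalues

# two individuals scored on (loss, size), both minimized
a = ToyFitness([10.7, 34.0], [-1.0, -1.0])
b = ToyFitness([12.1, 20.0], [-1.0, -1.0])
print(a > b)   # True: a has the lower loss on the first objective
```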

To see the weights, you can try:

est.best_estimator_.fitness.weights
[-1.0, -1.0]

Other information about the best estimator can also be accessed through its fitness attribute:

print(est.best_estimator_.fitness.size)
print(est.best_estimator_.fitness.complexity)
print(est.best_estimator_.fitness.depth)
25
250
3

Serialization

Brush lets you serialize the entire individual, or just the program or fitness it wraps. It uses JSON to serialize the objects, implemented through an object's __getstate__ and __setstate__ methods:

estimator_dict = est.best_estimator_.__getstate__()

for k, v in estimator_dict.items():
    print(k, v)
fitness {'complexity': 250, 'crowding_dist': 3.4028234663852886e+38, 'dcounter': 0, 'depth': 3, 'dominated': [], 'linear_complexity': 34, 'loss': 9.187416076660156, 'loss_v': 10.703141212463379, 'prev_complexity': 250, 'prev_depth': 3, 'prev_linear_complexity': 34, 'prev_loss': 9.187414169311523, 'prev_loss_v': 10.703254699707031, 'prev_size': 25, 'rank': 1, 'size': 25, 'values': [10.703141212463379, 34.0], 'weights': [-1.0, -1.0], 'wvalues': [-10.703141212463379, -34.0]}
id 291
is_fitted_ False
objectives ['mse', 'linear_complexity']
parent_id [274]
program {'Tree': [{'W': 0.7599999904632568, 'arg_types': ['ArrayF', 'ArrayF'], 'center_op': True, 'feature': 'x0', 'feature_type': 'ArrayI', 'is_weighted': True, 'name': 'SplitBest', 'node_is_fixed': False, 'node_type': 'SplitBest', 'prob_change': 1.0, 'ret_type': 'ArrayF', 'sig_dual_hash': 14679000877885575597, 'sig_hash': 14400282083458657357, 'weight_is_fixed': False}, {'W': 0.8199999928474426, 'arg_types': ['ArrayF', 'ArrayF'], 'center_op': True, 'feature': 'x0', 'feature_type': 'ArrayI', 'is_weighted': True, 'name': 'SplitBest', 'node_is_fixed': False, 'node_type': 'SplitBest', 'prob_change': 1.0, 'ret_type': 'ArrayF', 'sig_dual_hash': 14679000877885575597, 'sig_hash': 14400282083458657357, 'weight_is_fixed': False}, {'W': 29.8331241607666, 'arg_types': [], 'center_op': True, 'feature': 'constF', 'feature_type': 'ArrayF', 'is_weighted': True, 'name': 'Constant', 'node_is_fixed': False, 'node_type': 'Constant', 'prob_change': 0.6167147755622864, 'ret_type': 'ArrayF', 'sig_dual_hash': 7018942542468397869, 'sig_hash': 14162902253047951597, 'weight_is_fixed': False}, {'W': 48.73774337768555, 'arg_types': [], 'center_op': True, 'feature': 'x0', 'feature_type': 'ArrayF', 'is_weighted': True, 'name': 'Terminal', 'node_is_fixed': False, 'node_type': 'Terminal', 'prob_change': 0.6343384981155396, 'ret_type': 'ArrayF', 'sig_dual_hash': 7018942542468397869, 'sig_hash': 14162902253047951597, 'weight_is_fixed': False}, {'W': 1.3606120347976685, 'arg_types': ['ArrayF', 'ArrayF'], 'center_op': True, 'feature': '', 'feature_type': 'ArrayF', 'is_weighted': True, 'name': 'Add', 'node_is_fixed': False, 'node_type': 'Add', 'prob_change': 1.0, 'ret_type': 'ArrayF', 'sig_dual_hash': 14679000877885575597, 'sig_hash': 14400282083458657357, 'weight_is_fixed': False}, {'W': 0.013320915400981903, 'arg_types': [], 'center_op': True, 'feature': 'x1', 'feature_type': 'ArrayF', 'is_weighted': True, 'name': 'Terminal', 'node_is_fixed': False, 'node_type': 'Terminal', 'prob_change': 
0.6729995608329773, 'ret_type': 'ArrayF', 'sig_dual_hash': 7018942542468397869, 'sig_hash': 14162902253047951597, 'weight_is_fixed': False}, {'W': 8.156413078308105, 'arg_types': [], 'center_op': True, 'feature': 'x6', 'feature_type': 'ArrayF', 'is_weighted': True, 'name': 'Terminal', 'node_is_fixed': False, 'node_type': 'Terminal', 'prob_change': 0.20750471949577332, 'ret_type': 'ArrayF', 'sig_dual_hash': 7018942542468397869, 'sig_hash': 14162902253047951597, 'weight_is_fixed': False}], 'is_fitted_': True}
variation insert

With serialization, you can use pickle to save and load just the program, or the entire individual.

import pickle
import os, tempfile

individual_file = os.path.join(tempfile.mkdtemp(), 'individual.pkl')
with open(individual_file, "wb") as f:
    pickle.dump(est.best_estimator_, f)

program_file = os.path.join(tempfile.mkdtemp(), 'program.pkl')
with open(program_file, "wb") as f:
    pickle.dump(est.best_estimator_.program, f)

Then we can load it later with:

with open(individual_file, "rb") as f:
    loaded_estimator = pickle.load(f)
    print(loaded_estimator.get_model())
If(x0>=0.76,If(x0>=0.82,29.83,48.74*x0),1.36*Add(0.01*x1,8.16*x6))

String

Now that we have trained a model, est.best_estimator_ contains our symbolic model. We can view it as a string:

print(est.best_estimator_.get_model())
If(x0>=0.76,If(x0>=0.82,29.83,48.74*x0),1.36*Add(0.01*x1,8.16*x6))

Quick Little Tree

Or, we can view it as a compact tree:

print(est.best_estimator_.get_model("tree"))
If(x0>=0.76)
|- If(x0>=0.82)
|  |- 29.83
|  |- 48.74*x0
|- 1.36*Add
|  |- 0.01*x1
|  |- 8.16*x6

GraphViz

If we are feeling fancy 🎩, we can also view it as a graph in dot format. Let’s import graphviz and make a nicer plot.

import graphviz

model = est.best_estimator_.get_model("dot")
graphviz.Source(model)

The model variable is now a little program in the dot language, which we can inspect directly.

print(model)
digraph G {
y [shape=box];
y -> "177604080" [label="0.76"];
"177604080" [label="x0 >= 0.76?"];
"177604080" -> "10fd43150" [headlabel="",taillabel="Y"];
"177604080" -> "17359a090" [headlabel="1.36",taillabel="N"];
"10fd43150" [label="x0 >= 0.82?"];
"10fd43150" -> "109926ea0" [headlabel="",taillabel="Y"];
"10fd43150" -> "x0" [headlabel="48.74",taillabel="N"];
"109926ea0" [label="29.83"];
"x0" [label="x0"];
"17359a090" [label="Add"];
"17359a090" -> "x1" [label="0.01"];
"17359a090" -> "x6" [label="8.16"];
"x1" [label="x1"];
"x6" [label="x6"];
label="^ split feature fixed, * split threshold fixed";
labelloc=bottom;
fontsize=10;}

Tweaking Graphs

The dot manual has many options for tweaking the graphs. You can apply them by manually editing model, but Brush also provides a function, get_dot_model(), to which you can pass additional graph attributes for dot as a string.

For example, let’s view the graph from Left-to-Right:

model = est.best_estimator_.get_dot_model("rankdir=LR;")
graphviz.Source(model)

A Classification Example

from pybrush import BrushClassifier
from sklearn.preprocessing import StandardScaler

# load data
df = pd.read_csv('../examples/datasets/d_analcatdata_aids.csv')
X = df.drop(columns='target')
y = df['target']

X.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Age     50 non-null     int64  
 1   Race    50 non-null     int64  
 2   AIDS    50 non-null     float64
 3   Total   50 non-null     float64
dtypes: float64(2), int64(2)
memory usage: 1.7 KB
est = BrushClassifier(
    # Uncomment the line below to constrain the search space to fewer functions
    functions=['SplitBest', 'SplitOn', 'Geq', 'Eq', 'Mul', 'Add', 'Cos', 'Exp'],
    max_gens=100,
    max_size=50,
    max_depth=5,
    objectives=["scorer", "linear_complexity"],  
    scorer="average_precision_score",
    pop_size=100,
    bandit='dynamic_thompson',
    verbosity=1
)

est.fit(X,y)

print("Best model:", est.best_estimator_.get_model("tree"))
Completed 100% [====================]
Best model: Logistic
|- -0.38+Add
|  |- 0.00*Add
|  |  |- Cos
|  |  |  |- AIDS
|  |  |- 0.01*AIDS

Notice that classification programs have a fixed Logistic root node. When printing the dot model, Brush highlights fixed nodes in a light coral color.

est.engine_.search_space.print()

print("Best model:", est.best_estimator_.get_model())
print('score:', est.score(X,y))

print("Best model:", est.best_estimator_.get_model("tree"))

model = est.best_estimator_.get_dot_model()
graphviz.Source(model)
=== Search space ===
terminal_map: {"ArrayB": ["1.00"], "ArrayI": ["Age", "Race", "1.00"], "ArrayF": ["AIDS", "Total", "1.00"]}
terminal_weights: {"ArrayB": [0.3165285], "ArrayI": [0.5779346, 0.42670068, 0.10736023], "ArrayF": [0.539658, 0.18103841, 0.7904958]}
SplitBest node_map[ArrayI][["ArrayI", "ArrayI"]][SplitBest] = SplitBest, weight = 0.5382333
Mul node_map[ArrayI][["ArrayI", "ArrayI"]][Mul] = Mul, weight = 0.96832097
Add node_map[ArrayI][["ArrayI", "ArrayI"]][Add] = Add, weight = 0.7737981
OffsetSum node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF", "ArrayF"]][OffsetSum] = 0.00+Add, weight = 0
Logistic node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF", "ArrayF"]][Logistic] = Logistic, weight = 0
Cos node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF", "ArrayF"]][Cos] = Cos, weight = 0.53532857
Exp node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF", "ArrayF"]][Exp] = Exp, weight = 0.35211778
OffsetSum node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF"]][OffsetSum] = 0.00+Add, weight = 0
Logistic node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF"]][Logistic] = Logistic, weight = 0
Cos node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF"]][Cos] = Cos, weight = 0.6508668
Exp node_map[MatrixF][["ArrayF", "ArrayF", "ArrayF"]][Exp] = Exp, weight = 0.6810962
OffsetSum node_map[MatrixF][["ArrayF", "ArrayF"]][OffsetSum] = 0.00+Add, weight = 0
Logistic node_map[MatrixF][["ArrayF", "ArrayF"]][Logistic] = Logistic, weight = 0
Cos node_map[MatrixF][["ArrayF", "ArrayF"]][Cos] = Cos, weight = 0.117200986
Exp node_map[MatrixF][["ArrayF", "ArrayF"]][Exp] = Exp, weight = 0.14627638
SplitBest node_map[ArrayF][["ArrayF", "ArrayF"]][SplitBest] = SplitBest, weight = 0.27913103
Mul node_map[ArrayF][["ArrayF", "ArrayF"]][Mul] = Mul, weight = 0.385618
Add node_map[ArrayF][["ArrayF", "ArrayF"]][Add] = Add, weight = 0.2764419
OffsetSum node_map[ArrayF][["ArrayF"]][OffsetSum] = 0.00+Add, weight = 0
Logistic node_map[ArrayF][["ArrayF"]][Logistic] = Logistic, weight = 0
Cos node_map[ArrayF][["ArrayF"]][Cos] = Cos, weight = 0.32952592
Exp node_map[ArrayF][["ArrayF"]][Exp] = Exp, weight = 0.7656854

Best model: Logistic(Add(-0.38,0.00*Add(Cos(AIDS),0.01*AIDS)))
score: 0.68
Best model: Logistic
|- -0.38+Add
|  |- 0.00*Add
|  |  |- Cos
|  |  |  |- AIDS
|  |  |- 0.01*AIDS