Saving and loading populations

Brush can also save and load entire populations. A population is stored as JSON, so the file is human readable. In the same way, we can give an estimator a previously saved population file to serve as the starting point for the evolution.

In this notebook, we will walk through how to use the save_population and load_population parameters.

We start by getting a sample dataset and splitting it into X and y:

import pandas as pd
from pybrush import BrushRegressor

# load data
df = pd.read_csv('../examples/datasets/d_enc.csv')
X = df.drop(columns='label')
y = df['label']

To save the population when the evolution finishes, you need to set the save_population parameter to a non-empty string containing a file path. The final population is then written to that file.

In this example, we write the population to a file inside a temporary directory.

import os
import tempfile

pop_file = os.path.join(tempfile.mkdtemp(), 'population.json')

# set verbosity=2 to see the full report
est = BrushRegressor(
    functions=['SplitBest','Add','Mul','Sin','Cos','Exp','Logabs'],
    max_gens=10,
    save_population=pop_file,
    verbosity=2
)

est.fit(X,y)
y_pred = est.predict(X)
print('score:', est.score(X,y))
Generation 1/10 [//////                                            ]
Train Loss (Med): 11.75939 (74.37032)
Val Loss (Med): 11.75939 (74.37032)
Median Size (Max): 3 (19)
Median complexity (Max): 9 (432)
Time (s): 0.12205

Generation 2/10 [///////////                                       ]
Train Loss (Med): 11.58283 (17.94969)
Val Loss (Med): 11.58283 (17.94969)
Median Size (Max): 3 (19)
Median complexity (Max): 9 (368)
Time (s): 0.27800

Generation 3/10 [////////////////                                  ]
Train Loss (Med): 11.15674 (17.94969)
Val Loss (Med): 11.15674 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 10 (915)
Time (s): 0.41845

Generation 4/10 [/////////////////////                             ]
Train Loss (Med): 10.62121 (17.94969)
Val Loss (Med): 10.62121 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (381)
Time (s): 0.56585

Generation 5/10 [//////////////////////////                        ]
Train Loss (Med): 10.51181 (17.94969)
Val Loss (Med): 10.51181 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (412)
Time (s): 0.73561

Generation 6/10 [///////////////////////////////                   ]
Train Loss (Med): 10.51181 (17.94969)
Val Loss (Med): 10.51181 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (412)
Time (s): 0.89526

Generation 7/10 [////////////////////////////////////              ]
Train Loss (Med): 10.51181 (17.94969)
Val Loss (Med): 10.51181 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (412)
Time (s): 1.03213

Generation 8/10 [/////////////////////////////////////////         ]
Train Loss (Med): 10.43982 (17.94969)
Val Loss (Med): 10.43982 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (412)
Time (s): 1.19282

Generation 9/10 [//////////////////////////////////////////////    ]
Train Loss (Med): 10.33524 (17.94969)
Val Loss (Med): 10.33524 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (368)
Time (s): 1.33781

Generation 10/10 [//////////////////////////////////////////////////]
Train Loss (Med): 10.33524 (17.94969)
Val Loss (Med): 10.33524 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (368)
Time (s): 1.50192

Saved population to file /tmp/tmpw7jkwa5m/population.json
score: 0.8856532915521027
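
Before reusing the file, it can be worth confirming that it exists and parses as JSON. The following is a minimal sketch using only the Python standard library. The helper name describe_json_file is hypothetical and not part of Brush, and the code makes no assumption about the layout Brush uses inside the file.

```python
import json
import os


def describe_json_file(path):
    """Return basic facts about a JSON file without assuming its schema."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no such file: {path}")

    with open(path, "r", encoding="utf-8") as fh:
        # json.load raises json.JSONDecodeError if the content is not valid JSON.
        data = json.load(fh)

    return {
        "size_bytes": os.path.getsize(path),
        "top_level_type": type(data).__name__,
        # Keys are reported only when the top-level value is a JSON object.
        "top_level_keys": sorted(data) if isinstance(data, dict) else None,
        # Length is reported for objects and arrays, None for scalars.
        "length": len(data) if isinstance(data, (dict, list)) else None,
    }
```

Calling describe_json_file(pop_file) after the run above would summarize the saved population file. The keys it reports depend on how Brush serializes a population, which this notebook does not describe.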

To load a previous population, set load_population to the path of a JSON file generated by Brush. In our case, we reuse the file written by the previous code block.

After loading the population, we run the evolution for 10 more generations. The first generation of this run reports the same training loss (10.33524) as the last generation of the previous run, which shows that the population was successfully saved and loaded.

est = BrushRegressor(
    functions=['SplitBest','Add','Mul','Sin','Cos','Exp','Logabs'],
    load_population=pop_file,
    max_gens=10,
    verbosity=2
)

est.fit(X,y)
y_pred = est.predict(X)
print('score:', est.score(X,y))
Loaded population from /tmp/tmpw7jkwa5m/population.json of size = 200
Generation 1/10 [//////                                            ]
Train Loss (Med): 10.33524 (17.94969)
Val Loss (Med): 10.33524 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (368)
Time (s): 0.16596

Generation 2/10 [///////////                                       ]
Train Loss (Med): 10.33524 (17.94969)
Val Loss (Med): 10.33524 (17.94969)
Median Size (Max): 3 (18)
Median complexity (Max): 9 (240)
Time (s): 0.31669

Generation 3/10 [////////////////                                  ]
Train Loss (Med): 10.26326 (17.94969)
Val Loss (Med): 10.26326 (17.94969)
Median Size (Max): 3 (20)
Median complexity (Max): 9 (368)
Time (s): 0.45045

Generation 4/10 [/////////////////////                             ]
Train Loss (Med): 10.26326 (17.94969)
Val Loss (Med): 10.26326 (17.94969)
Median Size (Max): 3 (19)
Median complexity (Max): 9 (368)
Time (s): 0.63331

Generation 5/10 [//////////////////////////                        ]
Train Loss (Med): 10.26326 (16.41696)
Val Loss (Med): 10.26326 (16.41696)
Median Size (Max): 5 (17)
Median complexity (Max): 33 (330)
Time (s): 0.78002

Generation 6/10 [///////////////////////////////                   ]
Train Loss (Med): 9.70269 (17.94969)
Val Loss (Med): 9.70269 (17.94969)
Median Size (Max): 3 (19)
Median complexity (Max): 9 (330)
Time (s): 0.91656

Generation 7/10 [////////////////////////////////////              ]
Train Loss (Med): 9.67577 (17.94969)
Val Loss (Med): 9.67577 (17.94969)
Median Size (Max): 3 (19)
Median complexity (Max): 9 (330)
Time (s): 1.10225

Generation 8/10 [/////////////////////////////////////////         ]
Train Loss (Med): 9.67577 (16.41696)
Val Loss (Med): 9.67577 (16.41696)
Median Size (Max): 5 (19)
Median complexity (Max): 33 (330)
Time (s): 1.30773

Generation 9/10 [//////////////////////////////////////////////    ]
Train Loss (Med): 9.67577 (16.41696)
Val Loss (Med): 9.67577 (16.41696)
Median Size (Max): 5 (19)
Median complexity (Max): 33 (330)
Time (s): 1.44840

Generation 10/10 [//////////////////////////////////////////////////]
Train Loss (Med): 9.67577 (15.67545)
Val Loss (Med): 9.67577 (15.67545)
Median Size (Max): 6 (19)
Median complexity (Max): 36 (723)
Time (s): 1.65144

score: 0.892949582824199

Because the serialized file is plain JSON, you can open it in a text editor and change individuals’ programs manually.
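
Manual edits are easier to undo if a backup copy is kept. The sketch below is one way to do that from Python, using only the standard library. The helper edit_json_file and the edit_fn callback are hypothetical names, not part of Brush. The structure passed to edit_fn is whatever Brush wrote to the file, and whether Brush accepts a given edited program is not covered here. Re-serializing from Python may also change formatting details such as indentation.

```python
import json
import shutil


def edit_json_file(path, edit_fn, backup_suffix=".bak"):
    """Apply edit_fn to the parsed JSON in `path`, keeping a backup copy."""
    backup_path = path + backup_suffix
    # Keep the original so a bad edit can be undone.
    shutil.copy2(path, backup_path)

    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)

    # edit_fn receives the parsed data and returns the modified structure.
    edited = edit_fn(data)

    # Serialize first so a non-serializable edit fails before the file is rewritten.
    text = json.dumps(edited, indent=2)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

    return backup_path
```

If the edited file turns out to be unusable, copying the backup path returned by the helper over the original restores the saved population.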

Saving and loading populations also allows us to checkpoint a run: we can stop the evolution, keep the population file, and resume from it later.
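
A checkpointing loop could look like the sketch below. It assumes that save_population and load_population can be set on the same estimator and can point to the same path, which this notebook does not demonstrate. The function run_with_checkpoints and its make_estimator argument are hypothetical names, not part of Brush.

```python
def run_with_checkpoints(make_estimator, X, y, pop_file, n_rounds):
    """Fit n_rounds estimators in sequence, passing the population through pop_file."""
    if n_rounds < 1:
        raise ValueError("n_rounds must be at least 1")

    est = None
    for round_idx in range(n_rounds):
        # Every round saves its final population to the checkpoint file.
        kwargs = {"save_population": pop_file}
        if round_idx > 0:
            # Later rounds resume from what the previous round saved.
            kwargs["load_population"] = pop_file
        # make_estimator is any callable that accepts these keyword arguments.
        est = make_estimator(**kwargs)
        est.fit(X, y)
    return est
```

With Brush, make_estimator could be something like lambda **kw: BrushRegressor(max_gens=10, **kw), so that each round runs 10 generations and the file on disk always holds the population from the most recently completed round.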