Brush C++ API
A flexible interpretable machine learning framework
Holds a search space, consisting of operations and terminals and functions, and methods to sample that space to create programs.
#include <search_space.h>
Public Types | |
using | ArgsHash = std::size_t |
template<typename T > | |
using | Map |
Public Member Functions | |
template<typename PT > | |
PT | make_program (const Parameters &params, int max_d=0, int max_size=0) |
Makes a random program. | |
RegressorProgram | make_regressor (int max_d=0, int max_size=0, const Parameters &params=Parameters()) |
Makes a random regressor program. Convenience wrapper for make_program. | |
ClassifierProgram | make_classifier (int max_d=0, int max_size=0, const Parameters &params=Parameters()) |
Makes a random classifier program. Convenience wrapper for make_program. | |
MulticlassClassifierProgram | make_multiclass_classifier (int max_d=0, int max_size=0, const Parameters &params=Parameters()) |
Makes a random multiclass classifier program. Convenience wrapper for make_program. | |
RepresenterProgram | make_representer (int max_d=0, int max_size=0, const Parameters &params=Parameters()) |
Makes a random representer program. Convenience wrapper for make_program. | |
SearchSpace ()=default | |
SearchSpace (const Dataset &d, const unordered_map< string, float > &user_ops={}, bool weights_init=true) | |
Construct a search space. | |
void | init (const Dataset &d, const unordered_map< string, float > &user_ops={}, bool weights_init=true) |
Called by the constructor to initialize the search space. | |
bool | check (DataType R) const |
check if a return type is in the node map | |
bool | check (DataType R, size_t sig_hash) const |
check if a function signature is in the search space | |
bool | check (DataType R, size_t sig_hash, NodeType type) const |
check if a typed Node is in the search space | |
template<typename Iter > | |
bool | has_solution_space (Iter start, Iter end) const |
Takes iterators to weight vectors and checks if they have a non-empty solution space. An empty solution space is defined as having no non-zero, positive values. | |
template<typename F > | |
Node | get (const string &name) |
Node | get (NodeType type, DataType R, size_t sig_hash) |
get a typed node | |
template<typename S > | |
Node | get (NodeType type, DataType R, S sig) |
get a typed node. | |
vector< float > | get_weights () const |
get weights of the return types | |
vector< float > | get_weights (DataType ret) const |
get weights of the argument types matching return type ret . | |
vector< float > | get_weights (DataType ret, ArgsHash sig_hash) const |
get the weights of nodes matching a signature. | |
std::optional< Node > | sample_terminal (bool force_return=false) const |
Get a random terminal. | |
std::optional< Node > | sample_terminal (DataType R, bool force_return=false) const |
Get a random terminal with return type R | |
std::optional< Node > | sample_op (DataType ret) const |
get an operator matching return type ret . | |
std::optional< Node > | sample_op (NodeType type, DataType R) |
Get a specific node type that matches a return value. | |
std::optional< Node > | sample_op_with_arg (DataType ret, DataType arg, bool terminal_compatible=true, int max_args=0) const |
get operator with at least one argument matching arg | |
std::optional< Node > | get_node_like (Node node) const |
get a node with a signature matching node | |
std::optional< tree< Node > > | sample_subtree (Node root, int max_d, int max_size) const |
create a subtree with maximum size and depth restrictions and root of type root_type | |
void | print () const |
prints the search space map. | |
template<typename P > | |
P | make_program (const Parameters &params, int max_d, int max_size) |
Public Attributes | |
Map< Node > | node_map |
Maps return types to argument types to node types. | |
Map< float > | node_map_weights |
A map of weights corresponding to elements in node_map, used to weight probabilities of each node being sampled from the map. | |
unordered_map< DataType, vector< Node > > | terminal_map |
Maps return types to terminals. | |
unordered_map< DataType, vector< float > > | terminal_weights |
A map of weights corresponding to elements in terminal_map, used to weight probabilities of each terminal being sampled from the map. | |
vector< DataType > | terminal_types |
A vector storing the available return types of terminals. | |
Private Member Functions | |
tree< Node > & | PTC2 (tree< Node > &Tree, tree< Node >::iterator root, int max_d, int max_size) const |
template<NodeType NT, typename S > | |
constexpr void | AddNode (const unordered_map< string, float > &user_ops, const vector< DataType > &unique_data_types) |
template<NodeType NT, typename Sigs , std::size_t... Is> | |
constexpr void | AddNodes (const unordered_map< string, float > &user_ops, const vector< DataType > &unique_data_types, std::index_sequence< Is... >) |
template<NodeType NT> | |
void | MakeNodes (const unordered_map< string, float > &user_ops, const vector< DataType > &unique_data_types) |
template<std::size_t... Is> | |
void | GenerateNodeMap (const unordered_map< string, float > &user_ops, const vector< DataType > &unique_data_types, std::index_sequence< Is... >) |
Static Private Member Functions | |
template<NodeType NT, typename S > requires (!is_in_v<NT, NodeType::Terminal, NodeType::Constant, NodeType::MeanLabel>) | |
static constexpr std::optional< Node > | CreateNode (const auto &unique_data_types, bool use_all, bool weighted) |
Holds a search space, consisting of operations and terminals and functions, and methods to sample that space to create programs.
The set of operators is a user controlled parameter; however, we can automate, to some extent, the set of possible operators based on the data types in the problem. Constraints on operators based on data types:
When sampling in the search space (using any of the sampling functions sample_op or sample_terminal), some methods can fail to return a value: for a specific set of parameters, the set of candidate solutions may be empty. For these methods, the return type is wrapped in std::optional, so the result is either a valid value or std::nullopt.
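The optional-return convention described above can be illustrated with a minimal, self-contained sketch. It does not use the Brush headers: sample_weighted and its parameters are hypothetical names chosen for illustration, and the weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for a sampling function (not part of Brush).
// Picks a candidate with probability proportional to its weight, or
// returns std::nullopt when no candidate has a positive weight.
// Assumes all weights are non-negative.
std::optional<std::string> sample_weighted(const std::vector<std::string>& candidates,
                                           const std::vector<float>& weights,
                                           std::mt19937& rng)
{
    assert(candidates.size() == weights.size());

    bool any_positive = false;
    for (float w : weights) {
        if (w > 0.0f) { any_positive = true; break; }
    }
    if (!any_positive)
        return std::nullopt;  // empty candidate set: nothing to sample

    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
    return candidates[pick(rng)];
}
```

A caller would test the result with has_value() (or an if on the optional itself) before dereferencing it, and fall back to another strategy when the result is std::nullopt.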
Definition at line 83 of file search_space.h.
using Brush::SearchSpace::ArgsHash = std::size_t |
Definition at line 85 of file search_space.h.
using Brush::SearchSpace::Map |
Definition at line 88 of file search_space.h.
|
default |
|
inline |
Construct a search space.
d | A dataset containing terminal definitions |
user_ops | Optional user-provided dictionary of operators with their probability of being chosen |
weights_init | whether the terminal prob_change should be estimated from correlations with the target value |
Definition at line 181 of file search_space.h.
check if a return type is in the node map
R | data type |
Definition at line 194 of file search_space.h.
check if a function signature is in the search space
R | return type |
sig_hash | signature hash |
Definition at line 206 of file search_space.h.
check if a typed Node is in the search space
R | return type |
sig_hash | signature hash |
type | the node type |
Definition at line 222 of file search_space.h.
|
inline static constexpr private |
|
inlineprivate |
Definition at line 659 of file search_space.h.
get a typed node.
S | the signature of the node, inferred. |
type | the node type |
R | the return type of the node |
sig | the signature of the node |
Definition at line 265 of file search_space.h.
get a typed node
type | the node type |
R | the return type of the node |
sig_hash | the signature hash of the node |
Definition at line 252 of file search_space.h.
get a node with a signature matching node
node | the node to match |
Returns a std::optional that may contain a Node.
Definition at line 550 of file search_space.h.
|
inline |
get weights of the return types
Definition at line 269 of file search_space.h.
get weights of the argument types matching return type ret.
ret | return type |
Definition at line 289 of file search_space.h.
get the weights of nodes matching a signature.
ret | return type |
sig_hash | signature hash |
Definition at line 308 of file search_space.h.
Takes iterators to weight vectors and checks if they have a non-empty solution space. An empty solution space is defined as having no non-zero, positive values.
Iter | type of iterator. |
start | Start iterator |
end | End iterator |
Definition at line 241 of file search_space.h.
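The check described above can be sketched in a few lines. This is an illustration of the stated definition (a solution space is non-empty when at least one weight is strictly positive), not the code at the referenced line; has_positive_weight is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not the Brush implementation): returns true when
// the range [start, end) contains at least one strictly positive weight.
// An empty range therefore has no solution space.
template <typename Iter>
bool has_positive_weight(Iter start, Iter end)
{
    return std::any_of(start, end, [](const auto& w) { return w > 0; });
}
```

Under this definition an all-zero weight vector and an empty weight vector are treated the same way: neither offers anything to sample.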
void Brush::SearchSpace::init | ( | const Dataset & | d, |
const unordered_map< string, float > & | user_ops = {}, | ||
bool | weights_init = true ) |
Called by the constructor to initialize the search space.
d | A dataset containing terminal definitions |
user_ops | Optional user-provided dictionary of operators with their probability of being chosen |
weights_init | whether the terminal prob_change should be estimated from correlations with the target value |
Definition at line 166 of file search_space.cpp.
ClassifierProgram Brush::SearchSpace::make_classifier | ( | int | max_d = 0, |
int | max_size = 0, | ||
const Parameters & | params = Parameters() ) |
Makes a random classifier program. Convenience wrapper for make_program.
max_d | max depth of the program |
max_size | max size of the program |
Definition at line 407 of file search_space.cpp.
MulticlassClassifierProgram Brush::SearchSpace::make_multiclass_classifier | ( | int | max_d = 0, |
int | max_size = 0, | ||
const Parameters & | params = Parameters() ) |
Makes a random multiclass classifier program. Convenience wrapper for make_program.
max_d | max depth of the program |
max_size | max size of the program |
Definition at line 412 of file search_space.cpp.
P Brush::SearchSpace::make_program | ( | const Parameters & | params, |
int | max_d, | ||
int | max_size ) |
PT Brush::SearchSpace::make_program | ( | const Parameters & | params, |
int | max_d = 0, | ||
int | max_size = 0 ) |
Makes a random program.
We use an implementation of PTC2 for strongly typed GP from
Sean Luke. "Two fast tree-creation algorithms for genetic programming" (https://doi.org/10.1109/4235.873237)
PT | program type |
max_d | max depth of the program |
max_size | max size of the program |
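For readers unfamiliar with PTC2, the following is a simplified, untyped sketch of the idea in Luke's paper: draw a target size, grow the tree by filling randomly chosen open argument slots with operators, and close the remaining slots with terminals. It is not the Brush implementation, which is strongly typed; TreeNode, Op, and ptc2 are hypothetical names, operators are assumed to have arity of at least 1, and children are stored in fill order rather than argument order.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

struct TreeNode {
    std::string label;
    int depth;                  // root has depth 1
    std::vector<int> children;  // indices into the node vector
};

struct Op { std::string name; int arity; };  // arity assumed >= 1

// Simplified, untyped PTC2 sketch (hypothetical; not Brush's typed version).
std::vector<TreeNode> ptc2(const std::vector<Op>& ops,
                           const std::vector<std::string>& terminals,
                           int max_d, int max_size, std::mt19937& rng)
{
    assert(!ops.empty() && !terminals.empty() && max_d >= 1 && max_size >= 1);
    auto pick = [&rng](std::size_t n) {
        return std::uniform_int_distribution<std::size_t>(0, n - 1)(rng);
    };
    // Target size drawn uniformly from [1, max_size].
    const std::size_t target = 1 + pick(static_cast<std::size_t>(max_size));

    std::vector<TreeNode> tree;
    std::vector<std::pair<int, int>> open;  // (parent index, depth of open slot)

    auto add = [&](const std::string& label, int depth, int parent) {
        tree.push_back({label, depth, {}});
        const int idx = static_cast<int>(tree.size()) - 1;
        if (parent >= 0) tree[parent].children.push_back(idx);
        return idx;
    };
    auto add_terminal = [&](int depth, int parent) {
        add(terminals[pick(terminals.size())], depth, parent);
    };
    auto add_op = [&](int depth, int parent) {
        const Op& op = ops[pick(ops.size())];
        const int idx = add(op.name, depth, parent);
        for (int i = 0; i < op.arity; ++i) open.push_back({idx, depth + 1});
    };

    if (target == 1 || max_d == 1) {  // a single terminal is the whole tree
        add_terminal(1, -1);
        return tree;
    }
    add_op(1, -1);
    // Grow until placed nodes plus open slots reach the target size.
    while (!open.empty() && tree.size() + open.size() < target) {
        const std::size_t i = pick(open.size());
        const auto [parent, depth] = open[i];
        open.erase(open.begin() + static_cast<std::ptrdiff_t>(i));
        if (depth >= max_d) add_terminal(depth, parent);  // depth limit reached
        else add_op(depth, parent);
    }
    // Close every remaining open slot with a terminal.
    for (const auto& [parent, depth] : open) add_terminal(depth, parent);
    return tree;
}
```

In this sketch the final tree can exceed the drawn target by up to the largest arity minus one, because the last operator added may open several slots at once; depth never exceeds max_d.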
RegressorProgram Brush::SearchSpace::make_regressor | ( | int | max_d = 0, |
int | max_size = 0, | ||
const Parameters & | params = Parameters() ) |
Makes a random regressor program. Convenience wrapper for make_program.
max_d | max depth of the program |
max_size | max size of the program |
Definition at line 402 of file search_space.cpp.
RepresenterProgram Brush::SearchSpace::make_representer | ( | int | max_d = 0, |
int | max_size = 0, | ||
const Parameters & | params = Parameters() ) |
Makes a random representer program. Convenience wrapper for make_program.
max_d | max depth of the program |
max_size | max size of the program |
Definition at line 418 of file search_space.cpp.
void Brush::SearchSpace::print | ( | ) | const |
prints the search space map.
Definition at line 162 of file search_space.cpp.
|
private |
Definition at line 268 of file search_space.cpp.
get an operator matching return type ret.
ret | return type |
Returns a std::optional that may contain a randomly chosen operator matching return type ret.
Definition at line 420 of file search_space.h.
Get a specific node type that matches a return value.
type | the node type |
R | the return type |
Returns a std::optional that may contain a Node of type type with return type R.
Definition at line 454 of file search_space.h.
|
inline |
get operator with at least one argument matching arg
ret | return type |
arg | argument type to match |
terminal_compatible | if true, the other args the returned operator takes must exist in the terminal types. |
max_args | if zero, there is no limit on number of arguments of the operator. If not, the operator can have at most max_args arguments. |
Returns a std::optional that may contain a matching operator respecting all restrictions.
Definition at line 492 of file search_space.h.
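One way to read the restrictions listed above is as a filter applied to each candidate operator before sampling. The sketch below encodes that reading; it is an interpretation of the parameter descriptions, not the code at the referenced line. OpSig, matches, and the DataType enumerators are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class DataType { ArrayF, ArrayB, ArrayI };  // hypothetical enumerators

struct OpSig {                                   // hypothetical operator description
    std::string name;
    DataType ret;
    std::vector<DataType> args;
};

// Returns true when `op` satisfies the documented restrictions:
//  - its return type is `ret`;
//  - at least one argument has type `arg`;
//  - if max_args > 0, it takes at most max_args arguments;
//  - if terminal_compatible, every argument other than the matched one
//    has a type present in terminal_types.
bool matches(const OpSig& op, DataType ret, DataType arg,
             const std::vector<DataType>& terminal_types,
             bool terminal_compatible = true, int max_args = 0)
{
    if (op.ret != ret) return false;
    if (max_args > 0 && static_cast<int>(op.args.size()) > max_args) return false;

    const auto matched = std::find(op.args.begin(), op.args.end(), arg);
    if (matched == op.args.end()) return false;

    if (terminal_compatible) {
        for (auto a = op.args.begin(); a != op.args.end(); ++a) {
            if (a == matched) continue;  // this slot is filled by `arg`
            if (std::find(terminal_types.begin(), terminal_types.end(), *a)
                    == terminal_types.end())
                return false;
        }
    }
    return true;
}
```

If no operator passes the filter, the candidate set is empty, which corresponds to the std::nullopt case of the return value.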
std::optional< tree< Node > > Brush::SearchSpace::sample_subtree | ( | Node | root, |
int | max_d, | ||
int | max_size ) const |
create a subtree with maximum size and depth restrictions and root of type root_type
root_type | return type |
max_d | the maximum depth |
max_size | the maximum size of the tree (will be sampled between [1, max_size]) |
Returns a std::optional that may contain a tree.
Definition at line 235 of file search_space.cpp.
Get a random terminal.
Returns a std::optional that may contain a terminal Node.
Definition at line 319 of file search_space.h.
|
inline |
Get a random terminal with return type R
Returns a std::optional that may contain a terminal Node of type R.
Definition at line 380 of file search_space.h.
Maps return types to argument types to node types.
schema:
{ return_type : { arguments_type : {node_type : node } }}
Definition at line 100 of file search_space.h.
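The three-level schema above can be pictured as nested hash maps. The sketch below is an assumption about the shape of Map, whose definition is not reproduced on this page; only ArgsHash = std::size_t is taken from the documentation. The DataType and NodeType enumerators and the contains helper are hypothetical, and the lookup mirrors what the three-argument check overload is documented to do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

enum class DataType { ArrayF, ArrayB };      // hypothetical enumerators
enum class NodeType { Add, Mul, Terminal };  // hypothetical enumerators
using ArgsHash = std::size_t;                // as documented

// Assumed shape: { return_type : { arguments_type : { node_type : T } } }
template <typename T>
using Map = std::unordered_map<DataType,
            std::unordered_map<ArgsHash,
            std::unordered_map<NodeType, T>>>;

// Hypothetical lookup: true when an entry exists for (ret, sig_hash, type).
template <typename T>
bool contains(const Map<T>& m, DataType ret, ArgsHash sig_hash, NodeType type)
{
    const auto by_ret = m.find(ret);
    if (by_ret == m.end()) return false;
    const auto by_sig = by_ret->second.find(sig_hash);
    if (by_sig == by_ret->second.end()) return false;
    return by_sig->second.find(type) != by_sig->second.end();
}
```

Keeping a parallel Map<float> with the same keys, as node_map_weights is described to do, lets a weight be looked up with exactly the same three keys as its node.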
A map of weights corresponding to elements in node_map, used to weight probabilities of each node being sampled from the map.
Definition at line 103 of file search_space.h.
Maps return types to terminals.
schema:
{ return_type : vector of Nodes }
Definition at line 113 of file search_space.h.
vector<DataType> Brush::SearchSpace::terminal_types |
A vector storing the available return types of terminals.
Definition at line 119 of file search_space.h.
A map of weights corresponding to elements in terminal_map, used to weight probabilities of each terminal being sampled from the map.
Definition at line 116 of file search_space.h.