SearchSpace#
-
struct SearchSpace#
Holds a search space, consisting of operations and terminals and functions, and methods to sample that space to create programs.
The set of operators is a user controlled parameter; however, we can automate, to some extent, the set of possible operators based on the data types in the problem. Constraints on operators based on data types:
only user specified operators are included.
operators whose arguments are covered by terminal types are included first. Then, a second pass includes any operators whose arguments are covered by terminal_types + return types of the current set of operators. One could imagine this continuing ad infinitum, but we just do two passes for simplicity.
assertion check to make sure there is at least one operator that returns the output type of the model.
When sampling in the search space (using any of the sampling functions
sample_op
orsample_terminal
), some methods can fail to return a value — given a specific set of parameters to a function, the candidate solutions set may be empty — and, for these methods, the return type is either a valid value, or astd::nullopt
. This is controlled wrapping the return type withstd::optional
.Public Types
-
using ArgsHash = std::size_t#
Public Functions
-
template<typename PT>
PT make_program(const Parameters ¶ms, int max_d = 0, int max_size = 0)# Makes a random program.
We use an implementation of PTC2 for strongly typed GP from
Sean Luke. “Two fast tree-creation algorithms for genetic programming” (https://doi.org/10.1109/4235.873237)
- Template Parameters:
PT – program type
- Parameters:
max_d – max depth of the program
max_size – max size of the programd
- Returns:
a program of type PTsize
-
RegressorProgram make_regressor(int max_d = 0, int max_size = 0, const Parameters ¶ms = Parameters())#
Makes a random regressor program. Convenience wrapper for make_program.
- Parameters:
max_d – max depth of the program
max_size – max size of the program
- Returns:
a regressor program
-
ClassifierProgram make_classifier(int max_d = 0, int max_size = 0, const Parameters ¶ms = Parameters())#
Makes a random classifier program. Convenience wrapper for make_program.
- Parameters:
max_d – max depth of the program
max_size – max size of the program
- Returns:
a classifier program
-
MulticlassClassifierProgram make_multiclass_classifier(int max_d = 0, int max_size = 0, const Parameters ¶ms = Parameters())#
Makes a random multiclass classifier program. Convenience wrapper for make_program.
- Parameters:
max_d – max depth of the program
max_size – max size of the program
- Returns:
a multiclass classifier program
-
RepresenterProgram make_representer(int max_d = 0, int max_size = 0, const Parameters ¶ms = Parameters())#
Makes a random representer program. Convenience wrapper for make_program.
- Parameters:
max_d – max depth of the program
max_size – max size of the program
- Returns:
a representer program
-
SearchSpace() = default#
-
inline SearchSpace(const Dataset &d, const unordered_map<string, float> &user_ops = {}, bool weights_init = true)#
Construct a search space.
- Parameters:
d – A dataset containing terminal definitions
user_ops – Optional user-provided dictionary of operators with their probability of being chosen
weights_init – whether the terminal prob_change should be estimated from correlations with the target value
-
void init(const Dataset &d, const unordered_map<string, float> &user_ops = {}, bool weights_init = true)#
Called by the constructor to initialize the search space.
- Parameters:
d – A dataset containing terminal definitions
user_ops – Optional user-provided dictionary of operators with their probability of being chosen
weights_init – whether the terminal prob_change should be estimated from correlations with the target value
-
inline bool check(DataType R) const#
check if a return type is in the node map
- Parameters:
R – data type
- Returns:
true if it exists
-
inline bool check(DataType R, size_t sig_hash) const#
check if a function signature is in the search space
- Parameters:
R – return type
sig_hash – signature hash
- Returns:
true if it exists
-
inline bool check(DataType R, size_t sig_hash, NodeType type) const#
check if a typed Node is in the search space
- Parameters:
R – return type
sig_hash – signature hash
type – the node type
- Returns:
true if it exists
-
template<typename Iter>
inline bool has_solution_space(Iter start, Iter end) const# Takes iterators to weight vectors and checks if they have a non-empty solution space. An empty solution space is defined as having no non-zero, positive values.
- Template Parameters:
T – type of iterator.
- Parameters:
start – Start iterator
end – End iterator
- Returns:
true if at least one weight is positive
-
inline Node get(NodeType type, DataType R, size_t sig_hash)#
get a typed node
- Parameters:
type – the node type
R – the return type of the node
sig_hash – the signature hash of the node
- Returns:
the matching Node
-
template<typename S>
inline Node get(NodeType type, DataType R, S sig)# get a typed node.
- Template Parameters:
S – the signature of the node, inferred.
- Parameters:
type – the node type
R – the return type of the node
sig – the signature of the node
- Returns:
the matching Node
-
inline vector<float> get_weights() const#
get weights of the return types
- Returns:
a weight vector, each element corresponding to a return type.
-
inline vector<float> get_weights(DataType ret) const#
get weights of the argument types matching return type
ret
.- Parameters:
ret – return type
- Returns:
a weight vector, each element corresponding to an args type.
-
inline vector<float> get_weights(DataType ret, ArgsHash sig_hash) const#
get the weights of nodes matching a signature.
- Parameters:
ret – return type
sig_hash – signature hash
- Returns:
a weight vector, each element corresponding to a node.
-
inline std::optional<Node> sample_terminal(bool force_return = false) const#
Get a random terminal.
- Returns:
std::optional
that may contain a terminal Node.
-
inline std::optional<Node> sample_terminal(DataType R, bool force_return = false) const#
Get a random terminal with return type
R
- Returns:
std::optional
that may contain a terminal Node of typeR
.
-
inline std::optional<Node> sample_op(DataType ret) const#
get an operator matching return type
ret
.- Parameters:
ret – return type
- Returns:
std::optional
that may contain a randomly chosen operator matching return typeret
-
inline std::optional<Node> sample_op(NodeType type, DataType R)#
Get a specific node type that matches a return value.
- Parameters:
type – the node type
R – the return type
- Returns:
std::optional
that may contain a Node of typetype
with return typeR
.
-
inline std::optional<Node> sample_op_with_arg(DataType ret, DataType arg, bool terminal_compatible = true, int max_args = 0) const#
get operator with at least one argument matching arg
- Parameters:
ret – return type
arg – argument type to match
terminal_compatible – if true, the other args the returned operator takes must exist in the terminal types.
max_args – if zero, there is no limit on number of arguments of the operator. If not, the operator can have at most
max_args
arguments.
- Returns:
std::optional
that may contain a matching operator respecting all restrictions.
-
inline std::optional<Node> get_node_like(Node node) const#
get a node with a signature matching
node
- Parameters:
node – the node to match
- Returns:
std::optional
that may contain a Node
-
std::optional<tree<Node>> sample_subtree(Node root, int max_d, int max_size) const#
create a subtree with maximum size and depth restrictions and root of type
root_type
- Parameters:
root_type – return type
max_d – the maximum depth
max_size – the maximum size of the tree (will be sampled between [1, max_size])
- Returns:
std::optional
that may contain a tree
-
void print() const#
prints the search space map.
-
template<typename P>
P make_program(const Parameters ¶ms, int max_d, int max_size)#
Public Members
-
Map<Node> node_map#
Maps return types to argument types to node types.
schema:
{ return_type : { arguments_type : {node_type : node } }}
-
Map<float> node_map_weights#
A map of weights corresponding to elements in node_map, used to weight probabilities of each node being sampled from the map.
-
unordered_map<DataType, vector<Node>> terminal_map#
Maps return types to terminals.
schema:
{ return_type : vector of Nodes }
-
unordered_map<DataType, vector<float>> terminal_weights#
A map of weights corresponding to elements in terminal_map, used to weight probabilities of each terminal being sampled from the map.
-
vector<DataType> terminal_types#
A vector storing the available return types of terminals.