SearchSpace#

struct SearchSpace#

Holds a search space, consisting of operations and terminals and functions, and methods to sample that space to create programs.

The set of operators is a user controlled parameter; however, we can automate, to some extent, the set of possible operators based on the data types in the problem. Constraints on operators based on data types:

  • only user specified operators are included.

  • operators whose arguments are covered by terminal types are included first. Then, a second pass includes any operators whose arguments are covered by terminal_types + return types of the current set of operators. One could imagine this continuing ad infinitum, but we just do two passes for simplicity.

  • assertion check to make sure there is at least one operator that returns the output type of the model.

When sampling in the search space (using any of the sampling functions sample_op or sample_terminal), some methods can fail to return a value — given a specific set of parameters to a function, the candidate solutions set may be empty — and, for these methods, the return type is either a valid value, or a std::nullopt. This is controlled wrapping the return type with std::optional.

Public Types

using ArgsHash = std::size_t#
template<typename T>
using Map = unordered_map<DataType, unordered_map<ArgsHash, unordered_map<NodeType, T>>>#

Public Functions

template<typename PT>
PT make_program(const Parameters &params, int max_d = 0, int max_size = 0)#

Makes a random program.

We use an implementation of PTC2 for strongly typed GP from

Sean Luke. “Two fast tree-creation algorithms for genetic programming” (https://doi.org/10.1109/4235.873237)

Template Parameters:

PT – program type

Parameters:
  • max_d – max depth of the program

  • max_size – max size of the programd

Returns:

a program of type PTsize

RegressorProgram make_regressor(int max_d = 0, int max_size = 0, const Parameters &params = Parameters())#

Makes a random regressor program. Convenience wrapper for make_program.

Parameters:
  • max_d – max depth of the program

  • max_size – max size of the program

Returns:

a regressor program

ClassifierProgram make_classifier(int max_d = 0, int max_size = 0, const Parameters &params = Parameters())#

Makes a random classifier program. Convenience wrapper for make_program.

Parameters:
  • max_d – max depth of the program

  • max_size – max size of the program

Returns:

a classifier program

MulticlassClassifierProgram make_multiclass_classifier(int max_d = 0, int max_size = 0, const Parameters &params = Parameters())#

Makes a random multiclass classifier program. Convenience wrapper for make_program.

Parameters:
  • max_d – max depth of the program

  • max_size – max size of the program

Returns:

a multiclass classifier program

RepresenterProgram make_representer(int max_d = 0, int max_size = 0, const Parameters &params = Parameters())#

Makes a random representer program. Convenience wrapper for make_program.

Parameters:
  • max_d – max depth of the program

  • max_size – max size of the program

Returns:

a representer program

SearchSpace() = default#
inline SearchSpace(const Dataset &d, const unordered_map<string, float> &user_ops = {}, bool weights_init = true)#

Construct a search space.

Parameters:
  • d – A dataset containing terminal definitions

  • user_ops – Optional user-provided dictionary of operators with their probability of being chosen

  • weights_init – whether the terminal prob_change should be estimated from correlations with the target value

void init(const Dataset &d, const unordered_map<string, float> &user_ops = {}, bool weights_init = true)#

Called by the constructor to initialize the search space.

Parameters:
  • d – A dataset containing terminal definitions

  • user_ops – Optional user-provided dictionary of operators with their probability of being chosen

  • weights_init – whether the terminal prob_change should be estimated from correlations with the target value

inline bool check(DataType R) const#

check if a return type is in the node map

Parameters:

R – data type

Returns:

true if it exists

inline bool check(DataType R, size_t sig_hash) const#

check if a function signature is in the search space

Parameters:
  • R – return type

  • sig_hash – signature hash

Returns:

true if it exists

inline bool check(DataType R, size_t sig_hash, NodeType type) const#

check if a typed Node is in the search space

Parameters:
  • R – return type

  • sig_hash – signature hash

  • type – the node type

Returns:

true if it exists

template<typename Iter>
inline bool has_solution_space(Iter start, Iter end) const#

Takes iterators to weight vectors and checks if they have a non-empty solution space. An empty solution space is defined as having no non-zero, positive values.

Template Parameters:

T – type of iterator.

Parameters:
  • start – Start iterator

  • end – End iterator

Returns:

true if at least one weight is positive

template<typename F>
Node get(const string &name)#
inline Node get(NodeType type, DataType R, size_t sig_hash)#

get a typed node

Parameters:
  • type – the node type

  • R – the return type of the node

  • sig_hash – the signature hash of the node

Returns:

the matching Node

template<typename S>
inline Node get(NodeType type, DataType R, S sig)#

get a typed node.

Template Parameters:

S – the signature of the node, inferred.

Parameters:
  • type – the node type

  • R – the return type of the node

  • sig – the signature of the node

Returns:

the matching Node

inline vector<float> get_weights() const#

get weights of the return types

Returns:

a weight vector, each element corresponding to a return type.

inline vector<float> get_weights(DataType ret) const#

get weights of the argument types matching return type ret.

Parameters:

ret – return type

Returns:

a weight vector, each element corresponding to an args type.

inline vector<float> get_weights(DataType ret, ArgsHash sig_hash) const#

get the weights of nodes matching a signature.

Parameters:
  • ret – return type

  • sig_hash – signature hash

Returns:

a weight vector, each element corresponding to a node.

inline std::optional<Node> sample_terminal(bool force_return = false) const#

Get a random terminal.

Returns:

std::optional that may contain a terminal Node.

inline std::optional<Node> sample_terminal(DataType R, bool force_return = false) const#

Get a random terminal with return type R

Returns:

std::optional that may contain a terminal Node of type R.

inline std::optional<Node> sample_op(DataType ret) const#

get an operator matching return type ret.

Parameters:

ret – return type

Returns:

std::optional that may contain a randomly chosen operator matching return type ret

inline std::optional<Node> sample_op(NodeType type, DataType R)#

Get a specific node type that matches a return value.

Parameters:
  • type – the node type

  • R – the return type

Returns:

std::optional that may contain a Node of type type with return type R.

inline std::optional<Node> sample_op_with_arg(DataType ret, DataType arg, bool terminal_compatible = true, int max_args = 0) const#

get operator with at least one argument matching arg

Parameters:
  • ret – return type

  • arg – argument type to match

  • terminal_compatible – if true, the other args the returned operator takes must exist in the terminal types.

  • max_args – if zero, there is no limit on number of arguments of the operator. If not, the operator can have at most max_args arguments.

Returns:

std::optional that may contain a matching operator respecting all restrictions.

inline std::optional<Node> get_node_like(Node node) const#

get a node with a signature matching node

Parameters:

node – the node to match

Returns:

std::optional that may contain a Node

std::optional<tree<Node>> sample_subtree(Node root, int max_d, int max_size) const#

create a subtree with maximum size and depth restrictions and root of type root_type

Parameters:
  • root_type – return type

  • max_d – the maximum depth

  • max_size – the maximum size of the tree (will be sampled between [1, max_size])

Returns:

std::optional that may contain a tree

void print() const#

prints the search space map.

template<typename P>
P make_program(const Parameters &params, int max_d, int max_size)#

Public Members

Map<Node> node_map#

Maps return types to argument types to node types.

schema:

{ return_type : { arguments_type : {node_type : node } }}

Map<float> node_map_weights#

A map of weights corresponding to elements in node_map, used to weight probabilities of each node being sampled from the map.

unordered_map<DataType, vector<Node>> terminal_map#

Maps return types to terminals.

schema:

 { return_type : vector of Nodes }

unordered_map<DataType, vector<float>> terminal_weights#

A map of weights corresponding to elements in terminal_map, used to weight probabilities of each terminal being sampled from the map.

vector<DataType> terminal_types#

A vector storing the available return types of terminals.