Feat C++ API
A feature engineering automation tool
This class implements the Random Forests algorithm. It trains a number of randomized CART trees (see class CRandomCARTree) on the supplied training data. The number of trees, called the number of bags, is a parameter controlled by the user. Test feature vectors are classified or regressed by combining the outputs of all trained trees using a combination rule (see class CCombinationRule). The class can also compute the out-of-bag error to help determine an appropriate number of bags. The evaluation criterion for this out-of-bag error is specified by the user (see class CEvaluation).
#include <MyRandomForest.h>
Public Member Functions
CMyRandomForest ()
virtual ~CMyRandomForest ()
virtual const char * get_name () const
void set_weights (SGVector< float64_t > weights)
SGVector< float64_t > get_weights () const
void set_feature_types (SGVector< bool > ft)
SGVector< bool > get_feature_types () const
virtual EProblemType get_machine_problem_type () const
void set_machine_problem_type (EProblemType mode)
void set_num_random_features (int32_t rand_featsize)
int32_t get_num_random_features () const
std::vector< double > feature_importances ()
void set_probabilities (CLabels *labels, CFeatures *data=NULL)
Protected Member Functions
virtual bool train_machine (CFeatures *data=NULL)
virtual void set_machine_parameters (CMachine *m, SGVector< index_t > idx)
Private Attributes
SGVector< float64_t > m_weights
SGMatrix< float64_t > m_sorted_transposed_feats
SGMatrix< index_t > m_sorted_indices
This class implements the Random Forests algorithm. It trains a number of randomized CART trees (see class CRandomCARTree) on the supplied training data. The number of trees, called the number of bags, is a parameter controlled by the user. Test feature vectors are classified or regressed by combining the outputs of all trained trees using a combination rule (see class CCombinationRule). The class can also compute the out-of-bag error to help determine an appropriate number of bags. The evaluation criterion for this out-of-bag error is specified by the user (see class CEvaluation).
Definition at line 48 of file MyRandomForest.h.
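The class description refers to combining tree outputs and to an error estimated on samples left out of each bag. The following is a minimal standalone sketch of both ideas for the classification case, written against plain std::vector rather than the library types. The helper names majority_vote and oob_error and the in_bag/preds layout are assumptions made for illustration; they are not part of this API, and the actual combination rule and evaluation criterion are whatever CCombinationRule and CEvaluation objects the user supplies.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: majority vote over the class labels predicted by
// several trees for one sample. Ties resolve to the smallest label.
int majority_vote(const std::vector<int>& votes)
{
    assert(!votes.empty());
    std::map<int, int> counts;
    for (int v : votes) ++counts[v];

    int best_label = votes.front();
    int best_count = 0;
    for (const auto& kv : counts) {          // std::map iterates in ascending label order
        if (kv.second > best_count) {
            best_count = kv.second;
            best_label = kv.first;
        }
    }
    return best_label;
}

// Hypothetical helper: misclassification rate measured out-of-bag.
// in_bag[t][i] is true when sample i was drawn into the bag of tree t.
// preds[t][i]  is the label tree t predicts for sample i.
// Each sample is judged only by the trees that did not train on it.
double oob_error(const std::vector<std::vector<bool>>& in_bag,
                 const std::vector<std::vector<int>>& preds,
                 const std::vector<int>& labels)
{
    assert(in_bag.size() == preds.size());
    std::size_t evaluated = 0;
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        std::vector<int> votes;
        for (std::size_t t = 0; t < preds.size(); ++t)
            if (!in_bag[t][i]) votes.push_back(preds[t][i]);
        if (votes.empty()) continue;         // sample appeared in every bag
        ++evaluated;
        if (majority_vote(votes) != labels[i]) ++wrong;
    }
    return evaluated == 0 ? 0.0 : static_cast<double>(wrong) / evaluated;
}
```

Plotting such an error against the number of bags is one way to see where adding more trees stops helping.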
CMyRandomForest::CMyRandomForest ()
constructor
Definition at line 36 of file MyRandomForest.cc.
CMyRandomForest::~CMyRandomForest () [virtual]
destructor
Definition at line 45 of file MyRandomForest.cc.
std::vector< double > CMyRandomForest::feature_importances ()
WGL: returns the Gini importance score of each feature
Definition at line 139 of file MyRandomForest.cc.
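As background for feature_importances(), here is a minimal sketch of the usual definition of Gini importance: the impurity decrease of every split, weighted by the number of samples reaching the node, summed per feature and normalized. The SplitRecord struct and both function names are hypothetical, and this page does not confirm that MyRandomForest.cc computes or normalizes the scores in exactly this way.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Gini impurity 1 - sum_k p_k^2 of a node, given its per-class sample counts.
double gini_impurity(const std::vector<double>& class_counts)
{
    const double total = std::accumulate(class_counts.begin(), class_counts.end(), 0.0);
    if (total <= 0.0) return 0.0;
    double sum_sq = 0.0;
    for (double c : class_counts) {
        const double p = c / total;
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Hypothetical record of one split: the feature used and the per-class
// counts in the parent node and in its two children.
struct SplitRecord {
    std::size_t feature;
    std::vector<double> parent;
    std::vector<double> left;
    std::vector<double> right;
};

// Sums the sample-weighted impurity decrease of every split per feature,
// then scales the scores so that they add up to 1 (when any are non-zero).
std::vector<double> gini_importances(const std::vector<SplitRecord>& splits,
                                     std::size_t num_features)
{
    auto count = [](const std::vector<double>& c) {
        return std::accumulate(c.begin(), c.end(), 0.0);
    };
    std::vector<double> importance(num_features, 0.0);
    for (const SplitRecord& s : splits) {
        assert(s.feature < num_features);
        importance[s.feature] += count(s.parent) * gini_impurity(s.parent)
                               - count(s.left) * gini_impurity(s.left)
                               - count(s.right) * gini_impurity(s.right);
    }
    const double total = std::accumulate(importance.begin(), importance.end(), 0.0);
    if (total > 0.0)
        for (double& v : importance) v /= total;
    return importance;
}
```

In a forest, the per-tree scores would typically be averaged over all bags before being reported.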
SGVector< bool > CMyRandomForest::get_feature_types () const
get feature types of various features
Definition at line 65 of file MyRandomForest.cc.
EProblemType CMyRandomForest::get_machine_problem_type () const [virtual]
get problem type - multiclass classification or regression
Definition at line 71 of file MyRandomForest.cc.
const char * CMyRandomForest::get_name () const [inline, virtual]
int32_t CMyRandomForest::get_num_random_features () const
get number of random features to be chosen during node splits
Definition at line 91 of file MyRandomForest.cc.
SGVector< float64_t > CMyRandomForest::get_weights () const
get weights
Definition at line 54 of file MyRandomForest.cc.
void CMyRandomForest::set_feature_types (SGVector< bool > ft)
set feature types of various features
ft | bool vector: true for a nominal feature, false for a continuous feature |
Definition at line 59 of file MyRandomForest.cc.
void CMyRandomForest::set_machine_parameters (CMachine *m, SGVector< index_t > idx) [protected, virtual]
sets parameters of CARTree - sets machine labels and weights here
m | machine |
idx | indices of training vectors chosen in current bag |
Definition at line 97 of file MyRandomForest.cc.
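The idx argument holds the training vectors drawn for the current bag, so per-sample data such as labels and weights have to be restricted to those indices before a tree is trained. The sketch below shows that bookkeeping with standard containers. draw_bag and subset are hypothetical names, and drawing with replacement from a uniform distribution is an assumption about how bags are formed, not something this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw bag_size sample indices uniformly, with
// replacement, from the range [0, num_samples).
std::vector<std::size_t> draw_bag(std::size_t num_samples,
                                  std::size_t bag_size,
                                  std::mt19937& rng)
{
    assert(num_samples > 0);
    std::uniform_int_distribution<std::size_t> pick(0, num_samples - 1);
    std::vector<std::size_t> idx(bag_size);
    for (std::size_t& i : idx) i = pick(rng);
    return idx;
}

// Hypothetical helper: copy the entries of `values` selected by `idx`,
// keeping the order (and any repetitions) of the bag.
template <typename T>
std::vector<T> subset(const std::vector<T>& values,
                      const std::vector<std::size_t>& idx)
{
    std::vector<T> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < values.size());
        out.push_back(values[i]);
    }
    return out;
}
```

Calling subset once on the label vector and once on the weight vector with the same idx keeps the two aligned, which corresponds to the "machine labels and weights" step described for this method.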
void CMyRandomForest::set_machine_problem_type (EProblemType mode)
set problem type - multiclass classification or regression
mode | EProblemType PT_MULTICLASS or PT_REGRESSION |
Definition at line 77 of file MyRandomForest.cc.
void CMyRandomForest::set_num_random_features (int32_t rand_featsize)
set number of random features to be chosen during node splits
rand_featsize | number of randomly chosen features during each node split |
Definition at line 83 of file MyRandomForest.cc.
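To illustrate what rand_featsize controls, the sketch below picks a random subset of feature indices of that size, as a randomized tree would do before searching for the best split at a node. The function name sample_feature_subset is hypothetical and the shuffle-and-truncate approach is only one simple way to sample without replacement; the tree class may select its candidates differently.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: choose rand_featsize distinct feature indices out of
// num_features, uniformly at random, by shuffling 0..num_features-1 and
// keeping the first rand_featsize entries.
std::vector<int> sample_feature_subset(int num_features,
                                       int rand_featsize,
                                       std::mt19937& rng)
{
    assert(rand_featsize >= 0 && rand_featsize <= num_features);
    std::vector<int> idx(num_features);
    std::iota(idx.begin(), idx.end(), 0);
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(rand_featsize);
    return idx;
}
```

Smaller values make the trees less correlated with each other at the cost of weaker individual splits; a value equal to the number of features removes the per-node randomization entirely.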
void CMyRandomForest::set_probabilities (CLabels *labels, CFeatures *data=NULL)
WGL: sets the probabilities on each label according to m_certainty
Definition at line 160 of file MyRandomForest.cc.
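This page does not say how m_certainty is computed. One common way to attach a probability to each label in a forest is to use the fraction of trees that voted for each class, and the sketch below shows only that idea. vote_fractions is a hypothetical helper, and treating the certainty as a vote fraction is an assumption, not a description of what set_probabilities does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: per-class probability estimate for one sample,
// taken as the share of trees that voted for each class.
// votes[t] is the class index (0 .. num_classes-1) predicted by tree t.
std::vector<double> vote_fractions(const std::vector<int>& votes, int num_classes)
{
    assert(!votes.empty() && num_classes > 0);
    std::vector<double> prob(num_classes, 0.0);
    for (int v : votes) {
        assert(v >= 0 && v < num_classes);
        prob[v] += 1.0;
    }
    for (double& p : prob) p /= static_cast<double>(votes.size());
    return prob;
}
```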
void CMyRandomForest::set_weights (SGVector< float64_t > weights)
set weights
weights | weights of the training feature vectors |
Definition at line 49 of file MyRandomForest.cc.
bool CMyRandomForest::train_machine (CFeatures *data=NULL) [protected, virtual]
Definition at line 123 of file MyRandomForest.cc.
SGMatrix< index_t > CMyRandomForest::m_sorted_indices [private]
Indices of pre-sorted features
Definition at line 137 of file MyRandomForest.h.
SGMatrix< float64_t > CMyRandomForest::m_sorted_transposed_feats [private]
Pre-sorted features
Definition at line 134 of file MyRandomForest.h.
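The two pre-sorting attributes suggest that each feature is sorted once up front, so that split search does not have to re-sort the data at every node. The sketch below shows one way to build such a pair of tables with standard containers: for each feature, the values in ascending order plus the sample index each value came from. The Presorted struct and the function names are hypothetical, and the feature-major input layout is an assumption inferred from the word "transposed" in the attribute name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices that would sort one feature column in ascending order.
// std::stable_sort keeps equal values in their original sample order.
std::vector<std::size_t> argsort(const std::vector<double>& column)
{
    std::vector<std::size_t> idx(column.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&column](std::size_t a, std::size_t b) { return column[a] < column[b]; });
    return idx;
}

// Hypothetical container mirroring the two attributes:
// values[f][k]  is the k-th smallest value of feature f,
// indices[f][k] is the sample that holds that value.
struct Presorted {
    std::vector<std::vector<double>> values;
    std::vector<std::vector<std::size_t>> indices;
};

// feats[f][i] is the value of feature f for sample i (feature-major layout).
Presorted presort(const std::vector<std::vector<double>>& feats)
{
    Presorted out;
    for (const std::vector<double>& column : feats) {
        assert(column.size() == feats.front().size());
        const std::vector<std::size_t> idx = argsort(column);
        std::vector<double> sorted;
        sorted.reserve(idx.size());
        for (std::size_t i : idx) sorted.push_back(column[i]);
        out.values.push_back(sorted);
        out.indices.push_back(idx);
    }
    return out;
}
```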
SGVector< float64_t > CMyRandomForest::m_weights [private]
weights
Definition at line 131 of file MyRandomForest.h.