Brush C++ API
A flexible interpretable machine learning framework
Brush::Data::Dataset Class Reference

Holds variable-type data as a map of named features.

#include <data.h>


Public Member Functions

Dataset operator() (const vector< size_t > &idx) const
 returns a slice of the data containing the samples at indices idx
 
void init ()
 called at the end of each constructor to define the metafeatures of the data.
 
map< string, State > make_features (const ArrayXXf &X, const map< string, State > &Z={}, const vector< string > &vn={})
 turns input data into a feature map
 
map< string, State > copy_and_make_features (const ArrayXXf &X, const Dataset &ref_dataset, const vector< string > &vn={})
 turns input into a feature map, with feature types copied from a reference
 
 Dataset (std::map< string, State > &d, const Ref< const ArrayXf > &y_=ArrayXf(), bool c=false, float validation_size=0.0, float batch_size=1.0)
 
 Dataset (const ArrayXXf &X, const Ref< const ArrayXf > &y_=ArrayXf(), const vector< string > &vn={}, const map< string, State > &Z={}, bool c=false, float validation_size=0.0, float batch_size=1.0)
 
 Dataset (const ArrayXXf &X, const vector< string > &vn, bool c=false, float validation_size=0.0, float batch_size=1.0)
 
 Dataset (const ArrayXXf &X, const Dataset &ref_dataset, const vector< string > &vn, bool c=false)
 
void print () const
 
auto get_X () const
 
Dataset get_training_data () const
 
Dataset get_validation_data () const
 
int get_n_samples () const
 
int get_n_features () const
 
Dataset get_batch () const
 selects a random subset of the training data for fitting weights.
 
float get_batch_size ()
 
void set_batch_size (float new_size)
 
std::array< Dataset, 2 > split (const ArrayXb &mask) const
 
State operator[] (std::string name) const
 

Public Attributes

std::vector< DataType > unique_data_types
 keeps track of the unique data types in the dataset.
 
std::vector< DataType > feature_types
 types of data in the features.
 
std::unordered_map< DataType, vector< string > > features_of_type
 map from data types to features having that type.
 
std::map< string, State > features
 dataset features, as key value pairs
 
ArrayXf y
 length-N array of target values
 
bool classification
 whether this is a classification problem
 
std::optional< std::reference_wrapper< const ArrayXXf > > Xref
 
float validation_size
 fraction of the original data held out for validation; if 0.0, all data is used for both training and validation
 
bool use_validation
 
float batch_size
 fraction of the training data to use in each batch; if 1.0, all training data is used
 
bool use_batch
 

Private Attributes

vector< size_t > training_data_idx
 
vector< size_t > validation_data_idx
 

Detailed Description

Holds variable-type data as a map of named features.

Definition at line 50 of file data.h.

Constructor & Destructor Documentation

◆ Dataset() [1/4]

Brush::Data::Dataset::Dataset ( std::map< string, State > & d,
const Ref< const ArrayXf > & y_ = ArrayXf(),
bool c = false,
float validation_size = 0.0,
float batch_size = 1.0 )
inline
Initialize data from a map of named features.

Definition at line 110 of file data.h.


◆ Dataset() [2/4]

Brush::Data::Dataset::Dataset ( const ArrayXXf & X,
const Ref< const ArrayXf > & y_ = ArrayXf(),
const vector< string > & vn = {},
const map< string, State > & Z = {},
bool c = false,
float validation_size = 0.0,
float batch_size = 1.0 )
inline
Initialize data from a matrix with feature columns.

Definition at line 126 of file data.h.

◆ Dataset() [3/4]

Brush::Data::Dataset::Dataset ( const ArrayXXf & X,
const vector< string > & vn,
bool c = false,
float validation_size = 0.0,
float batch_size = 1.0 )
inline
Initialize data from X and feature names.

Definition at line 147 of file data.h.

◆ Dataset() [4/4]

Brush::Data::Dataset::Dataset ( const ArrayXXf & X,
const Dataset & ref_dataset,
const vector< string > & vn,
bool c = false )
inline

Definition at line 166 of file data.h.


Member Function Documentation

◆ copy_and_make_features()

map< string, State > Brush::Data::Dataset::copy_and_make_features ( const ArrayXXf & X,
const Dataset & ref_dataset,
const vector< string > & vn = {} )

turns input into a feature map, with feature types copied from a reference

Definition at line 283 of file data.cpp.
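As a rough illustration of "feature types copied from a reference", here is a minimal sketch, not the code in data.cpp. It assumes a toy `State` that is a `std::variant` of `std::vector<bool>`, `std::vector<int>`, and `std::vector<float>` (Brush's real `State` has more alternatives and uses Eigen arrays), represents `X` as a list of columns, and uses a hypothetical free function `copy_and_make_features` that looks up each column's type by name in the reference.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Toy stand-in for Brush's State (the real type has more alternatives).
using State = std::variant<std::vector<bool>, std::vector<int>, std::vector<float>>;

// Hypothetical: build features from the columns of X, taking each feature's
// type from the reference feature with the same name instead of inferring it.
std::map<std::string, State> copy_and_make_features(
    const std::vector<std::vector<float>>& X,
    const std::map<std::string, State>& reference,
    const std::vector<std::string>& vn) {
    if (vn.size() != X.size())
        throw std::invalid_argument("one name per column is required");
    std::map<std::string, State> out;
    for (std::size_t j = 0; j < X.size(); ++j) {
        const std::vector<float>& col = X[j];
        // at() throws std::out_of_range if the reference lacks this feature.
        switch (reference.at(vn[j]).index()) {
            case 0: out[vn[j]] = std::vector<bool>(col.begin(), col.end()); break;
            case 1: out[vn[j]] = std::vector<int>(col.begin(), col.end()); break;
            default: out[vn[j]] = col; break;
        }
    }
    return out;
}
```

The point of copying types is consistency: new data (for example, a test set) is interpreted exactly as the training data was, even when its values alone would suggest a different type.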


◆ get_batch()

Dataset Brush::Data::Dataset::get_batch ( ) const

selects a random subset of the training data for fitting weights.

Definition at line 146 of file data.cpp.
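A batch draw can be sketched as sampling a `batch_size` fraction of the training row indices. The helper `sample_batch_indices` is hypothetical, and sampling without replacement via `std::sample` is an assumption; this page does not say how Brush draws its batches.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical: draw floor(batch_size * n) distinct indices from the
// training indices, without replacement.
std::vector<std::size_t> sample_batch_indices(const std::vector<std::size_t>& training_idx,
                                              float batch_size, std::mt19937& rng) {
    if (batch_size >= 1.0f) return training_idx;  // use all training data
    auto n = static_cast<std::size_t>(batch_size * static_cast<float>(training_idx.size()));
    n = std::max<std::size_t>(n, 1);  // keep at least one sample
    std::vector<std::size_t> batch;
    std::sample(training_idx.begin(), training_idx.end(), std::back_inserter(batch), n, rng);
    return batch;
}
```

Passing the random engine in by reference keeps batches reproducible under a fixed seed.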


◆ get_batch_size()

float Brush::Data::Dataset::get_batch_size ( )

Definition at line 231 of file data.cpp.

◆ get_n_features()

int Brush::Data::Dataset::get_n_features ( ) const
inline

Definition at line 215 of file data.h.


◆ get_n_samples()

int Brush::Data::Dataset::get_n_samples ( ) const
inline

Definition at line 209 of file data.h.
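Because every feature column holds one entry per sample, the sample count can be read from the length of any column, and the feature count is the number of map entries. This is a minimal sketch assuming a toy three-alternative `State` variant over `std::vector` (not Brush's Eigen-based type); `n_samples` and `n_features` are hypothetical free functions, and `std::visit` handles whichever alternative a column holds.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Toy stand-in for Brush's State.
using State = std::variant<std::vector<bool>, std::vector<int>, std::vector<float>>;

// Hypothetical: length of the first column, whatever its element type.
int n_samples(const std::map<std::string, State>& features) {
    if (features.empty()) throw std::invalid_argument("dataset has no features");
    return std::visit([](const auto& column) { return static_cast<int>(column.size()); },
                      features.begin()->second);
}

// Hypothetical: one map entry per feature.
int n_features(const std::map<std::string, State>& features) {
    return static_cast<int>(features.size());
}
```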


◆ get_training_data()

Dataset Brush::Data::Dataset::get_training_data ( ) const

Definition at line 173 of file data.cpp.


◆ get_validation_data()

Dataset Brush::Data::Dataset::get_validation_data ( ) const

Definition at line 174 of file data.cpp.


◆ get_X()

auto Brush::Data::Dataset::get_X ( ) const
inline

Definition at line 197 of file data.h.

◆ init()

void Brush::Data::Dataset::init ( )

called at the end of each constructor to define the metafeatures of the data.

Definition at line 178 of file data.cpp.
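The metafeatures listed under Public Attributes (`feature_types`, `unique_data_types`, `features_of_type`) can all be derived in one pass over the feature map. The sketch below assumes a toy `DataType` enum and `State` variant whose alternative order matches the enum, and a hypothetical `compute_metafeatures` function; it illustrates the idea rather than reproducing `Dataset::init()`.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical stand-ins for Brush's DataType and State.
enum class DataType { ArrayB, ArrayI, ArrayF };
using State = std::variant<std::vector<bool>, std::vector<int>, std::vector<float>>;

struct Metafeatures {
    std::vector<DataType> feature_types;      // one entry per feature
    std::vector<DataType> unique_data_types;  // each type once, in first-seen order
    std::unordered_map<DataType, std::vector<std::string>> features_of_type;
};

Metafeatures compute_metafeatures(const std::map<std::string, State>& features) {
    Metafeatures m;
    for (const auto& [name, value] : features) {
        // variant::index() matches the enum order declared above.
        auto type = static_cast<DataType>(value.index());
        m.feature_types.push_back(type);
        m.features_of_type[type].push_back(name);
        if (std::find(m.unique_data_types.begin(), m.unique_data_types.end(), type) ==
            m.unique_data_types.end())
            m.unique_data_types.push_back(type);
    }
    return m;
}
```

A map from type to feature names lets a search procedure quickly pick a terminal of a required type.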


◆ make_features()

map< string, State > Brush::Data::Dataset::make_features ( const ArrayXXf & X,
const map< string, State > & Z = {},
const vector< string > & vn = {} )

turns input data into a feature map

Definition at line 238 of file data.cpp.
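A feature map pairs each column of `X` with a name. The sketch below is hypothetical throughout: it represents `X` as a list of `std::vector<float>` columns instead of an Eigen `ArrayXXf`, infers each column's type with a simple rule (0/1 columns become bool, whole-number columns become int), and falls back to made-up default names `x_0`, `x_1`, ... when none are given. The inference rule and naming scheme are assumptions, not documented Brush behavior.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

using State = std::variant<std::vector<bool>, std::vector<int>, std::vector<float>>;
using Matrix = std::vector<std::vector<float>>;  // X[j] is column j

// Hypothetical type inference for one column.
State infer_state(const std::vector<float>& col) {
    bool all_binary = true, all_integral = true;
    for (float v : col) {
        if (v != 0.f && v != 1.f) all_binary = false;
        if (std::floor(v) != v) all_integral = false;
    }
    if (all_binary) return std::vector<bool>(col.begin(), col.end());
    if (all_integral) return std::vector<int>(col.begin(), col.end());
    return col;
}

std::map<std::string, State> make_features(const Matrix& X,
                                           const std::vector<std::string>& vn = {}) {
    if (!vn.empty() && vn.size() != X.size())
        throw std::invalid_argument("number of names must match number of columns");
    std::map<std::string, State> out;
    for (std::size_t j = 0; j < X.size(); ++j) {
        // Hypothetical default naming scheme when no names are given.
        std::string name = vn.empty() ? "x_" + std::to_string(j) : vn[j];
        out[name] = infer_state(X[j]);
    }
    return out;
}
```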


◆ operator()()

Dataset Brush::Data::Dataset::operator() ( const vector< size_t > & idx) const

returns a slice of the data containing the samples at indices idx

Definition at line 117 of file data.cpp.
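Slicing by indices means applying the same row selection to every feature and to the target so they stay aligned. This minimal sketch uses a hypothetical `ToyDataset` with float columns in `std::vector` instead of Brush's `State` and Eigen arrays; its `operator()` mirrors the documented signature in spirit only.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: each feature is a column of floats keyed by name.
struct ToyDataset {
    std::map<std::string, std::vector<float>> features;
    std::vector<float> y;

    // Return a new dataset containing only the rows listed in idx, in that order.
    ToyDataset operator()(const std::vector<std::size_t>& idx) const {
        ToyDataset out;
        for (const auto& [name, column] : features) {
            std::vector<float> sliced;
            sliced.reserve(idx.size());
            for (std::size_t i : idx) sliced.push_back(column.at(i));
            out.features[name] = std::move(sliced);
        }
        out.y.reserve(idx.size());
        for (std::size_t i : idx) out.y.push_back(y.at(i));
        return out;
    }
};
```

Because the indices are applied in order, repeated or reordered indices are allowed, which is what a bootstrap or shuffled batch would need.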


◆ operator[]()

State Brush::Data::Dataset::operator[] ( std::string name) const
inline

Definition at line 224 of file data.h.

◆ print()

void Brush::Data::Dataset::print ( ) const
inline

Definition at line 181 of file data.h.


◆ set_batch_size()

void Brush::Data::Dataset::set_batch_size ( float new_size)

Definition at line 232 of file data.cpp.


◆ split()

array< Dataset, 2 > Brush::Data::Dataset::split ( const ArrayXb & mask) const

Definition at line 162 of file data.cpp.
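Splitting on a boolean mask partitions the samples into two groups. This sketch works on a single float column with a hypothetical `split_by_mask` function; returning the rows where the mask is true first is an assumption, since this page does not state the order of the two returned datasets.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical: {rows where mask is true, rows where mask is false}.
std::array<std::vector<float>, 2> split_by_mask(const std::vector<float>& column,
                                                const std::vector<bool>& mask) {
    std::array<std::vector<float>, 2> out;
    for (std::size_t i = 0; i < column.size(); ++i)
        out[mask.at(i) ? 0 : 1].push_back(column[i]);
    return out;
}
```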


Member Data Documentation

◆ batch_size

float Brush::Data::Dataset::batch_size

fraction of the training data to use in each batch; if 1.0, all training data is used

Definition at line 87 of file data.h.

◆ classification

bool Brush::Data::Dataset::classification

whether this is a classification problem

Definition at line 79 of file data.h.

◆ feature_types

std::vector<DataType> Brush::Data::Dataset::feature_types

types of data in the features.

Definition at line 65 of file data.h.

◆ features

std::map<string, State> Brush::Data::Dataset::features

dataset features, as key value pairs

Definition at line 71 of file data.h.

◆ features_of_type

std::unordered_map<DataType,vector<string> > Brush::Data::Dataset::features_of_type

map from data types to features having that type.

Definition at line 68 of file data.h.

◆ training_data_idx

vector<size_t> Brush::Data::Dataset::training_data_idx
private

Definition at line 57 of file data.h.

◆ unique_data_types

std::vector<DataType> Brush::Data::Dataset::unique_data_types

keeps track of the unique data types in the dataset.

Definition at line 62 of file data.h.

◆ use_batch

bool Brush::Data::Dataset::use_batch

Definition at line 88 of file data.h.

◆ use_validation

bool Brush::Data::Dataset::use_validation

Definition at line 84 of file data.h.

◆ validation_data_idx

vector<size_t> Brush::Data::Dataset::validation_data_idx
private

Definition at line 58 of file data.h.

◆ validation_size

float Brush::Data::Dataset::validation_size

fraction of the original data held out for validation; if 0.0, all data is used for both training and validation

Definition at line 83 of file data.h.
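One way to realize `validation_size` is to shuffle the row indices and hold out that fraction for validation, reusing every row for both roles when the value is 0.0. This is a sketch under assumptions: `train_validation_indices` is hypothetical, a plain random shuffle is assumed (the page does not say whether Brush shuffles or stratifies), and the results correspond only loosely to the private `training_data_idx` and `validation_data_idx` members.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns {training indices, validation indices}.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
train_validation_indices(std::size_t n_samples, float validation_size, std::mt19937& rng) {
    std::vector<std::size_t> idx(n_samples);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // validation_size == 0.0: every row is used for both training and validation.
    if (validation_size <= 0.0f) return {idx, idx};
    std::shuffle(idx.begin(), idx.end(), rng);
    auto n_val = static_cast<std::size_t>(validation_size * static_cast<float>(n_samples));
    n_val = std::min(n_val, n_samples);
    auto cut = idx.begin() + static_cast<std::ptrdiff_t>(n_val);
    std::vector<std::size_t> validation(idx.begin(), cut);
    std::vector<std::size_t> training(cut, idx.end());
    return {training, validation};
}
```

With 10 samples and `validation_size = 0.2`, this holds out 2 rows for validation and trains on the remaining 8.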

◆ Xref

std::optional<std::reference_wrapper<const ArrayXXf> > Brush::Data::Dataset::Xref

Definition at line 80 of file data.h.

◆ y

ArrayXf Brush::Data::Dataset::y

length-N array of target values

Definition at line 76 of file data.h.


The documentation for this class was generated from the following files: