| Signature | Description | Parameters |
|---|---|---|
| `template<typename T, typename ITR> std::size_t load_column(const char *name, Index2D<const ITR &> range, nan_policy padding = nan_policy::pad_with_nans, bool do_lock = true);` | Copies the data in the iterator range [begin, end) into the named column. If the column does not exist, it is created; if it exists, it is overwritten. Returns the number of items loaded. | `T`: Type of the data being copied<br>`ITR`: Type of the iterator<br>`name`: Name of the column<br>`range`: The begin and end iterators for the data<br>`padding`: If `nan_policy::pad_with_nans`, the data column is padded with NaN when it is shorter than the index column<br>`do_lock`: Used to optimize internal DataFrame library code; DataFrame users should always keep the default |
| `template<typename T> std::size_t load_column(const char *name, std::vector<T> &&data, nan_policy padding = nan_policy::pad_with_nans, bool do_lock = true);` | Moves the data into the named column of the DataFrame. If the column does not exist, it is created; if it exists, it is overwritten. Returns the number of items loaded. | `T`: Type of the data being moved<br>`name`: Name of the column<br>`data`: Data vector<br>`padding`: If `nan_policy::pad_with_nans`, the data column is padded with NaN when it is shorter than the index column<br>`do_lock`: Used to optimize internal DataFrame library code; DataFrame users should always keep the default |
| `template<typename T> std::size_t load_column(const char *name, const std::vector<T> &data, nan_policy padding = nan_policy::pad_with_nans, bool do_lock = true);` | Copies the data into the named column of the DataFrame. If the column does not exist, it is created; if it exists, it is overwritten. Returns the number of items loaded. | `T`: Type of the data being copied<br>`name`: Name of the column<br>`data`: Data vector<br>`padding`: If `nan_policy::pad_with_nans`, the data column is padded with NaN when it is shorter than the index column<br>`do_lock`: Used to optimize internal DataFrame library code; DataFrame users should always keep the default |
| `template<typename NT, typename ET> std::size_t load_column(const char *new_col_name, const char *existing_col_name, std::function<NT(const IndexType &, const ET &)> &&func, nan_policy padding = nan_policy::pad_with_nans, bool do_lock = true);` | Feeds the data of an existing column, together with the index data, into the given functor, which creates one data point of a new column, with the given name, for each input data point. Returns the number of items loaded. | `NT`: Type of the new column<br>`ET`: Type of the existing column<br>`new_col_name`: Name of the new column<br>`existing_col_name`: Name of the existing column whose data is fed to the functor<br>`func`: Functor that creates the new column's content from each index value and existing-column value<br>`padding`: If `nan_policy::pad_with_nans`, the data column is padded with NaN when it is shorter than the index column<br>`do_lock`: Used to optimize internal DataFrame library code; DataFrame users should always keep the default |
| `template<typename T> std::size_t load_align_column(const char *name, const std::vector<T> &&data, std::size_t interval, bool start_from_beginning, const T &null_value = get_nan<T>(), std::function<std::size_t (const IndexType &, const IndexType &)> diff_func = [](const IndexType &t_1, const IndexType &t) -> std::size_t { return (static_cast<std::size_t>(t - t_1)); });` | Creates a column like the methods above, but assumes the data are bucketed (bar) values. That is, the data vector contains statistical figure(s) for time buckets and must be aligned with the index column at bucket intervals. For example, the index column is in minutes, and the data vector holds the sums of 5-minute buckets of some column, or of some data set not present in the DataFrame. The values of the data vector are aligned with the index column at every 5-minute interval, and the in-between values are set to `null_value`.<br>NOTE: If the index has values at every interval, the data vector must contain at most (index size / interval) values. Otherwise, it must contain the appropriate number of values.<br>NOTE: The index must be in ascending order. | `T`: Type of the data being loaded<br>`name`: Name of the column<br>`data`: Data vector<br>`interval`: Bucket interval, measured as a distance in index units<br>`start_from_beginning`: If true, the first data value is associated with the first index value. If false, the first data value is associated with the index value that is `interval` away from the first index value<br>`null_value`: The value used to fill the new column in between intervals. The default is the `T` version of NaN; for non-NaN'able types, it is the default value of `T`<br>`diff_func`: Function that calculates the distance between two index values |
| `template<typename T, typename ITR> std::size_t load_random_sample(const char *name, Index2D<const ITR &> range, long num_recs = -1, seed_t seed = seed_t(-1));` | Loads a random sample of the data in the given range (the sampling universe) into a new column of the DataFrame. | `T`: Type of the new column<br>`ITR`: Type of the iterators in the range<br>`name`: Name of the new column<br>`range`: The begin and end iterators for the data<br>`num_recs`: Number of data points sampled and loaded into the new column. The default is the length of the index column<br>`seed`: Optional user-specified seed. The same seed should always produce the same random selections |
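
The copy, move, and iterator-range overloads of `load_column` share the same overwrite and NaN-padding behaviour. The following standalone sketch models that behaviour with plain `std::vector`s (C++17) and does not use the DataFrame library; `load_padded`, `PadPolicy`, and `null_of` are hypothetical names. Padding non-NaN'able types with a value-initialized `T`, and returning the count of source values copied, are assumptions rather than documented library behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the library's padding option (not nan_policy itself).
enum class PadPolicy { pad_with_nans, no_padding };

// Padding value: quiet NaN when T has one, otherwise a value-initialized T
// (an assumption modelled on the null_value note for load_align_column).
template<typename T>
T null_of() {
    if constexpr (std::numeric_limits<T>::has_quiet_NaN)
        return std::numeric_limits<T>::quiet_NaN();
    else
        return T{};
}

// Overwrites `column` with the values in [begin, end) and, when requested,
// pads it up to index_size. Returns how many source values were copied.
template<typename T, typename ITR>
std::size_t load_padded(std::vector<T> &column, std::size_t index_size,
                        ITR begin, ITR end, PadPolicy padding) {
    column.assign(begin, end);  // any previous contents are replaced
    const std::size_t loaded = column.size();
    if (padding == PadPolicy::pad_with_nans && loaded < index_size)
        column.resize(index_size, null_of<T>());
    return loaded;
}
```

For example, loading three `double`s against a five-row index yields a five-element column whose last two entries are NaN, while the same call with `PadPolicy::no_padding` leaves the column at three elements.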

```cpp
// MyDataFrame is assumed to be a DataFrame indexed by unsigned long,
// defined by the surrounding test harness.
static void test_load_align_column()  {

    std::cout << "\nTesting load_align_column( ) ..." << std::endl;

    std::vector<unsigned long>  idxvec =
        { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
          21, 22, 23, 24, 25, 26, 27, 28 };
    std::vector<int>            intvec =
        { -1, 2, 3, 4, 5, 8, -6, 7, 11, 14, -9, 12, 13, 14, 15 };
    std::vector<double>         summary_vec = { 100, 200, 300, 400, 500 };
    MyDataFrame                 df;

    df.load_data(std::move(idxvec), std::make_pair("int_col", intvec));

    // One value every 5 index units, starting at the first index value
    df.load_align_column("summary_col", std::move(summary_vec), 5, true);

    std::vector<double> summary_vec_2 = { 102, 202, 302, 402, 502 };

    // One value every 5 index units, starting 5 units after the first index value
    df.load_align_column("summary_col_2", std::move(summary_vec_2), 5, false);

    // Both aligned columns are as long as the index (28 rows)
    assert(df.get_column<double>("summary_col").size() == 28);
    assert(df.get_column<double>("summary_col_2").size() == 28);
    assert(df.get_column<double>("summary_col")[0] == 100);
    assert(std::isnan(df.get_column<double>("summary_col_2")[0]));
    assert(df.get_column<double>("summary_col")[5] == 200);
    assert(std::isnan(df.get_column<double>("summary_col")[6]));
    assert(df.get_column<double>("summary_col_2")[5] == 102);
    assert(df.get_column<double>("summary_col")[20] == 500);
    assert(df.get_column<double>("summary_col_2")[25] == 502);
    assert(std::isnan(df.get_column<double>("summary_col")[27]));
    assert(std::isnan(df.get_column<double>("summary_col")[26]));
    assert(std::isnan(df.get_column<double>("summary_col_2")[27]));
    assert(std::isnan(df.get_column<double>("summary_col_2")[26]));
}
```
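
To make the alignment rule concrete, here is a library-independent sketch that reproduces the positions checked by `test_load_align_column`. `align_buckets` is a hypothetical name, not part of DataFrame. It assumes an ascending numeric index and the default `diff_func` (the plain difference of two index values); placing a value whenever the distance reaches *or exceeds* `interval` is an assumption about how gaps in the index are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch (not the DataFrame implementation): spread bucketed
// values over a column as long as the index, one value every `interval`
// index units, and fill every other slot with null_value.
// Distance = index[i] - anchor, mirroring the default diff_func.
template<typename T, typename I>
std::vector<T> align_buckets(const std::vector<I> &index,
                             const std::vector<T> &data,
                             std::size_t interval,
                             bool start_from_beginning,
                             const T &null_value) {
    assert(interval > 0 && "interval must be positive");

    std::vector<T> column(index.size(), null_value);
    if (index.empty() || data.empty())
        return column;

    std::size_t next = 0;  // next data value to place
    I anchor = index[0];   // index value that distances are measured from

    if (start_from_beginning)
        column[0] = data[next++];  // first value sits on the first index value

    for (std::size_t i = 1; i < index.size() && next < data.size(); ++i) {
        const auto dist = static_cast<std::size_t>(index[i] - anchor);
        if (dist >= interval) {  // assumption: >= tolerates gaps in the index
            column[i] = data[next++];
            anchor = index[i];
        }
    }
    return column;
}
```

With an index of 1 to 28, an interval of 5, and `std::numeric_limits<double>::quiet_NaN()` as the null value, `start_from_beginning = true` places the five values at positions 0, 5, 10, 15, and 20, and `false` shifts them to positions 5 through 25, matching the assertions in the test.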

```cpp
static void test_load_column()  {

    std::cout << "\nTesting load_column( ) ..." << std::endl;

    MyDataFrame                 df;
    StlVecType<unsigned long>   idxvec =
        { 1UL, 2UL, 3UL, 10UL, 5UL, 7UL, 8UL, 12UL, 9UL, 12UL, 10UL, 13UL, 10UL, 15UL, 14UL };
    StlVecType<double>          dblvec =
        { 0.0, 15.0, -14.0, 2.0, 1.0, -12.0, 11.0, 8.0, 7.0, 0.0, 5.0, 4.0, 3.0, 9.0, -10.0 };
    StlVecType<double>          dblvec2 =
        { 1.0, 0.05, 0.28, 0.31, 0.01, 0.68, 0.12, 1, 0.98, 0.9, 0.81, 0.82, 0.777, 0.34, 0.25 };
    StlVecType<std::string>     strvec =
        { "zz", "bb", "zz", "ww", "ee", "ff", "gg", "hh", "zz", "jj", "kk", "ll", "mm", "nn", "zz" };

    df.load_data(std::move(idxvec),
                 std::make_pair("dbl_col", dblvec),
                 std::make_pair("dbl_col_2", dblvec2),
                 std::make_pair("str_col", strvec));

    auto    lbd = [](const unsigned long &, const double &val) -> double  {
        return (val * 2.0);
    };

    df.load_column<double, double>("new_dbl_col", "dbl_col", std::move(lbd));

    {
        StlVecType<double>  new_dbl_col =
            { 0, 30, -28, 4, 2, -24, 22, 16, 14, 0, 10, 8, 6, 18, -20 };

        assert((df.get_column<double>("new_dbl_col") == new_dbl_col));
    }
}
```
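
The functor-based `load_column` overload can be pictured as a per-row transform over the index and an existing column. Below is a hypothetical standalone version; `derive_column` is not a library function. Unlike the library overload it applies no NaN padding and, as an assumption, only visits rows present in both vectors. Because the index type `I` appears inside the `std::function` parameter, a lambda argument cannot drive deduction, so the caller spells out all three template arguments, much as the test spells out `<double, double>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-alone analogue of the functor-based load_column overload:
// new_col[i] = func(index[i], existing[i]).
// Assumption: only rows present in both vectors are visited, and no NaN
// padding is applied (the library overload takes a padding parameter).
template<typename NT, typename ET, typename I>
std::vector<NT> derive_column(const std::vector<I> &index,
                              const std::vector<ET> &existing,
                              std::function<NT(const I &, const ET &)> &&func) {
    const std::size_t n = std::min(index.size(), existing.size());
    std::vector<NT> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        result.push_back(func(index[i], existing[i]));
    return result;
}
```

Calling it with the doubling lambda from the test, for example `derive_column<double, double, unsigned long>(idx, dbl, lbd)`, turns `{0.0, 15.0, -14.0}` into `{0.0, 30.0, -28.0}`; a lambda that also reads its first argument can mix index values into the result.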

```cpp
// Requires IBM.csv in the working directory; StrDataFrame is assumed to be a
// DataFrame indexed by std::string, defined by the surrounding test harness.
static void test_load_random_sample()  {

    std::cout << "\nTesting load_random_sample( ) ..." << std::endl;

    using iter_t = std::vector<std::string>::const_iterator;

    StrDataFrame    ibm;

    try  {
        ibm.read("IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }

    const std::vector<std::string>  universe {
        "AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH", "II", "JJ", "KK", "LL", "MM",
        "NN", "OO", "PP", "QQ", "RR", "SS", "TT", "UU", "VV", "WW", "XX", "YY", "ZZ",
    };

    // Sample as many values as the index holds, with a fixed seed (123)
    // so that the assertions below are reproducible
    ibm.load_random_sample<std::string, iter_t>("Random Sample",
                                                { universe.cbegin(), universe.cend() },
                                                ibm.get_index().size(),
                                                123);

    const auto  &str_col = ibm.get_column<std::string>("Random Sample");

    assert(str_col.size() == ibm.get_index().size());
    assert(str_col[0] == "SS");
    assert(str_col[5] == "RR");
    assert(str_col[63] == "AA");
    assert(str_col[94] == "ZZ");
    assert(str_col[5011] == "ZZ");
    assert(str_col[5030] == "CC");
}
```
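
`test_load_random_sample` draws as many values as the index holds (more than 5,000) from a 26-element universe, which implies sampling with replacement. A minimal sketch of seeded sampling with replacement follows; `random_sample` is a hypothetical name and not the library's implementation, so it will not reproduce the specific values (`"SS"`, `"RR"`, ...) asserted above. Also note that `std::mt19937` is fully specified by the standard but `std::uniform_int_distribution` is not, so the same seed is only guaranteed to repeat the same draws within one standard-library build.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch (not the DataFrame implementation): draw num_recs
// values uniformly and with replacement from [begin, end), using a seeded
// std::mt19937 so that the same seed repeats the same draws in one build.
template<typename T, typename ITR>
std::vector<T> random_sample(ITR begin, ITR end, std::size_t num_recs, unsigned int seed) {
    const std::vector<T> universe(begin, end);
    std::vector<T> result;
    if (universe.empty())
        return result;

    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> pick(0, universe.size() - 1);

    result.reserve(num_recs);
    for (std::size_t i = 0; i < num_recs; ++i)
        result.push_back(universe[pick(gen)]);
    return result;
}
```

Sampling with replacement lets `num_recs` exceed the size of the universe, which is what the test relies on; sampling without replacement would instead cap the sample at the universe size.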