| Signature | Description |
|---|---|
| `enum class bucket_type : unsigned char { by_distance = 1, by_count = 2 };` | This determines the bucketization logic.<br>`by_distance`: bucketize by the distance between two index values (e.g. X2 - X1 = N).<br>`by_count`: bucketize by counting index values (e.g. every N index items). |
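
The difference between the two bucket types can be sketched on a plain sorted index, without the library. `bucket_starts` is a hypothetical helper, not part of the DataFrame API. It assumes that a `by_distance` bucket closes once the gap from the bucket's first index value reaches N, and that a trailing partial bucket is kept. The library's exact boundary handling may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the enum documented above.
enum class bucket_type : unsigned char {
    by_distance = 1,
    by_count = 2,
};

// Hypothetical helper (not a library function): returns the position at
// which each bucket starts, for an index sorted in ascending order.
std::vector<std::size_t>
bucket_starts(bucket_type bt, const std::vector<long> &index, long value) {
    assert(value > 0);

    std::vector<std::size_t> starts;
    if (index.empty()) return starts;

    starts.push_back(0);
    for (std::size_t i = 1; i < index.size(); ++i) {
        const std::size_t first = starts.back();
        // by_count:    close after `value` items.
        // by_distance: close once index[i] - index[first] reaches `value`
        //              (an assumption of this sketch).
        const bool close_bucket =
            bt == bucket_type::by_count
                ? (i - first) >= static_cast<std::size_t>(value)
                : (index[i] - index[first]) >= value;

        if (close_bucket) starts.push_back(i);
    }
    return starts;
}
```

For a dense index such as 0, 1, 2, ... both modes produce the same boundaries. For an index with gaps, `by_distance` follows the index values while `by_count` follows the row positions.
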
| Signature | Description | Parameters |
|---|---|---|
| `template<typename V, typename I_V, typename ... Ts> DataFrame bucketize(bucket_type bt, const V &value, I_V &&idx_visitor, Ts&& ... args) const;` | It bucketizes the data and index into intervals, based on index values and `bucket_type`. You must specify how the index column is bucketized, by providing a visitor. You must specify how each column is bucketized, by providing 3-member tuples (triples). As the calls in the example code show, each triple has the following members:<br>1. The name of the column in the calling DataFrame<br>2. The name of the column in the new, bucketized DataFrame<br>3. A visitor that aggregates the source column into the new column | `V`: Type of value to be used for bucketizing based on `bucket_type`<br>`I_V`: Type of visitor to be used to bucketize the index column<br>`Ts`: Types of triples to specify each column's bucketization<br>`bt`: `bucket_type` to specify bucketization logic<br>`value`: The value to be used to bucketize based on `bucket_type`. For example, if `bucket_type` is `by_distance`, then `value` is the distance between two index values. If `bucket_type` is `by_count`, then `value` is an integer count.<br>`idx_visitor`: A visitor to specify the index bucketization<br>`args`: Variable argument list of triples as specified above |
| `template<typename V, typename I_V, typename ... Ts> std::future<DataFrame> bucketize_async(bucket_type bt, const V &value, I_V &&idx_visitor, Ts&& ... args) const;` | Same as `bucketize()` above, but executed asynchronously | |
| `template<typename I_V, typename ... Ts> DataFrame<DateTime, H> resample(time_frequency tf, size_type interval_num, I_V &&idx_visitor, Ts && ... args) const requires std::same_as<I, DateTime>;` | This is very similar to `bucketize()` but specialized for a DataFrame with a `DateTime` index column. It bucketizes the data based on specific time periods. You must specify how the index column is bucketized, by providing a visitor. You must specify how each column is bucketized, by providing 3-member tuples (triples). As the calls in the example code show, each triple has the following members:<br>1. The name of the column in the calling DataFrame<br>2. The name of the column in the new, bucketized DataFrame<br>3. A visitor that aggregates the source column into the new column<br>NOTE: The calling DataFrame must be sorted by index, otherwise the behavior is undefined. | `tf`: Time frequency period to bucketize the data with<br>`interval_num`: Number of time frequency periods to bucketize the data with<br>`idx_visitor`: A visitor to specify the index bucketization<br>`args`: Variable argument list of triples as specified above |
| `template<typename I_V, typename ... Ts> std::future<DataFrame<DateTime, H>> resample_async(time_frequency tf, size_type interval_num, I_V &&idx_visitor, Ts && ... args) const requires std::same_as<I, DateTime>;` | Same as `resample()` above, but executed asynchronously | |
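
The idea behind time-period bucketing can be illustrated with plain timestamps. The sketch below is not the library's `resample()`. `Bucket` and `resample_sum` are hypothetical names, timestamps are non-negative seconds since the epoch, and windows are aligned to the epoch, which is an assumption made only for this example. Each output row keeps the last timestamp of its window (what a "last" index visitor would keep) and the sum of one column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one output row.
struct Bucket {
    long long last_time;  // last timestamp seen in the window
    double    sum;        // sum of the values in the window
};

// Hypothetical helper: groups rows whose timestamps fall into the same
// window of `interval_num` periods. `times` must be sorted ascending and
// hold non-negative seconds since the epoch.
std::vector<Bucket>
resample_sum(const std::vector<long long> &times,
             const std::vector<double> &values,
             long long period_seconds,
             std::size_t interval_num) {
    assert(times.size() == values.size());
    assert(period_seconds > 0 && interval_num > 0);

    const long long window =
        period_seconds * static_cast<long long>(interval_num);
    std::vector<Bucket> out;
    long long current_key = 0;

    for (std::size_t i = 0; i < times.size(); ++i) {
        const long long key = times[i] / window;  // window number
        if (out.empty() || key != current_key) {
            out.push_back(Bucket{times[i], 0.0});  // open a new window
            current_key = key;
        }
        out.back().last_time = times[i];
        out.back().sum += values[i];
    }
    return out;
}
```

Calling `resample_sum(times, values, 86400, 2)` groups the rows into two-day windows, loosely analogous to `time_frequency::daily` with `interval_num = 2`. Calendar-based periods such as months have variable length and would need real date arithmetic instead of a fixed number of seconds.
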
    static void test_bucketize()  {

        std::cout << "\nTesting bucketize( ) ..." << std::endl;

        MyDataFrame df;

        try  {
            df.read("FORD.csv", io_format::csv2);

            auto    fut =
                df.bucketize_async(
                    bucket_type::by_distance,
                    100,
                    LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(),
                    std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                    std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                    std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                    std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                    std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                    std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                    std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                    std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
            MyDataFrame result = fut.get();

            result.write<std::ostream, std::string, double, long>
                (std::cout, io_format::csv2);

            // FORD index is just an increasing number starting from 0.
            // So, by_count should give the same result as by_distance
            //
            auto    fut2 =
                df.bucketize_async(
                    bucket_type::by_count,
                    100,
                    LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(),
                    std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                    std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                    std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                    std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                    std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                    std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                    std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                    std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
            MyDataFrame result2 = fut2.get();

            assert(result.is_equal(result2));
        }
        catch (const DataFrameError &ex)  {
            std::cout << ex.what() << std::endl;
        }
    }
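
How a variadic list of (source column, new column, visitor) triples can drive the aggregation is sketched below without the library. All names here (`Columns`, `MaxAgg`, `SumAgg`, `apply_triple`, `bucketize_by_count`) are hypothetical and only illustrate the pattern. The aggregators stand in for visitors such as `MaxVisitor` and `SumVisitor`, buckets are formed by row count only, and a trailing partial bucket is kept, which may not match the library's behavior. The fold expression requires C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

using Column  = std::vector<double>;
using Columns = std::map<std::string, Column>;

// Tiny aggregators standing in for visitors. Each reduces a non-empty
// range [b, e) to a single value.
struct MaxAgg {
    double operator()(const double *b, const double *e) const {
        return *std::max_element(b, e);
    }
};
struct SumAgg {
    double operator()(const double *b, const double *e) const {
        double s = 0.0;
        for (; b != e; ++b) s += *b;
        return s;
    }
};

// Applies one (source name, new name, aggregator) triple over buckets of
// `count` rows and stores the result under the new name.
template<typename Triple>
void apply_triple(const Columns &src, Columns &dst,
                  std::size_t count, const Triple &t) {
    const Column &in  = src.at(std::string(std::get<0>(t)));
    Column       &out = dst[std::string(std::get<1>(t))];

    for (std::size_t i = 0; i < in.size(); i += count) {
        const std::size_t end = std::min(i + count, in.size());
        out.push_back(std::get<2>(t)(in.data() + i, in.data() + end));
    }
}

// Accepts any number of triples, in the spirit of `Ts&& ... args`.
template<typename ... Ts>
Columns bucketize_by_count(const Columns &src, std::size_t count,
                           const Ts & ... triples) {
    assert(count > 0);

    Columns dst;
    (apply_triple(src, dst, count, triples), ...);  // one pass per triple
    return dst;
}
```

A call such as `bucketize_by_count(src, 3, std::make_tuple("Close", "High", MaxAgg{}), std::make_tuple("Volume", "Volume", SumAgg{}))` reads the same way as the triples passed to `bucketize_async()` in the test above: one source column can feed several new columns, each with its own aggregation.
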
    // ----------------------------------------------------------------------------

    static void test_resample()  {

        std::cout << "\nTesting resample ..." << std::endl;

        DTDataFrame df;

        try  {
            df.read("DT_IBM.csv", io_format::csv2);
        }
        catch (const DataFrameError &ex)  {
            std::cout << ex.what() << std::endl;
            ::exit(-1);
        }

        const auto  buckets1 =
            df.resample(
                time_frequency::daily,
                75,
                LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

        buckets1.write<std::ostream, double, long, std::size_t>
            (std::cout, io_format::pretty_prt,
             { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
        std::cout << "\n\n\n";

        const auto  buckets2 =
            df.resample(
                time_frequency::weekly,
                10,
                LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

        buckets2.write<std::ostream, double, long, std::size_t>
            (std::cout, io_format::pretty_prt,
             { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
        std::cout << "\n\n\n";

        const auto  buckets3 =
            df.resample(
                time_frequency::monthly,
                6,
                LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

        buckets3.write<std::ostream, double, long, std::size_t>
            (std::cout, io_format::pretty_prt,
             { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
        std::cout << "\n\n\n";

        auto    fut =
            df.resample_async(
                time_frequency::annual,
                2,
                LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));
        const auto  buckets4 = fut.get();

        buckets4.write<std::ostream, double, long, std::size_t>
            (std::cout, io_format::pretty_prt,
             { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
    }
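
The `_async` variants return a `std::future`, and the tests above collect the result with `get()`. The general shape of such a synchronous/asynchronous pair can be sketched with the standard library alone. `mean_of` and `mean_of_async` are hypothetical functions, and the use of `std::async` with `std::launch::async` is an assumption of this sketch, not a statement about how the library schedules its work.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical synchronous operation standing in for bucketize()/resample().
double mean_of(const std::vector<double> &data) {
    assert(!data.empty());
    return std::accumulate(data.begin(), data.end(), 0.0) /
           static_cast<double>(data.size());
}

// Hypothetical asynchronous twin: same argument, but the result is
// delivered through a std::future.
std::future<double> mean_of_async(const std::vector<double> &data) {
    // `data` is captured by reference, so the caller must keep it alive
    // and unmodified until get() has returned.
    return std::async(std::launch::async,
                      [&data] { return mean_of(data); });
}
```

The caller can do other work between `auto fut = mean_of_async(v);` and `fut.get()`. `get()` blocks until the result is ready and rethrows any exception raised by the task.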