
Signature Description
enum class bucket_type : unsigned char  {
    by_distance = 1, // Bucketize by distance between two index values (e.g. X2 - X1 = N)
    by_count = 2,    // Bucketize by count of index values (e.g. every N index items)
};
This determines the bucketization logic: either by the distance between index values or by the number of index items.
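The difference between the two modes is easiest to see on a plain index vector. The following is a minimal standalone sketch, not the library's implementation: the helper names ranges_by_count and ranges_by_distance are hypothetical, and the rule that a by_distance bucket closes once an index value is at least N away from the bucket's first index value, as well as dropping a trailing partial bucket, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open position range [first, second) into the index vector.
using Range = std::pair<std::size_t, std::size_t>;

// by_count: every `count` consecutive index items form one bucket.
// A trailing partial bucket is dropped (assumption for this sketch).
std::vector<Range>
ranges_by_count(std::size_t index_size, std::size_t count)  {

    std::vector<Range>  result;

    if (count == 0)  return (result);
    for (std::size_t start = 0; start + count <= index_size; start += count)
        result.emplace_back(start, start + count);
    return (result);
}

// by_distance: a bucket closes as soon as an index value is at least
// `distance` away from the first index value of the current bucket.
// Items after the last closed bucket are dropped (assumption).
std::vector<Range>
ranges_by_distance(const std::vector<long> &index, long distance)  {

    std::vector<Range>  result;
    std::size_t         start = 0;

    for (std::size_t i = 1; i < index.size(); ++i)
        if (index[i] - index[start] >= distance)  {
            result.emplace_back(start, i);
            start = i;
        }
    return (result);
}
```

With the evenly spaced index 0, 1, ..., 9 and N = 3, both helpers produce the ranges [0, 3), [3, 6), [6, 9). With the irregular index 0, 1, 5, 6, 7, 20 and a distance of 5, ranges_by_distance yields [0, 2) and [2, 5), whereas counting by 2 yields three buckets of two items each.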

Signature Description Parameters
template<typename V, typename I_V, typename ... Ts>
DataFrame
bucketize(bucket_type bt,
          const V &value,
          I_V &&idx_visitor,
          Ts&& ... args) const;
It bucketizes the data and index into intervals, based on index values and bucket_type.
You must specify how the index column is bucketized, by providing a visitor.
You must specify how each column is bucketized, by providing 3-member tuples (triples). Each triple must have the following members:
  1. Current DataFrame column name
  2. Column name for the new bucketized DataFrame
  3. A visitor to aggregate/bucketize current column to new column
The result of each bucket is stored in a new DataFrame, which is returned. Some data at the end of the source columns may be left out of the result columns because, depending on bucket_type, they may not fill a complete bucket. The index of each bucket is determined by idx_visitor.
V: Type of value to be used for bucketizing based on bucket_type
I_V: Type of visitor to be used to bucketize the index column
Ts: Types of triples to specify each column's bucketization
bt: bucket_type to specify bucketization logic
value: The value to be used to bucketize based on bucket_type. For example, if bucket_type is by_distance, then value is the distance between two index values. If bucket_type is by_count, then value is an integer count.
idx_visitor: A visitor to specify the index bucketization
args: Variable argument list of triples as specified above
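The role of the triples can be illustrated without the library. The sketch below is a toy model under stated assumptions: Frame, Triple, Aggregator, toy_bucketize and the toy_* aggregators are all hypothetical names, a plain function stands in for a visitor, and only count-based bucketing with a dropped partial tail is modeled. Each triple names a source column, a result column, and the aggregation applied to every bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>

using Column = std::vector<double>;
using Iter = Column::const_iterator;
using Frame = std::map<std::string, Column>;  // Toy stand-in, not DataFrame
using Aggregator = std::function<double(Iter, Iter)>;
// (source column name, result column name, per-bucket aggregation)
using Triple = std::tuple<std::string, std::string, Aggregator>;

double toy_max(Iter b, Iter e)  { return (*std::max_element(b, e)); }
double toy_sum(Iter b, Iter e)  { return (std::accumulate(b, e, 0.0)); }
double toy_last(Iter b, Iter e)  { return (*(e - 1)); }

// Applies each triple to consecutive buckets of `count` items.
// A trailing partial bucket is dropped (assumption for this sketch).
Frame toy_bucketize(const Frame &src,
                    std::size_t count,
                    const std::vector<Triple> &triples)  {

    Frame   result;

    if (count == 0)  return (result);
    for (const Triple &t : triples)  {
        const Column        &col = src.at(std::get<0>(t));
        Column              &out = result[std::get<1>(t)];
        const Aggregator    &agg = std::get<2>(t);
        const auto          step = static_cast<std::ptrdiff_t>(count);

        for (std::size_t s = 0; s + count <= col.size(); s += count)  {
            const Iter  first = col.begin() + static_cast<std::ptrdiff_t>(s);

            out.push_back(agg(first, first + step));
        }
    }
    return (result);
}
```

For a "Close" column holding 1, 5, 3, 4, 2, 6, 9 and a count of 3, the triples ("Close", "High", toy_max) and ("Close", "Sum", toy_sum) produce a "High" column of 5, 6 and a "Sum" column of 9, 12. The final value 9 does not fill a bucket and is left out. The same source column can feed several result columns, which is the pattern the triples make possible.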
template<typename V, typename I_V, typename ... Ts>
std::future<DataFrame>
bucketize_async(bucket_type bt,
                const V &value,
                I_V &&idx_visitor,
                Ts&& ... args) const;
Same as bucketize() above, but executed asynchronously. The result is obtained by calling get() on the returned std::future.
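The future-based calling pattern can be sketched with the standard library alone. This is an illustration, not the library's implementation: toy_bucket_sums is a hypothetical stand-in for an expensive bucketizing call, and launching it with std::async is an assumption about one way such a wrapper could be written.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an expensive bucketizing call: sums every
// `count` consecutive values and drops a trailing partial bucket.
std::vector<long>
toy_bucket_sums(const std::vector<long> &column, std::size_t count)  {

    std::vector<long>   sums;

    if (count == 0)  return (sums);
    for (std::size_t s = 0; s + count <= column.size(); s += count)  {
        const auto  first = column.begin() + static_cast<std::ptrdiff_t>(s);
        const auto  last = first + static_cast<std::ptrdiff_t>(count);

        sums.push_back(std::accumulate(first, last, 0L));
    }
    return (sums);
}

// Asynchronous wrapper: returns immediately with a future.
// The caller must keep `column` alive until get() has returned,
// because the task captures it by reference.
std::future<std::vector<long>>
toy_bucket_sums_async(const std::vector<long> &column, std::size_t count)  {

    return (std::async(std::launch::async,
                       [&column, count]  {
                           return (toy_bucket_sums(column, count));
                       }));
}
```

The caller can do unrelated work between the call and get(). get() blocks until the result is ready and rethrows any exception raised inside the task, so error handling belongs around get() rather than around the call that creates the future.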
template<typename I_V, typename ... Ts>
DataFrame<DateTime, H>
resample(time_frequency tf,
         size_type interval_num,
         I_V &&idx_visitor,
         Ts && ... args) const
    requires std::same_as<I, DateTime>;
This is very similar to bucketize() but specialized for DataFrame with a DateTime index column. It bucketizes the data based on specific time periods.
You must specify how the index column is bucketized, by providing a visitor.
You must specify how each column is bucketized, by providing 3-member tuples (triples). Each triple must have the following members:
  1. Current DataFrame column name
  2. Column name for the new bucketized DataFrame
  3. A visitor to aggregate/bucketize current column to new column
The result of each bucket is stored in a new DataFrame, which is returned. Some data at the end of the source columns may be left out of the result columns because, depending on the time frequency and interval, they may not fill a complete bucket. The index of each bucket is determined by idx_visitor.

NOTE: The calling DataFrame must be sorted by index, otherwise the behavior is undefined.
tf: Time frequency period to bucketize the data with
interval_num: Number of time frequency periods to bucketize the data with
idx_visitor: A visitor to specify the index bucketization
args: Variable argument list of triples as specified above
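How tf and interval_num combine into a bucket width can be sketched with std::chrono. The code below is a toy model, not the library's algorithm: ToyBucket and toy_resample are hypothetical names, bucket edges are measured from the first timestamp of each bucket (an assumption), the index visitor is fixed to "last" and the column visitor to "max", and unsorted input is rejected rather than left undefined.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Seconds = std::chrono::seconds;

struct ToyBucket  {
    Seconds last_time;  // What a "last" index visitor would report
    double  max_value;  // What a "max" column visitor would report
};

// Groups values into buckets spanning `interval_num` periods each.
// A bucket closes when a timestamp is at least one full width away
// from the bucket's first timestamp; an unclosed tail is dropped.
std::vector<ToyBucket>
toy_resample(const std::vector<Seconds> &times,
             const std::vector<double> &values,
             Seconds period,
             std::size_t interval_num)  {

    std::vector<ToyBucket>  result;

    if (times.empty() || times.size() != values.size() ||
        interval_num == 0 || period <= Seconds::zero() ||
        ! std::is_sorted(times.begin(), times.end()))
        return (result);

    const Seconds   width = period * static_cast<Seconds::rep>(interval_num);
    std::size_t     start = 0;

    for (std::size_t i = 1; i < times.size(); ++i)
        if (times[i] - times[start] >= width)  {
            const auto  first =
                values.begin() + static_cast<std::ptrdiff_t>(start);
            const auto  last =
                values.begin() + static_cast<std::ptrdiff_t>(i);

            result.push_back({ times[i - 1], *std::max_element(first, last) });
            start = i;
        }
    return (result);
}
```

With seven daily timestamps (day 0 through day 6), the values 1, 7, 3, 4, 2, 9, 5, a period of one day and interval_num of 3, the sketch yields two buckets: one ending at day 2 with maximum 7 and one ending at day 5 with maximum 9. The value for day 6 never completes a bucket and is left out.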
template<typename I_V, typename ... Ts>
std::future<DataFrame<DateTime, H>>
resample_async(time_frequency tf,
               size_type interval_num,
               I_V &&idx_visitor,
               Ts && ... args) const
    requires std::same_as<I, DateTime>;
Same as resample() above, but executed asynchronously. The result is obtained by calling get() on the returned std::future.
static void test_bucketize()  {

    std::cout << "\nTesting bucketize( ) ..." << std::endl;

    MyDataFrame df;

    try  {
        df.read("FORD.csv", io_format::csv2);

        auto        fut =
            df.bucketize_async(bucket_type::by_distance,
                               100,
                               LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(), 
                               std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                               std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                               std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                               std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                               std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                               std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                               std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                               std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
        MyDataFrame result = fut.get();

        result.write<std::ostream, std::string, double, long>(std::cout, io_format::csv2);

        // FORD index is just an increasing number starting from 0.
        // So, by_count should give the same result as by_distance
        //
        auto        fut2 =
            df.bucketize_async(bucket_type::by_count,
                               100,
                               LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(), 
                               std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                               std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                               std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                               std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                               std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                               std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                               std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                               std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
        MyDataFrame result2 = fut2.get();

        assert(result.is_equal(result2));
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
    }
}
// ----------------------------------------------------------------------------

static void test_resample()  {

    std::cout << "\nTesting resample ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("DT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }

    const auto  buckets1 = df.resample(time_frequency::daily, 75,
                                       LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                                       std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                       std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                       std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

    buckets1.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    const auto  buckets2 = df.resample(time_frequency::weekly, 10,
                                       LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                                       std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                       std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                       std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

    buckets2.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    const auto  buckets3 = df.resample(time_frequency::monthly, 6,
                                       LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                                       std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                       std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                       std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

    buckets3.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    auto        fut = df.resample_async(time_frequency::annual, 2,
                                        LastVisitor<DTDataFrame::IndexType, MyDataFrame::IndexType>(),
                                        std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                        std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                        std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));
    const auto  buckets4 = fut.get();

    buckets4.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
}

C++ DataFrame