Signature

Description

enum class bucket_type : unsigned char  {
    by_distance = 1, // Bucketize by distance between two index values (e.g. X₂ - X₁ = N)
    by_count = 2,    // Bucketize by count of index values (e.g. every N index items)
};

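This determines the bucketization logic used by bucketize(): by_distance closes a bucket when the distance between index values reaches the given value, and by_count closes a bucket after every N index items.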
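To make the two strategies concrete, here is a minimal sketch that computes only the row ranges each strategy would produce for a sorted numeric index. `bucket_ranges` and `Range` are hypothetical names, not the library's API, and the sketch keeps a trailing partial bucket for simplicity, whereas the library may leave such tail data out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Local copy of the documented enum so the sketch compiles on its own.
enum class bucket_type : unsigned char { by_distance = 1, by_count = 2 };

// Half-open row range [first, second) covered by one bucket.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper (not the library's API): splits a sorted index into buckets.
//   by_distance: a bucket starting at index value x0 takes every row whose
//                value is < x0 + value.
//   by_count:    a bucket takes the next `value` rows.
// ASSUMPTION: a trailing partial bucket is kept here for simplicity.
template<typename T>
std::vector<Range> bucket_ranges(const std::vector<T> &idx, bucket_type bt, T value) {
    assert(value > T(0) && "bucket size must be positive");
    // Row count for by_count; falls back to 1 so the loop always advances.
    const std::size_t n = value >= T(1) ? static_cast<std::size_t>(value) : 1;

    std::vector<Range> out;
    std::size_t i = 0;
    while (i < idx.size()) {
        const std::size_t begin = i;
        if (bt == bucket_type::by_distance) {
            const T start = idx[begin];
            ++i;  // the first row (distance 0) always belongs to the bucket
            while (i < idx.size() && idx[i] - start < value)
                ++i;
        } else {
            i = (idx.size() - begin > n) ? begin + n : idx.size();
        }
        out.emplace_back(begin, i);
    }
    return out;
}
```

For the consecutive index 0..9 and a value of 3, both strategies return [0,3), [3,6), [6,9), [9,10), the same equivalence the FORD test below relies on. With gaps, the index {0, 1, 5, 6, 7, 12} gives [0,2), [2,5), [5,6) under by_distance 5 but [0,2), [2,4), [4,6) under by_count 2.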

Signature	Description	Parameters
template<typename V, typename I_V, typename ... Ts> DataFrame bucketize(bucket_type bt, const V &value, I_V &&idx_visitor, Ts&& ... args) const;	It bucketizes the data and index into intervals based on the index values and bucket_type. You must specify how the index column is bucketized by providing a visitor. You must specify how each column is bucketized by providing 3-member tuples (triples). Each triple must have the following members: (1) the current DataFrame column name, (2) the column name in the new bucketized DataFrame, and (3) a visitor that aggregates/bucketizes the current column into the new column. The result of each bucket is stored in a new DataFrame, which is returned. Some data at the end of the source columns may not be included in the result columns, because, depending on bucket_type, it may not fit into a bucket. The index of each bucket is determined by idx_visitor.	V: Type of the value used for bucketizing based on bucket_type; I_V: Type of the visitor used to bucketize the index column; Ts: Types of the triples that specify each column's bucketization; bt: bucket_type that specifies the bucketization logic; value: The value used to bucketize based on bucket_type (if bucket_type is by_distance, value is the distance between two index values; if bucket_type is by_count, value is an integer count); idx_visitor: A visitor that specifies the index bucketization; args: Variable argument list of triples as specified above
template<typename V, typename I_V, typename ... Ts> std::future<DataFrame> bucketize_async(bucket_type bt, const V &value, I_V &&idx_visitor, Ts&& ... args) const;	Same as bucketize() above, but executed asynchronously
template<typename I_V, typename ... Ts> DataFrame<DateTime, H> resample(time_frequency tf, size_type interval_num, I_V &&idx_visitor, Ts && ... args) const requires std::same_as<I, DateTime>;	This is very similar to bucketize(), but it is specialized for a DataFrame with a DateTime index column. It bucketizes the data based on specific time periods. You must specify how the index column is bucketized by providing a visitor. You must specify how each column is bucketized by providing 3-member tuples (triples). Each triple must have the following members: (1) the current DataFrame column name, (2) the column name in the new bucketized DataFrame, and (3) a visitor that aggregates/bucketizes the current column into the new column. The result of each bucket is stored in a new DataFrame, which is returned. Some data at the end of the source columns may not be included in the result columns, because, depending on the time frequency and interval, it may not fit into a bucket. The index of each bucket is determined by idx_visitor. NOTE: The calling DataFrame must be sorted by index; otherwise, the behavior is undefined.	tf: Time frequency period used to bucketize the data; interval_num: Number of time frequency periods in each bucket; idx_visitor: A visitor that specifies the index bucketization; args: Variable argument list of triples as specified above
template<typename I_V, typename ... Ts> std::future<DataFrame<DateTime, H>> resample_async(time_frequency tf, size_type interval_num, I_V &&idx_visitor, Ts && ... args) const requires std::same_as<I, DateTime>;	Same as resample() above, but executed asynchronously
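Both bucketize() and resample() take the same kind of triples. The sketch below illustrates what a triple does, using plain `std::vector` columns instead of a DataFrame. `Column`, `Aggregator`, `Triple`, `bucketize_by_count`, and the `agg_*` helpers are hypothetical stand-ins for the library's columns and visitors, not its API. The sketch groups rows by_count style and drops an incomplete trailing bucket, which is one reading of the note that some data at the end may not be included.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for a DataFrame column and a visitor.
using Column     = std::vector<double>;
using Aggregator = std::function<double(Column::const_iterator, Column::const_iterator)>;

// Mirrors the documented triple: (source column, result column, aggregator).
struct Triple {
    std::string src;
    std::string dst;
    Aggregator  agg;
};

// Applies every triple to consecutive buckets of `count` rows (by_count style).
// ASSUMPTION: an incomplete trailing bucket is dropped in this sketch.
std::map<std::string, Column>
bucketize_by_count(const std::map<std::string, Column> &cols,
                   std::size_t count,
                   const std::vector<Triple> &triples) {
    assert(count > 0 && "bucket size must be positive");
    std::map<std::string, Column> result;
    for (const Triple &t : triples) {
        const Column &src = cols.at(t.src);  // throws std::out_of_range if missing
        Column       &dst = result[t.dst];
        for (std::size_t b = 0; count > 0 && b + count <= src.size(); b += count)
            dst.push_back(t.agg(src.begin() + b, src.begin() + b + count));
    }
    return result;
}

// Simple aggregators playing the roles of Max/Min/First/LastVisitor.
const Aggregator agg_max   = [](auto b, auto e) { return *std::max_element(b, e); };
const Aggregator agg_min   = [](auto b, auto e) { return *std::min_element(b, e); };
const Aggregator agg_first = [](auto b, auto)   { return *b; };
const Aggregator agg_last  = [](auto, auto e)   { return *(e - 1); };
```

With a close column of {1, 4, 2, 3, 5, 0, 9} and buckets of 3 rows, the triples ("close", "High", agg_max), ("close", "Low", agg_min), ("close", "Open", agg_first), and ("close", "Close", agg_last) yield High {4, 5}, Low {1, 0}, Open {1, 3}, and Close {2, 0}; the trailing 9 does not fill a bucket and is dropped.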

static void test_bucketize()  {

    std::cout << "\nTesting bucketize( ) ..." << std::endl;

    MyDataFrame df;

    try  {
        df.read("FORD.csv", io_format::csv2);

        auto        fut =
            df.bucketize_async(bucket_type::by_distance,
                               100,
                               LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(), 
                               std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                               std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                               std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                               std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                               std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                               std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                               std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                               std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
        MyDataFrame result = fut.get();

        result.write<std::ostream, std::string, double, long>(std::cout, io_format::csv2);

        // The FORD index is a sequence of consecutive integers starting at 0,
        // so a distance of 100 spans exactly 100 rows and by_count should give
        // the same result as by_distance.
        auto        fut2 =
            df.bucketize_async(bucket_type::by_count,
                               100,
                               LastVisitor<MyDataFrame::IndexType, MyDataFrame::IndexType>(), 
                               std::make_tuple("Date", "Date", LastVisitor<std::string>()),
                               std::make_tuple("FORD_Close", "High", MaxVisitor<double>()),
                               std::make_tuple("FORD_Close", "Low", MinVisitor<double>()),
                               std::make_tuple("FORD_Close", "Open", FirstVisitor<double>()),
                               std::make_tuple("FORD_Close", "Close", LastVisitor<double>()),
                               std::make_tuple("FORD_Close", "Mean", MeanVisitor<double>()),
                               std::make_tuple("FORD_Close", "Std", StdVisitor<double>()),
                               std::make_tuple("FORD_Volume", "Volume", SumVisitor<long>()));
        MyDataFrame result2 = fut2.get();

        assert(result.is_equal(result2));
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
    }
}
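The test above calls bucketize_async() and then blocks on `fut.get()`. As a minimal sketch of one common way an `_async` twin can be layered over a synchronous call, the block below wraps a hypothetical `bucket_means()` with `std::async`. Both function names are illustrative assumptions, and the library's own implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical synchronous operation standing in for bucketize():
// the mean of each consecutive block of `count` values (tail dropped).
std::vector<double> bucket_means(std::vector<double> data, std::size_t count) {
    assert(count > 0 && "bucket size must be positive");
    std::vector<double> out;
    for (std::size_t b = 0; count > 0 && b + count <= data.size(); b += count)
        out.push_back(std::accumulate(data.begin() + b, data.begin() + b + count, 0.0) / count);
    return out;
}

// Hypothetical _async twin: runs the synchronous version on another thread
// and hands back a std::future. std::launch::async forces a new thread
// instead of deferring the work until get() is called.
std::future<std::vector<double>> bucket_means_async(std::vector<double> data, std::size_t count) {
    return std::async(std::launch::async, bucket_means, std::move(data), count);
}
```

Calling `bucket_means_async({1, 3, 5, 7, 9}, 2).get()` yields {2, 6}. The sketch takes its input by value, so the worker thread owns a copy; an async variant that instead reads the calling object by reference requires that object to stay alive until get() returns.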

// ----------------------------------------------------------------------------

static void test_resample()  {

    std::cout << "\nTesting resample ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("DT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }

    const auto  buckets1 = df.resample(time_frequency::daily, 75,
                                       LastVisitor<DTDataFrame::IndexType, DTDataFrame::IndexType>(),
                                       std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                       std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                       std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

    buckets1.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    const auto  buckets2 = df.resample(time_frequency::weekly, 10,
                                       LastVisitor<DTDataFrame::IndexType, DTDataFrame::IndexType>(),
                                       std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                       std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                       std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

    buckets2.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    const auto  buckets3 = df.resample(time_frequency::monthly, 6,
                                       LastVisitor<DTDataFrame::IndexType, DTDataFrame::IndexType>(),
                                       std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                       std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                       std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                       std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));

    buckets3.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    auto        fut = df.resample_async(time_frequency::annual, 2,
                                        LastVisitor<DTDataFrame::IndexType, DTDataFrame::IndexType>(),
                                        std::make_tuple("IBM_Close", "High", MaxVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Low", MinVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Open", FirstVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Close", LastVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Mean", MeanVisitor<double>()),
                                        std::make_tuple("IBM_Close", "Std", StdVisitor<double>()),
                                        std::make_tuple("IBM_Volume", "Volume", SumVisitor<long>()),
                                        std::make_tuple("IBM_Close", "Count", CountVisitor<double>()));
    const auto  buckets4 = fut.get();

    buckets4.write<std::ostream, double, long, std::size_t>(std::cout, io_format::pretty_prt, { .precision = 2, .dt_format = DT_FORMAT::ISO_DT });
}
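The test above asks resample() for 75-day, 10-week, 6-month, and 2-year buckets. As a rough sketch of calendar bucketing, the block below handles two of those frequencies for sorted UTC epoch-second timestamps. Its assumptions: `freq`, `period_of`, and `time_buckets` are hypothetical names; buckets are counted from the first timestamp's period rather than from any alignment the library may apply; and periods without rows simply produce no bucket. Monthly periods come from Howard Hinnant's public-domain `civil_from_days` algorithm, so months of 28 to 31 days need no special casing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open row range [first, second) of one time bucket.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical subset of time frequencies, for illustration only.
enum class freq { daily, monthly };

// Floor division, so timestamps before 1970 map to the correct day.
std::int64_t floor_div(std::int64_t a, std::int64_t b) {
    return a >= 0 ? a / b : -((-a + b - 1) / b);
}

// Months since year 0 for a day number counted from 1970-01-01,
// based on Howard Hinnant's civil_from_days algorithm.
std::int64_t months_from_days(std::int64_t z) {
    z += 719468;
    const std::int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const std::int64_t doe = z - era * 146097;                                       // [0, 146096]
    const std::int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
    const std::int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);               // [0, 365]
    const std::int64_t mp  = (5 * doy + 2) / 153;                                   // [0, 11], March-based
    const std::int64_t m   = mp < 10 ? mp + 3 : mp - 9;                             // [1, 12]
    const std::int64_t y   = yoe + era * 400 + (m <= 2 ? 1 : 0);
    return y * 12 + (m - 1);
}

// Period number of a UTC epoch-second timestamp under the given frequency.
std::int64_t period_of(std::int64_t t, freq f) {
    const std::int64_t day = floor_div(t, 86400);
    return f == freq::daily ? day : months_from_days(day);
}

// ASSUMPTION: `ts` is sorted ascending (mirroring resample()'s NOTE), and each
// bucket spans `interval_num` periods counted from the first timestamp's period.
std::vector<Range> time_buckets(const std::vector<std::int64_t> &ts, freq f, std::int64_t interval_num) {
    assert(interval_num > 0 && "interval must be positive");
    std::vector<Range> out;
    if (ts.empty()) return out;
    const std::int64_t origin = period_of(ts.front(), f);
    // Non-negative offsets for sorted input, so plain division is a floor here.
    auto bucket_of = [&](std::int64_t t) { return (period_of(t, f) - origin) / interval_num; };
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= ts.size(); ++i) {
        if (i == ts.size() || bucket_of(ts[i]) != bucket_of(ts[begin])) {
            out.emplace_back(begin, i);
            begin = i;
        }
    }
    return out;
}
```

For example, the timestamps for 1970-01-15, 1970-03-01, 1970-06-30, 1970-07-01, and 1971-01-01 grouped monthly with an interval of 6 yield the row ranges [0,3), [3,4), and [4,5): January through June 1970, July through December 1970, and the first half of 1971.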