Back to Documentations

Signature Description Parameters
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct PolyFitVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using pfit_v = PolyFitVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

This functor fits a N-degree polynomial through the given x-y coordinates by the way of Least-Squares and Gaussian-Elimination

This works with both scalar and multidimensional (i.e. vectors or arrays) datasets.

get_resut() gives you the vector of "degree + 1" coefficients.
get_y_fits() returns the vector of y-fits for each given x. Y-fits are always scalar whether the X column is scalar or multidimensional.
get_slope() returns the first (0-degree) coefficient.
get_residual() returns the sum of weighted squared residuals.
using weight_func = std::function<T (const I &idx, size_t val_index)>;

explicit
PolyFitVisitor(std::size_t degree,
               weight_func w_func = [](const I &, size_t) -> data_t { return (1); });
        
degree: The polynomial degree
w_func: A functor that provides weights to be applied to sigma values. w_func is passed two parameters: (1) The value of the index column corresponding to the given y value, (2) The corresponding index into the y vector. The default is no weights.
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct LogFitVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using lfit_v = LogFitVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

This functor fits a y = b0 + b1 * log(x) through the given x-y coordinates by the way of calling PolyFit above on log(x).

This works with both scalar and multidimensional (i.e. vectors or arrays) datasets.

get_resut() gives you the vector of "degree + 1" coefficients.
get_y_fits() returns the vector of y-fits for each given x. Y-fits are always scalar whether the X column is scalar or multidimensional.
get_slope() returns the first (0-degree) coefficient.
get_residual() returns the sum of weighted squared residuals.
using weight_func = std::function<T (const I &idx, size_t val_index)>;

explicit
LogFitVisitor(std::size_t degree = 1,
              weight_func w_func = [](const I &, size_t) -> data_t { return (1); });
        
degree: The polynomial degree
w_func: A functor that provides weights to be applied to sigma values. w_func is passed two parameters: (1) The value of the index column corresponding to the given y value, (2) The corresponding index into the y vector. The default is no weights.
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct ExponentialFitVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using efit_v = ExponentialFitVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

get_result() gives you the vector of y fits for each given x.
get_slope() returns the slope -- coefficient of the exponential function.
get_residual() returns the sum of squared residuals.
get_intercept() returns the intercept.
ExponentialFitVisitor();
        
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct PowerFitVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using powfit_v = PowerFitVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

The PowerFit method fits a function of the form y = axb to data by performing a least-squares fit using the transformed model function ln(y) = a0 + b*ln(x) where a0 is ln(a). The new function is linear in the model parameters, a0 and b.

This works with both scalar and multidimensional (i.e. vectors or arrays) datasets.

get_result() gives you the vector of y fits for each given x.
get_slope() returns the slope -- logarithmic constant. In case of multidimensional input data, this is a vector of size dimensions.
get_residual() returns the sum of squared residuals.
get_intercept() returns the intercept -- coefficient of logarithmic term.
PowerFitVisitor();
        
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct QuadraticFitVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using qfit_v = QuadraticFitVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

Quadratic regression is a statistical method used to model a relationship between variables with a parabolic best-fit curve, rather than a straight line. It's ideal when the data relationship appears curvilinear. The goal is to fit a quadratic equation y = ax2 + bx + c to the observed data, providing a nuanced model of the relationship. Contrary to historical or biological connotations, "regression" in this mathematical context refers to advancing our understanding of complex relationships among variables, particularly when data follows a curvilinear pattern. This is equivalent to using PolyFitVisitor with degree 2. But this is a simpler and faster algorithm.

This works with both scalar and multidimensional (i.e. vectors or arrays) datasets.

get_result() gives you the vector of y fits for each given x.
get_residual() returns the sum of squared residuals.
get_coeffs() returns the vector of coefficients in PolyFitVisitor monomial order. This only works for multidimensional input.
get_slope() returns the quadratic coefficient. This only works for scalar input.
get_intercept() returns the linear coefficient. This only works for scalar input.
get_constant() which returns the constant term. This only works for scalar input.
QuadraticFitVisitor();
        
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct LinearFitVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using linfit_v = LinearFitVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

This works with both scalar and multidimensional (i.e. vectors or arrays) datasets.

get_result() gives you the vector of y fits for each given x.
get_slope() returns the slope.
get_residual() returns the sum of squared residuals.
get_intercept() returns the intercept.
explicit
LinearFitVisitor(bool do_solve = true);
do_solve: This only applies to input columns that are multidimensional.
          If true, it uses an efficient matrix solve method to do the problem.
          But if you have dependent or near-dependent features in your dataset,
          solve throws an exception for singular matrices. In that case you must
          set do_solve to false, which causes the visitors to use the more
          expensive SVD method.
        
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct CubicSplineFitVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using csfit_v = CubicSplineFitVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

This functor uses cubic spline method to fit y into x-y coordinates specified by input. It is best described here.
yi(x) = ai + bi(x - xi) + ci(x - xi)2 + di(x - xi)3
a is just the input y vector

This works with both scalar and multidimensional (i.e. vectors or arrays) datasets.

get_result() gives you the b vector of coefficients. In case of multidimensional data, this is a vector of dimension size of vectors of coefficients.
get_c_vec() returns the c vector of coefficients. In case of multidimensional data, this is a vector of dimension size of vectors of coefficients.
get_d_vec() returns the d vector of coefficients. In case of multidimensional data, this is a vector of dimension size of vectors of coefficients.
CubicSplineFitVisitor();
        
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
static void test_PolyFitVisitor()  {

    using MyDataFrame = StdDataFrame<unsigned long>;

    std::cout << "\nTesting PolyFitVisitor{  } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466,
          123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473,
          123467, 123468, 123469, 123470, 123471, 123472, 123473,
        };
    MyDataFrame                 df;

    df.load_index(std::move(idx));
    df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("X2", { 0.0, 1.0, 2.0, 3.0,  4.0,  5.0 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y2", { 0.0, 0.8, 0.9, 0.1, -0.8, -1.0 }, nan_policy::dont_pad_with_nans);

    PolyFitVisitor<double>  poly_v1 (2);
    PolyFitVisitor<double>  poly_v12 (2,
                                      [](const unsigned int &, std::size_t i) -> double {
                                          const std::array<double, 5> weights = { 0.1, 0.8, 0.3, 0.5, 0.2 };

                                          return (weights[i]);
                                      });
    auto                    result1 = df.single_act_visit<double, double>("X1", "Y1", poly_v1).get_result();
    auto                    result12 = df.single_act_visit<double, double>("X1", "Y1", poly_v12).get_result();
    auto                    actual1 = std::vector<double> { 0.8, 5.6, -1 };
    auto                    actual1_y = std::vector<double>{ 5.4, 8, 8.6, 7.2, 3.8 };
    auto                    actual12 = std::vector<double>{ -1.97994, 6.99713, -1.14327 };

    assert(std::fabs(poly_v1.get_residual() - 5.6) < 0.00001);
    for (size_t i = 0; i < result1.size(); ++i)
       assert(fabs(result1[i] - actual1[i]) < 0.00001);
    for (size_t i = 0; i < poly_v1.get_y_fits().size(); ++i)
       assert(fabs(poly_v1.get_y_fits()[i] - actual1_y[i]) < 0.01);

    assert(std::fabs(poly_v12.get_residual() - 0.70981) < 0.00001);
    for (size_t i = 0; i < result12.size(); ++i)
       assert(fabs(result12[i] - actual12[i]) < 0.00001);

    PolyFitVisitor<double>  poly_v2 (3);
    auto                    result2 = df.single_act_visit<double, double>("X2", "Y2", poly_v2).get_result();
    auto                    actual2 = std::vector<double> { -0.0396825, 1.69312, -0.813492, 0.087037 };

    for (size_t i = 0; i < result2.size(); ++i)
       assert(fabs(result2[i] - actual2[i]) < 0.00001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 2 };

    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    std::vector<ary_col_t>  ary_md_x  { { 0.0, 0.0 }, { 1.0, 0.0 }, { 0.0, 1.0 }, { 1.0, 1.0 }, { 2.0, 0.0 }, { 0.0, 2.0 }, { 2.0, 2.0 }, { 1.0, 2.0 }, { 2.0, 1.0 }, { 1.5, 1.5 }, };
    std::vector<vec_col_t>  vec_md_x  { { 0.0, 0.0 }, { 1.0, 0.0 }, { 0.0, 1.0 }, { 1.0, 1.0 }, { 2.0, 0.0 }, { 0.0, 2.0 }, { 2.0, 2.0 }, { 1.0, 2.0 }, { 2.0, 1.0 }, { 1.5, 1.5 }, };

    df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans);

    auto    ground_truth =
        [](const ary_col_t &x) -> double  {
            return (1.0 +
                    3.0 * (x[0] + x[1]) +
                    2.0 * (x[0] * x[0] + x[1] * x[1]));
        };

    std::vector<double> md_y;
    const auto          &md_x_col = df.get_column<ary_col_t>("ARY MD X");

    md_y.reserve(md_x_col.size());
    for (const auto &x : md_x_col)
        md_y.push_back(ground_truth(x));
    df.load_column<double>("MD Y", std::move(md_y), nan_policy::dont_pad_with_nans);

    PolyFitVisitor<ary_col_t>   ary_poly { 2 };
    PolyFitVisitor<vec_col_t>   vec_poly { 2 };

    df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_poly);
    df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_poly);

    assert(ary_poly.get_result().size() == 6);
    assert(vec_poly.get_result().size() == 6);
    assert(std::fabs(ary_poly.get_result()[0] - 1.0) < 0.0001);
    assert(std::fabs(ary_poly.get_result()[1] - 3.0) < 0.0001);
    assert(std::fabs(ary_poly.get_result()[2] - 3.0) < 0.0001);
    assert(std::fabs(ary_poly.get_result()[3] - 2.0) < 0.0001);
    assert(std::fabs(ary_poly.get_result()[4] - 0.0) < 0.0001);
    assert(std::fabs(ary_poly.get_result()[5] - 2.0) < 0.0001);
    assert(std::fabs(vec_poly.get_result()[0] - 1.0) < 0.0001);
    assert(std::fabs(vec_poly.get_result()[1] - 3.0) < 0.0001);
    assert(std::fabs(vec_poly.get_result()[2] - 3.0) < 0.0001);
    assert(std::fabs(vec_poly.get_result()[3] - 2.0) < 0.0001);
    assert(std::fabs(vec_poly.get_result()[4] - 0.0) < 0.0001);
    assert(std::fabs(vec_poly.get_result()[5] - 2.0) < 0.0001);

    assert(ary_poly.get_y_fits().size() == md_x_col.size());
    assert(vec_poly.get_y_fits().size() == md_x_col.size());
    assert(std::fabs(ary_poly.get_y_fits()[0] - 1.0)  < 0.0001);
    assert(std::fabs(ary_poly.get_y_fits()[5] - 15.0) < 0.0001);
    assert(std::fabs(ary_poly.get_y_fits()[9] - 19.0) < 0.0001);
    assert(std::fabs(vec_poly.get_y_fits()[0] - 1.0)  < 0.0001);
    assert(std::fabs(vec_poly.get_y_fits()[5] - 15.0) < 0.0001);
    assert(std::fabs(vec_poly.get_y_fits()[9] - 19.0) < 0.0001);

    assert(ary_poly.get_residual() < 1.0e-20);
    assert(vec_poly.get_residual() < 1.0e-20);
}

// -----------------------------------------------------------------------------

static void test_LogFitVisitor()  {

    using MyDataFrame = StdDataFrame<unsigned long>;

    std::cout << "\nTesting LogFitVisitor{  } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466,
          123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473,
          123467, 123468, 123469, 123470, 123471, 123472, 123473,
        };
    MyDataFrame                 df;

    df.load_index(std::move(idx));
    df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans);

    LogFitVisitor<double>   log_v1;
    auto                    result1 = df.single_act_visit<double, double>("X1", "Y1", log_v1).get_result();
    auto                    actual1 = std::vector<double> { 6.98618, -0.403317 };
    auto                    actual1_y = std::vector<double> { 6.98618, 6.70662, 6.54309, 6.42706, 6.33706 };

    assert(std::fabs(log_v1.get_residual() - 20.9372) < 0.0001);
    for (size_t i = 0; i < result1.size(); ++i)
       assert(fabs(result1[i] - actual1[i]) < 0.00001);
    for (size_t i = 0; i < log_v1.get_y_fits().size(); ++i)
       assert(fabs(log_v1.get_y_fits()[i] - actual1_y[i]) < 0.01);

    LogFitVisitor<double>   log_v2;
    auto                    result2 = df.single_act_visit<double, double>("X2", "Y2", log_v2).get_result();
    auto                    actual2 = std::vector<double> { 1.11199, 2.25859 };

    assert(std::fabs(log_v2.get_residual() - 0.237476) < 0.00001);
    for (size_t i = 0; i < result2.size(); ++i)
       assert(fabs(result2[i] - actual2[i]) < 0.00001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 2 };

    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    std::vector<ary_col_t>  ary_md_x  { { 1.0, 2.0 }, { 2.0, 3.0 }, { 3.0, 5.0 }, { 4.0, 1.5 }, { 5.0, 4.0 }, { 6.0, 2.5 }, { 7.0, 6.0 }, { 8.0, 3.5 }, { 9.0, 7.0 }, { 10.0, 8.0 } };
    std::vector<vec_col_t>  vec_md_x  { { 1.0, 2.0 }, { 2.0, 3.0 }, { 3.0, 5.0 }, { 4.0, 1.5 }, { 5.0, 4.0 }, { 6.0, 2.5 }, { 7.0, 6.0 }, { 8.0, 3.5 }, { 9.0, 7.0 }, { 10.0, 8.0 } };

    df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans);

    auto    ground_truth =
        [](const ary_col_t &x) -> double  {
            return (3.0 * std::log(x[0]) + 1.5 * std::log(x[1]) + 2.0);
        };

    std::vector<double> md_y;
    const auto          &md_x_col = df.get_column<ary_col_t>("ARY MD X");

    md_y.reserve(md_x_col.size());
    for (const auto &x : md_x_col)
        md_y.push_back(ground_truth(x));
    df.load_column<double>("MD Y", std::move(md_y), nan_policy::dont_pad_with_nans);

    auto                        w_func { [](const unsigned long &, std::size_t) -> double  { return (1); } };
    LogFitVisitor<ary_col_t>    ary_log { w_func, 2 };
    LogFitVisitor<vec_col_t>    vec_log { w_func, 2 };

    df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_log);
    df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_log);

    assert(ary_log.get_result().size() == 6);
    assert(vec_log.get_result().size() == 6);
    assert(std::fabs(ary_log.get_result()[0] - 2.0) < 0.01);
    assert(std::fabs(ary_log.get_result()[2] - 3.0) < 0.01);
    assert(ary_log.get_result()[5] < 1.0e-12);
    assert(std::fabs(vec_log.get_result()[0] - 2.0) < 0.00001);
    assert(std::fabs(vec_log.get_result()[2] - 3.0) < 0.000001);
    assert(vec_log.get_result()[5] < 1.0e-12);

    assert(ary_log.get_y_fits().size() == md_x_col.size());
    assert(vec_log.get_y_fits().size() == md_x_col.size());
    assert(std::fabs(ary_log.get_y_fits()[0] - 3.03972) < 0.00001);
    assert(std::fabs(ary_log.get_y_fits()[5] - 8.74971) < 0.00001);
    assert(std::fabs(ary_log.get_y_fits()[9] - 12.0269) < 0.0001);
    assert(std::fabs(vec_log.get_y_fits()[0] - 3.03972) < 0.00001);
    assert(std::fabs(vec_log.get_y_fits()[5] - 8.74971) < 0.00001);
    assert(std::fabs(vec_log.get_y_fits()[9] - 12.0269) < 0.0001);

    assert(ary_log.get_residual() < 1.0e-22);
    assert(ary_log.get_residual() < 1.0e-22);
    assert(std::fabs(ary_log.get_slope() - 2.0) < 0.0001);
    assert(std::fabs(ary_log.get_slope() - 2.0) < 0.01);
}

// -----------------------------------------------------------------------------

static void test_ExponentialFitVisitor()  {

    std::cout << "\nTesting ExponentialFitVisitor{  } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466,
          123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473,
          123467, 123468, 123469, 123470, 123471, 123472, 123473,
        };
    MyDataFrame                 df;

    df.load_index(std::move(idx));
    df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans);

    ExponentialFitVisitor<double>   exp_v1;
    auto                            result1 = df.single_act_visit<double, double>("X1", "Y1", exp_v1).get_result();
    auto                            actual1 = std::vector<double> { 7.7647, 6.9316, 6.1879, 5.5239, 4.93126 };

    assert(std::fabs(exp_v1.get_residual() - 22.2154) < 0.0001);
    for (size_t i = 0; i < result1.size(); ++i)
        assert(fabs(result1[i] - actual1[i]) < 0.0001);

    efit_v<double>  exp_v2;
    auto            result2 = df.single_act_visit<double, double>("X2", "Y2", exp_v2).get_result();
    auto            actual2 = std::vector<double> { 1.63751, 2.02776, 3.10952, 4.76833, 7.31206 };

    assert(std::fabs(exp_v2.get_residual() - 3.919765) < 0.00001);
    for (size_t i = 0; i < result2.size(); ++i)
        assert(fabs(result2[i] - actual2[i]) < 0.0001);
}

// -----------------------------------------------------------------------------

static void test_PowerFitVisitor()  {

    using MyDataFrame = StdDataFrame<unsigned long>;

    std::cout << "\nTesting PowerFitVisitor{  } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466,
          123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473,
          123467, 123468, 123469, 123470, 123471, 123472, 123473,
        };
    MyDataFrame                 df;

    df.load_index(std::move(idx));
    df.load_column<double>("X1", { 1, 2, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y1", { 2.98, 7.26, 11.75, 22.72, 27.20, 38.57 }, nan_policy::dont_pad_with_nans);

    PowerFitVisitor<double> pow_v;

    df.single_act_visit<double, double>("X1", "Y1", pow_v);
    assert(std::fabs(pow_v.get_residual() - 13.4058) < 0.0001);
    assert(std::fabs(pow_v.get_slope() - 1.4332) < 0.0001);
    assert(std::fabs(pow_v.get_intercept() - 1.0313) < 0.0001);

    const auto  actual = std::vector<double> { 2.8047, 7.5739, 13.5423, 20.4527, 28.1605, 36.5697 };

    for (size_t i = 0; i < pow_v.get_result().size(); ++i)
        assert(fabs(pow_v.get_result()[i] - actual[i]) < 0.0001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 3 };

    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    std::vector<ary_col_t>  ary_md_x  {
        { 1.0, 2.0, 3.0 }, { 2.0, 3.0, 1.5 }, { 3.0, 1.0, 4.0 }, { 4.0, 2.5, 2.0 }, { 1.5, 4.0, 1.0 }, { 5.0, 1.5, 3.5 },
        { 2.5, 3.5, 2.5 }, { 3.5, 2.0, 1.0 }, { 1.0, 5.0, 4.5 }, { 4.5, 3.0, 2.0 },
    };
    std::vector<vec_col_t>  vec_md_x  {
        { 1.0, 2.0, 3.0 }, { 2.0, 3.0, 1.5 }, { 3.0, 1.0, 4.0 }, { 4.0, 2.5, 2.0 }, { 1.5, 4.0, 1.0 }, { 5.0, 1.5, 3.5 },
        { 2.5, 3.5, 2.5 }, { 3.5, 2.0, 1.0 }, { 1.0, 5.0, 4.5 }, { 4.5, 3.0, 2.0 },
    };

    df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans);

    std::vector<double> ground_truth {
        2.0 * std::pow(1.0,1.5) * std::pow(2.0,0.8) * std::pow(3.0,0.5) + 0.05,
        2.0 * std::pow(2.0,1.5) * std::pow(3.0,0.8) * std::pow(1.5,0.5) - 0.08,
        2.0 * std::pow(3.0,1.5) * std::pow(1.0,0.8) * std::pow(4.0,0.5) + 0.03,
        2.0 * std::pow(4.0,1.5) * std::pow(2.5,0.8) * std::pow(2.0,0.5) - 0.06,
        2.0 * std::pow(1.5,1.5) * std::pow(4.0,0.8) * std::pow(1.0,0.5) + 0.02,
        2.0 * std::pow(5.0,1.5) * std::pow(1.5,0.8) * std::pow(3.5,0.5) - 0.10,
        2.0 * std::pow(2.5,1.5) * std::pow(3.5,0.8) * std::pow(2.5,0.5) + 0.07,
        2.0 * std::pow(3.5,1.5) * std::pow(2.0,0.8) * std::pow(1.0,0.5) - 0.04,
        2.0 * std::pow(1.0,1.5) * std::pow(5.0,0.8) * std::pow(4.5,0.5) + 0.01,
        2.0 * std::pow(4.5,1.5) * std::pow(3.0,0.8) * std::pow(2.0,0.5) - 0.09,
    };

    df.load_column<double>("MD Y", std::move(ground_truth), nan_policy::dont_pad_with_nans);

    PowerFitVisitor<ary_col_t>  ary_pow;
    PowerFitVisitor<vec_col_t>  vec_pow;

    df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_pow);
    df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_pow);

    // Y-Fits
    //
    assert(ary_pow.get_result().size() == 10);
    assert(vec_pow.get_result().size() == 10);
    assert(std::fabs(ary_pow.get_result()[0] - 6.062) < 0.001);
    assert(std::fabs(ary_pow.get_result()[5] - 57.8006) < 0.0001);
    assert(std::fabs(ary_pow.get_result()[9] - 64.829) < 0.001);
    assert(std::fabs(vec_pow.get_result()[0] - 6.062) < 0.001);
    assert(std::fabs(vec_pow.get_result()[5] - 57.8006) < 0.0001);
    assert(std::fabs(vec_pow.get_result()[9] - 64.829) < 0.001);

    assert(ary_pow.get_slope().size() == 3);
    assert(vec_pow.get_slope().size() == 3);
    assert(std::fabs(ary_pow.get_slope()[0] - 1.49562) < 0.00001);
    assert(std::fabs(ary_pow.get_slope()[2] - 0.50078) < 0.00001);
    assert(std::fabs(vec_pow.get_slope()[0] - 1.49562) < 0.00001);
    assert(std::fabs(vec_pow.get_slope()[2] - 0.50078) < 0.00001);

    assert(std::fabs(ary_pow.get_residual() - 0.0312596) < 0.0000001);
    assert(std::fabs(ary_pow.get_intercept() - 0.699298) < 0.000001);
}

// -----------------------------------------------------------------------------

static void test_QuadraticFitVisitor()  {

    using MyDataFrame = StdDataFrame<unsigned long>;

    std::cout << "\nTesting QuadraticFitVisitor{  } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466,
          123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473,
          123467, 123468, 123469, 123470, 123471, 123472, 123473,
        };
    MyDataFrame                 df;

    df.load_index(std::move(idx));
    df.load_column<double>("X1", { 10, 15, 20, 24, 30, 34, 40, 45, 48, 50, 58 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y1", { 115.6, 157.2, 189.2, 220.8, 253.8, 269.2, 284.8, 285, 277.4, 269.2, 244.2 }, nan_policy::dont_pad_with_nans);

    QuadraticFitVisitor<double, unsigned long, 64>  qud_v;

    df.single_act_visit<double, double>("X1", "Y1", qud_v);
    assert(std::fabs(qud_v.get_residual() - 188.2975) < 0.0001);
    assert(std::fabs(qud_v.get_slope() - -0.1561) < 0.0001);
    assert(std::fabs(qud_v.get_intercept() - 13.4522) < 0.0001);
    assert(std::fabs(qud_v.get_constant() - -9.0735) < 0.0001);

    const auto  actual = std::vector<double> { 109.8382, 157.5867, 197.5302, 223.8654, 254.0023, 267.8496, 279.2546, 280.1733, 276.9781, 273.287, 246.0347 };

    for (size_t i = 0; i < qud_v.get_result().size(); ++i)
        assert(fabs(qud_v.get_result()[i] - actual[i]) < 0.0001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 3 };

    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    std::vector<ary_col_t>  ary_md_x  {
        { 0.0,  0.0,  0.0 }, { 1.0,  0.0,  0.0 }, { 0.0,  1.0,  0.0 }, { 0.0,  0.0,  1.0 }, {-1.0,  1.0,  0.0 }, { 1.0, -1.0,  1.0 },
        { 2.0,  1.0, -1.0 }, {-1.0, -1.0,  2.0 }, { 1.0,  2.0,  1.0 }, { 2.0, -1.0,  2.0 }
    };
    std::vector<vec_col_t>  vec_md_x  {
        { 0.0,  0.0,  0.0 }, { 1.0,  0.0,  0.0 }, { 0.0,  1.0,  0.0 }, { 0.0,  0.0,  1.0 }, {-1.0,  1.0,  0.0 }, { 1.0, -1.0,  1.0 },
        { 2.0,  1.0, -1.0 }, {-1.0, -1.0,  2.0 }, { 1.0,  2.0,  1.0 }, { 2.0, -1.0,  2.0 }
    };

    df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans);

    // True model: y = 2 + x0 - x1 + 2*x2 + x0^2 + 2*x0*x1 + 3*x1^2 - x2^2
    // (x0*x2 and x1*x2 terms are intentionally absent → coeffs should be ~0)
    //
    auto                ground_truth =
        [](double x0, double x1, double x2) {
            return (2.0 + x0 - x1 + 2.0 * x2 + x0 * x0 + 2.0 * x0 * x1 + 3.0 * x1 * x1 - x2 * x2);
        };
    std::vector<double> md_y;
    const auto          &md_x_col = df.get_column<ary_col_t>("ARY MD X");

    md_y.reserve(md_x_col.size());
    for (const auto &p : md_x_col)
        md_y.push_back(ground_truth(p[0], p[1], p[2]));
    df.load_column<double>("MD Y", std::move(md_y), nan_policy::dont_pad_with_nans);

    QuadraticFitVisitor<ary_col_t>  ary_qud;
    QuadraticFitVisitor<vec_col_t>  vec_qud;

    df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_qud);
    df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_qud);

    // Y-Fits
    //
    assert(ary_qud.get_result().size() == 10);
    assert(vec_qud.get_result().size() == 10);
    assert(std::fabs(ary_qud.get_result()[0] - 2.0) < 0.01);
    assert(std::fabs(ary_qud.get_result()[5] - 7.0) < 0.01);
    assert(std::fabs(ary_qud.get_result()[9] - 8.0) < 0.01);
    assert(std::fabs(vec_qud.get_result()[0] - 2.0) < 0.01);
    assert(std::fabs(vec_qud.get_result()[5] - 7.0) < 0.01);
    assert(std::fabs(vec_qud.get_result()[9] - 8.0) < 0.01);

    assert(ary_qud.get_coeffs().size() == 10);
    assert(vec_qud.get_coeffs().size() == 10);
    assert(std::fabs(ary_qud.get_coeffs()[0] - 2.0) < 0.01);
    assert(std::fabs(ary_qud.get_coeffs()[5] - -9.09495e-13) < 0.0000000001);
    assert(std::fabs(ary_qud.get_coeffs()[9] - 1.0) < 0.01);
    assert(std::fabs(vec_qud.get_coeffs()[0] - 2.0) < 0.01);
    assert(std::fabs(vec_qud.get_coeffs()[5] - -9.09495e-13) < 0.0000000001);
    assert(std::fabs(vec_qud.get_coeffs()[9] - 1.0) < 0.01);

    assert(ary_qud.get_residual() < 1.0e-21);
    assert(vec_qud.get_residual() < 1.0e-21);
}

// -----------------------------------------------------------------------------

static void test_LinearFitVisitor()  {

    using MyDataFrame = StdDataFrame<unsigned long>;

    std::cout << "\nTesting LinearFitVisitor{  } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466,
          123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473,
          123467, 123468, 123469, 123470, 123471, 123472, 123473,
        };
    MyDataFrame                 df;

    df.load_index(std::move(idx));
    df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans);

    LinearFitVisitor<double>    lin_v1;
    auto                        &result1 = df.single_act_visit<double, double>("X1", "Y1", lin_v1).get_result();
    const auto                  actual1 = std::vector<double> { 7.4, 7, 6.6, 6.2, 5.8 };

    assert(std::fabs(lin_v1.get_residual() - 19.6) < 0.01);
    for (size_t i = 0; i < result1.size(); ++i)
        assert(fabs(result1[i] - actual1[i]) < 0.0001);

    linfit_v<double>    lin_v2;
    auto                result2 = df.single_act_visit<double, double>("X2", "Y2", lin_v2).get_result();
    auto                actual2 = std::vector<double> { 1.73171, 2.37805, 3.67073, 4.96341, 6.2561 };

    assert(std::fabs(lin_v2.get_residual() - 1.097561) < 0.00001);
    for (size_t i = 0; i < result2.size(); ++i)
        assert(fabs(result2[i] - actual2[i]) < 0.0001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 2 };

    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    std::vector<ary_col_t>  ary_md_x  { { 1.0, 2.0 }, { 2.0, 1.0 }, { 3.0, 3.0 }, { 4.0, 0.5 }, { 1.5, 2.5 }, { 5.0, 1.5 }, { 2.5, 4.0 }, { 3.5, 0.0 }, { 0.5, 3.5 }, { 4.5, 2.0 } };
    std::vector<vec_col_t>  vec_md_x  { { 1.0, 2.0 }, { 2.0, 1.0 }, { 3.0, 3.0 }, { 4.0, 0.5 }, { 1.5, 2.5 }, { 5.0, 1.5 }, { 2.5, 4.0 }, { 3.5, 0.0 }, { 0.5, 3.5 }, { 4.5, 2.0 } };

    df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans);

    std::vector<double> ground_truth {
        0.5,   // 3 + 1.5(1.0) - 2.0(2.0)
        4.0,   // 3 + 1.5(2.0) - 2.0(1.0)
        1.5,   // 3 + 1.5(3.0) - 2.0(3.0)
        8.0,   // 3 + 1.5(4.0) - 2.0(0.5)
        0.25,  // 3 + 1.5(1.5) - 2.0(2.5)
        7.5,   // 3 + 1.5(5.0) - 2.0(1.5)
       -1.25,  // 3 + 1.5(2.5) - 2.0(4.0)
        8.25,  // 3 + 1.5(3.5) - 2.0(0.0)
       -3.25,  // 3 + 1.5(0.5) - 2.0(3.5)
        5.75,  // 3 + 1.5(4.5) - 2.0(2.0)
    };

    df.load_column<double>("MD Y", std::move(ground_truth), nan_policy::dont_pad_with_nans);

    LinearFitVisitor<ary_col_t> ary_lin { false };
    LinearFitVisitor<vec_col_t> vec_lin;

    df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_lin);
    df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_lin);

    // Y-Fits
    //
    assert(ary_lin.get_result().size() == 10);
    assert(vec_lin.get_result().size() == 10);
    assert(std::fabs(ary_lin.get_result()[0] - 0.5) < 0.01);
    assert(std::fabs(ary_lin.get_result()[5] - 7.5) < 0.01);
    assert(std::fabs(ary_lin.get_result()[9] - 5.75) < 0.01);
    assert(std::fabs(vec_lin.get_result()[0] - 0.5) < 0.01);
    assert(std::fabs(vec_lin.get_result()[5] - 7.5) < 0.01);
    assert(std::fabs(vec_lin.get_result()[9] - 5.75) < 0.01);

    assert(ary_lin.get_slope().size() == dim);
    assert(vec_lin.get_slope().size() == dim);
    assert(std::fabs(ary_lin.get_slope()[0] - 1.5) < 0.01);
    assert(std::fabs(ary_lin.get_slope()[1] - -2.0) < 0.01);
    assert(std::fabs(vec_lin.get_slope()[0] - 1.5) < 0.01);
    assert(std::fabs(vec_lin.get_slope()[1] - -2.0) < 0.01);

    assert(ary_lin.get_residual() < 1.0e-29);
    assert(vec_lin.get_residual() < 1.0e-29);

    assert(std::fabs(ary_lin.get_intercept() - 3.0) < 0.01);
    assert(std::fabs(vec_lin.get_intercept() - 3.0) < 0.01);
}
// -----------------------------------------------------------------------------

static void test_CubicSplineFitVisitor()  {

    using MyDataFrame = StdDataFrame<unsigned long>;

    std::cout << "\nTesting CubicSplineFitVisitor{  } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466,
          123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473,
          123467, 123468, 123469, 123470, 123471, 123472, 123473,
        };
    MyDataFrame                 df;

    df.load_index(std::move(idx));
    df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans);
    df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans);

    CubicSplineFitVisitor<double>   csp_v1;

    df.single_act_visit<double, double>("X1", "Y1", csp_v1);

    const auto  &result1 = csp_v1.get_result();
    const auto  &c_vec = csp_v1.get_c_vec();
    const auto  &d_vec = csp_v1.get_d_vec();

    assert(result1.size() == 4);
    assert(std::fabs(result1[0] - 1.125) < 0.001);
    assert(std::fabs(result1[3] - -2.25) < 0.001);

    assert(c_vec.size() == 5);
    assert(std::fabs(c_vec[0] - 0) < 0.001);
    assert(std::fabs(c_vec[2] - 1.5) < 0.001);
    assert(std::fabs(c_vec[4] - 0) < 0.001);

    assert(d_vec.size() == 4);
    assert(std::fabs(d_vec[0] - -0.125) < 0.001);
    assert(std::fabs(d_vec[3] - 1.875) < 0.001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 3 };
    constexpr std::size_t   n { 10 };

    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    std::vector<double>     x_col(n);
    std::vector<ary_col_t>  ary_y_col(n);
    std::vector<vec_col_t>  vec_y_col(n);

    for (std::size_t i { 0 }; i < n; ++i)  {
        x_col[i] = double(i) * (2.0 * M_PI / double(n - 1));
        ary_y_col[i] = { std::cos(x_col[i]), std::sin(x_col[i]), x_col[i] / 5.0 };
        vec_y_col[i] = { std::cos(x_col[i]), std::sin(x_col[i]), x_col[i] / 5.0 };
    }
    df.load_column<double>("MD X", std::move(x_col), nan_policy::dont_pad_with_nans);
    df.load_column<ary_col_t>("ARY MD Y", std::move(ary_y_col), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("VEC MD Y", std::move(vec_y_col), nan_policy::dont_pad_with_nans);

    CubicSplineFitVisitor<ary_col_t>    ary_cpv;
    CubicSplineFitVisitor<vec_col_t>    vec_cpv;

    df.single_act_visit<double, ary_col_t>("MD X", "ARY MD Y", ary_cpv);
    df.single_act_visit<double, vec_col_t>("MD X", "VEC MD Y", vec_cpv);

    // B vector
    //
    assert(vec_cpv.get_result().size() == dim);
    for (const auto &vec : vec_cpv.get_result())
        assert(vec.size() == 9);
    assert(std::fabs(vec_cpv.get_result()[0][0] - -0.209847) < 0.000001);
    assert(std::fabs(vec_cpv.get_result()[1][7] - 0.173405) < 0.000001);
    assert(std::fabs(vec_cpv.get_result()[2][4] - 0.2) < 0.01);
    assert(ary_cpv.get_result().size() == dim);
    for (const auto &ary : ary_cpv.get_result())
        assert(ary.size() == 9);
    assert(std::fabs(ary_cpv.get_result()[0][0] - -0.209847) < 0.000001);
    assert(std::fabs(ary_cpv.get_result()[1][7] - 0.173405) < 0.000001);
    assert(std::fabs(ary_cpv.get_result()[2][4] - 0.2) < 0.01);

    assert(vec_cpv.get_c_vec().size() == dim);
    for (const auto &vec : vec_cpv.get_c_vec())
        assert(vec.size() == 10);
    assert(std::fabs(vec_cpv.get_c_vec()[0][1] - -0.538305) < 0.000001);
    assert(std::fabs(vec_cpv.get_c_vec()[1][7] - 0.51271) < 0.00001);
    assert(std::fabs(vec_cpv.get_c_vec()[2][4] - -1.58565e-16) < 0.00000001);
    assert(ary_cpv.get_c_vec().size() == dim);
    for (const auto &ary : ary_cpv.get_c_vec())
        assert(ary.size() == 10);
    assert(std::fabs(ary_cpv.get_c_vec()[0][1] - -0.538305) < 0.000001);
    assert(std::fabs(ary_cpv.get_c_vec()[1][7] - 0.51271) < 0.00001);
    assert(std::fabs(ary_cpv.get_c_vec()[2][4] - -1.58565e-16) < 0.00000001);

    assert(vec_cpv.get_d_vec().size() == dim);
    for (const auto &vec : vec_cpv.get_d_vec())
        assert(vec.size() == 9);
    assert(std::fabs(vec_cpv.get_d_vec()[0][0] - -0.257022) < 0.000001);
    assert(std::fabs(vec_cpv.get_d_vec()[1][7] - -0.085019) < 0.000001);
    assert(std::fabs(vec_cpv.get_c_vec()[2][4] - 1.80644e-16) < 0.00000001);
    assert(ary_cpv.get_d_vec().size() == dim);
    for (const auto &ary : ary_cpv.get_d_vec())
        assert(ary.size() == 9);
    assert(std::fabs(ary_cpv.get_d_vec()[0][0] - -0.257022) < 0.000001);
    assert(std::fabs(ary_cpv.get_d_vec()[1][7] - -0.085019) < 0.000001);
    assert(std::fabs(ary_cpv.get_c_vec()[2][4] - 1.80644e-16) < 0.00000001);
}

C++ DataFrame