| Signature | Description | Parameters |
|---|---|---|
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct PolyFitVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using pfit_v = PolyFitVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. This functor fits a N-degree polynomial through the given x-y coordinates by the way of Least-Squares and Gaussian-Elimination This works with both scalar and multidimensional (i.e. vectors or arrays) datasets. get_resut() gives you the vector of "degree + 1" coefficients. get_y_fits() returns the vector of y-fits for each given x. Y-fits are always scalar whether the X column is scalar or multidimensional. get_slope() returns the first (0-degree) coefficient. get_residual() returns the sum of weighted squared residuals.
using weight_func = std::function<T (const I &idx, size_t val_index)>;
explicit
PolyFitVisitor(std::size_t degree,
weight_func w_func = [](const I &, size_t) -> data_t { return (1); });
degree: The polynomial degreew_func: A functor that provides weights to be applied to sigma values. w_func is passed two parameters: (1) The value of the index column corresponding to the given y value, (2) The corresponding index into the y vector. The default is no weights. |
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct LogFitVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using lfit_v = LogFitVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. This functor fits a y = b0 + b1 * log(x) through the given x-y coordinates by the way of calling PolyFit above on log(x). This works with both scalar and multidimensional (i.e. vectors or arrays) datasets. get_resut() gives you the vector of "degree + 1" coefficients. get_y_fits() returns the vector of y-fits for each given x. Y-fits are always scalar whether the X column is scalar or multidimensional. get_slope() returns the first (0-degree) coefficient. get_residual() returns the sum of weighted squared residuals.
using weight_func = std::function<T (const I &idx, size_t val_index)>;
explicit
LogFitVisitor(std::size_t degree = 1,
weight_func w_func = [](const I &, size_t) -> data_t { return (1); });
degree: The polynomial degreew_func: A functor that provides weights to be applied to sigma values. w_func is passed two parameters: (1) The value of the index column corresponding to the given y value, (2) The corresponding index into the y vector. The default is no weights. |
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct ExponentialFitVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using efit_v = ExponentialFitVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. get_result() gives you the vector of y fits for each given x. get_slope() returns the slope -- coefficient of the exponential function. get_residual() returns the sum of squared residuals. get_intercept() returns the intercept.
ExponentialFitVisitor();
|
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct PowerFitVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using powfit_v = PowerFitVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. The PowerFit method fits a function of the form y = axb to data by performing a least-squares fit using the transformed model function ln(y) = a0 + b*ln(x) where a0 is ln(a). The new function is linear in the model parameters, a0 and b. This works with both scalar and multidimensional (i.e. vectors or arrays) datasets. get_result() gives you the vector of y fits for each given x. get_slope() returns the slope -- logarithmic constant. In case of multidimensional input data, this is a vector of size dimensions. get_residual() returns the sum of squared residuals. get_intercept() returns the intercept -- coefficient of logarithmic term.
PowerFitVisitor();
|
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct QuadraticFitVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using qfit_v = QuadraticFitVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. Quadratic regression is a statistical method used to model a relationship between variables with a parabolic best-fit curve, rather than a straight line. It's ideal when the data relationship appears curvilinear. The goal is to fit a quadratic equation y = ax2 + bx + c to the observed data, providing a nuanced model of the relationship. Contrary to historical or biological connotations, "regression" in this mathematical context refers to advancing our understanding of complex relationships among variables, particularly when data follows a curvilinear pattern. This is equivalent to using PolyFitVisitor with degree 2. But this is a simpler and faster algorithm. This works with both scalar and multidimensional (i.e. vectors or arrays) datasets. get_result() gives you the vector of y fits for each given x. get_residual() returns the sum of squared residuals. get_coeffs() returns the vector of coefficients in PolyFitVisitor monomial order. This only works for multidimensional input. get_slope() returns the quadratic coefficient. This only works for scalar input. get_intercept() returns the linear coefficient. This only works for scalar input. get_constant() which returns the constant term. This only works for scalar input.
QuadraticFitVisitor();
|
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct LinearFitVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using linfit_v = LinearFitVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. This works with both scalar and multidimensional (i.e. vectors or arrays) datasets. get_result() gives you the vector of y fits for each given x. get_slope() returns the slope. get_residual() returns the sum of squared residuals. get_intercept() returns the intercept.
explicit
LinearFitVisitor(bool do_solve = true);
do_solve: This only applies to input columns that are multidimensional.
If true, it uses an efficient matrix solve method to do the problem.
But if you have dependent or near-dependent features in your dataset,
solve throws an exception for singular matrices. In that case you must
set do_solve to false, which causes the visitors to use the more
expensive SVD method.
|
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct CubicSplineFitVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using csfit_v = CubicSplineFitVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. This functor uses cubic spline method to fit y into x-y coordinates specified by input. It is best described here. yi(x) = ai + bi(x - xi) + ci(x - xi)2 + di(x - xi)3 a is just the input y vector This works with both scalar and multidimensional (i.e. vectors or arrays) datasets. get_result() gives you the b vector of coefficients. In case of multidimensional data, this is a vector of dimension size of vectors of coefficients. get_c_vec() returns the c vector of coefficients. In case of multidimensional data, this is a vector of dimension size of vectors of coefficients. get_d_vec() returns the d vector of coefficients. In case of multidimensional data, this is a vector of dimension size of vectors of coefficients.
CubicSplineFitVisitor();
|
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
static void test_PolyFitVisitor() { using MyDataFrame = StdDataFrame<unsigned long>; std::cout << "\nTesting PolyFitVisitor{ } ..." << std::endl; std::vector<unsigned long> idx = { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, }; MyDataFrame df; df.load_index(std::move(idx)); df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans); df.load_column<double>("X2", { 0.0, 1.0, 2.0, 3.0, 4.0, 5.0 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y2", { 0.0, 0.8, 0.9, 0.1, -0.8, -1.0 }, nan_policy::dont_pad_with_nans); PolyFitVisitor<double> poly_v1 (2); PolyFitVisitor<double> poly_v12 (2, [](const unsigned int &, std::size_t i) -> double { const std::array<double, 5> weights = { 0.1, 0.8, 0.3, 0.5, 0.2 }; return (weights[i]); }); auto result1 = df.single_act_visit<double, double>("X1", "Y1", poly_v1).get_result(); auto result12 = df.single_act_visit<double, double>("X1", "Y1", poly_v12).get_result(); auto actual1 = std::vector<double> { 0.8, 5.6, -1 }; auto actual1_y = std::vector<double>{ 5.4, 8, 8.6, 7.2, 3.8 }; auto actual12 = std::vector<double>{ -1.97994, 6.99713, -1.14327 }; assert(std::fabs(poly_v1.get_residual() - 5.6) < 0.00001); for (size_t i = 0; i < result1.size(); ++i) assert(fabs(result1[i] - actual1[i]) < 0.00001); for (size_t i = 0; i < poly_v1.get_y_fits().size(); ++i) assert(fabs(poly_v1.get_y_fits()[i] - actual1_y[i]) < 0.01); assert(std::fabs(poly_v12.get_residual() - 0.70981) < 0.00001); for (size_t i = 0; i < result12.size(); ++i) assert(fabs(result12[i] - actual12[i]) < 0.00001); PolyFitVisitor<double> poly_v2 (3); auto result2 = df.single_act_visit<double, double>("X2", "Y2", poly_v2).get_result(); auto actual2 = std::vector<double> { -0.0396825, 1.69312, -0.813492, 0.087037 }; for (size_t i = 0; i < result2.size(); ++i) assert(fabs(result2[i] - actual2[i]) < 0.00001); // Now multidimensional data // constexpr std::size_t dim { 2 }; using ary_col_t = std::array<double, dim>; using vec_col_t = std::vector<double>; std::vector<ary_col_t> ary_md_x { { 0.0, 0.0 }, { 1.0, 0.0 }, { 0.0, 1.0 }, { 1.0, 1.0 }, { 2.0, 0.0 }, { 0.0, 2.0 }, { 2.0, 2.0 }, { 1.0, 2.0 }, { 2.0, 1.0 }, { 1.5, 1.5 }, }; std::vector<vec_col_t> vec_md_x { { 0.0, 0.0 }, { 1.0, 0.0 }, { 0.0, 1.0 }, { 1.0, 1.0 }, { 2.0, 0.0 }, { 0.0, 2.0 }, { 2.0, 2.0 }, { 1.0, 2.0 }, { 2.0, 1.0 }, { 1.5, 1.5 }, }; df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans); df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans); auto ground_truth = [](const ary_col_t &x) -> double { return (1.0 + 3.0 * (x[0] + x[1]) + 2.0 * (x[0] * x[0] + x[1] * x[1])); }; std::vector<double> md_y; const auto &md_x_col = df.get_column<ary_col_t>("ARY MD X"); md_y.reserve(md_x_col.size()); for (const auto &x : md_x_col) md_y.push_back(ground_truth(x)); df.load_column<double>("MD Y", std::move(md_y), nan_policy::dont_pad_with_nans); PolyFitVisitor<ary_col_t> ary_poly { 2 }; PolyFitVisitor<vec_col_t> vec_poly { 2 }; df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_poly); df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_poly); assert(ary_poly.get_result().size() == 6); assert(vec_poly.get_result().size() == 6); assert(std::fabs(ary_poly.get_result()[0] - 1.0) < 0.0001); assert(std::fabs(ary_poly.get_result()[1] - 3.0) < 0.0001); assert(std::fabs(ary_poly.get_result()[2] - 3.0) < 0.0001); assert(std::fabs(ary_poly.get_result()[3] - 2.0) < 0.0001); assert(std::fabs(ary_poly.get_result()[4] - 0.0) < 0.0001); assert(std::fabs(ary_poly.get_result()[5] - 2.0) < 0.0001); assert(std::fabs(vec_poly.get_result()[0] - 1.0) < 0.0001); assert(std::fabs(vec_poly.get_result()[1] - 3.0) < 0.0001); assert(std::fabs(vec_poly.get_result()[2] - 3.0) < 0.0001); assert(std::fabs(vec_poly.get_result()[3] - 2.0) < 0.0001); assert(std::fabs(vec_poly.get_result()[4] - 0.0) < 0.0001); assert(std::fabs(vec_poly.get_result()[5] - 2.0) < 0.0001); assert(ary_poly.get_y_fits().size() == md_x_col.size()); assert(vec_poly.get_y_fits().size() == md_x_col.size()); assert(std::fabs(ary_poly.get_y_fits()[0] - 1.0) < 0.0001); assert(std::fabs(ary_poly.get_y_fits()[5] - 15.0) < 0.0001); assert(std::fabs(ary_poly.get_y_fits()[9] - 19.0) < 0.0001); assert(std::fabs(vec_poly.get_y_fits()[0] - 1.0) < 0.0001); assert(std::fabs(vec_poly.get_y_fits()[5] - 15.0) < 0.0001); assert(std::fabs(vec_poly.get_y_fits()[9] - 19.0) < 0.0001); assert(ary_poly.get_residual() < 1.0e-20); assert(vec_poly.get_residual() < 1.0e-20); } // ----------------------------------------------------------------------------- static void test_LogFitVisitor() { using MyDataFrame = StdDataFrame<unsigned long>; std::cout << "\nTesting LogFitVisitor{ } ..." << std::endl; std::vector<unsigned long> idx = { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, }; MyDataFrame df; df.load_index(std::move(idx)); df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans); df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans); LogFitVisitor<double> log_v1; auto result1 = df.single_act_visit<double, double>("X1", "Y1", log_v1).get_result(); auto actual1 = std::vector<double> { 6.98618, -0.403317 }; auto actual1_y = std::vector<double> { 6.98618, 6.70662, 6.54309, 6.42706, 6.33706 }; assert(std::fabs(log_v1.get_residual() - 20.9372) < 0.0001); for (size_t i = 0; i < result1.size(); ++i) assert(fabs(result1[i] - actual1[i]) < 0.00001); for (size_t i = 0; i < log_v1.get_y_fits().size(); ++i) assert(fabs(log_v1.get_y_fits()[i] - actual1_y[i]) < 0.01); LogFitVisitor<double> log_v2; auto result2 = df.single_act_visit<double, double>("X2", "Y2", log_v2).get_result(); auto actual2 = std::vector<double> { 1.11199, 2.25859 }; assert(std::fabs(log_v2.get_residual() - 0.237476) < 0.00001); for (size_t i = 0; i < result2.size(); ++i) assert(fabs(result2[i] - actual2[i]) < 0.00001); // Now multidimensional data // constexpr std::size_t dim { 2 }; using ary_col_t = std::array<double, dim>; using vec_col_t = std::vector<double>; std::vector<ary_col_t> ary_md_x { { 1.0, 2.0 }, { 2.0, 3.0 }, { 3.0, 5.0 }, { 4.0, 1.5 }, { 5.0, 4.0 }, { 6.0, 2.5 }, { 7.0, 6.0 }, { 8.0, 3.5 }, { 9.0, 7.0 }, { 10.0, 8.0 } }; std::vector<vec_col_t> vec_md_x { { 1.0, 2.0 }, { 2.0, 3.0 }, { 3.0, 5.0 }, { 4.0, 1.5 }, { 5.0, 4.0 }, { 6.0, 2.5 }, { 7.0, 6.0 }, { 8.0, 3.5 }, { 9.0, 7.0 }, { 10.0, 8.0 } }; df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans); df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans); auto ground_truth = [](const ary_col_t &x) -> double { return (3.0 * std::log(x[0]) + 1.5 * std::log(x[1]) + 2.0); }; std::vector<double> md_y; const auto &md_x_col = df.get_column<ary_col_t>("ARY MD X"); md_y.reserve(md_x_col.size()); for (const auto &x : md_x_col) md_y.push_back(ground_truth(x)); df.load_column<double>("MD Y", std::move(md_y), nan_policy::dont_pad_with_nans); auto w_func { [](const unsigned long &, std::size_t) -> double { return (1); } }; LogFitVisitor<ary_col_t> ary_log { w_func, 2 }; LogFitVisitor<vec_col_t> vec_log { w_func, 2 }; df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_log); df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_log); assert(ary_log.get_result().size() == 6); assert(vec_log.get_result().size() == 6); assert(std::fabs(ary_log.get_result()[0] - 2.0) < 0.01); assert(std::fabs(ary_log.get_result()[2] - 3.0) < 0.01); assert(ary_log.get_result()[5] < 1.0e-12); assert(std::fabs(vec_log.get_result()[0] - 2.0) < 0.00001); assert(std::fabs(vec_log.get_result()[2] - 3.0) < 0.000001); assert(vec_log.get_result()[5] < 1.0e-12); assert(ary_log.get_y_fits().size() == md_x_col.size()); assert(vec_log.get_y_fits().size() == md_x_col.size()); assert(std::fabs(ary_log.get_y_fits()[0] - 3.03972) < 0.00001); assert(std::fabs(ary_log.get_y_fits()[5] - 8.74971) < 0.00001); assert(std::fabs(ary_log.get_y_fits()[9] - 12.0269) < 0.0001); assert(std::fabs(vec_log.get_y_fits()[0] - 3.03972) < 0.00001); assert(std::fabs(vec_log.get_y_fits()[5] - 8.74971) < 0.00001); assert(std::fabs(vec_log.get_y_fits()[9] - 12.0269) < 0.0001); assert(ary_log.get_residual() < 1.0e-22); assert(ary_log.get_residual() < 1.0e-22); assert(std::fabs(ary_log.get_slope() - 2.0) < 0.0001); assert(std::fabs(ary_log.get_slope() - 2.0) < 0.01); } // ----------------------------------------------------------------------------- static void test_ExponentialFitVisitor() { std::cout << "\nTesting ExponentialFitVisitor{ } ..." << std::endl; std::vector<unsigned long> idx = { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, }; MyDataFrame df; df.load_index(std::move(idx)); df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans); df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans); ExponentialFitVisitor<double> exp_v1; auto result1 = df.single_act_visit<double, double>("X1", "Y1", exp_v1).get_result(); auto actual1 = std::vector<double> { 7.7647, 6.9316, 6.1879, 5.5239, 4.93126 }; assert(std::fabs(exp_v1.get_residual() - 22.2154) < 0.0001); for (size_t i = 0; i < result1.size(); ++i) assert(fabs(result1[i] - actual1[i]) < 0.0001); efit_v<double> exp_v2; auto result2 = df.single_act_visit<double, double>("X2", "Y2", exp_v2).get_result(); auto actual2 = std::vector<double> { 1.63751, 2.02776, 3.10952, 4.76833, 7.31206 }; assert(std::fabs(exp_v2.get_residual() - 3.919765) < 0.00001); for (size_t i = 0; i < result2.size(); ++i) assert(fabs(result2[i] - actual2[i]) < 0.0001); } // ----------------------------------------------------------------------------- static void test_PowerFitVisitor() { using MyDataFrame = StdDataFrame<unsigned long>; std::cout << "\nTesting PowerFitVisitor{ } ..." << std::endl; std::vector<unsigned long> idx = { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, }; MyDataFrame df; df.load_index(std::move(idx)); df.load_column<double>("X1", { 1, 2, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y1", { 2.98, 7.26, 11.75, 22.72, 27.20, 38.57 }, nan_policy::dont_pad_with_nans); PowerFitVisitor<double> pow_v; df.single_act_visit<double, double>("X1", "Y1", pow_v); assert(std::fabs(pow_v.get_residual() - 13.4058) < 0.0001); assert(std::fabs(pow_v.get_slope() - 1.4332) < 0.0001); assert(std::fabs(pow_v.get_intercept() - 1.0313) < 0.0001); const auto actual = std::vector<double> { 2.8047, 7.5739, 13.5423, 20.4527, 28.1605, 36.5697 }; for (size_t i = 0; i < pow_v.get_result().size(); ++i) assert(fabs(pow_v.get_result()[i] - actual[i]) < 0.0001); // Now multidimensional data // constexpr std::size_t dim { 3 }; using ary_col_t = std::array<double, dim>; using vec_col_t = std::vector<double>; std::vector<ary_col_t> ary_md_x { { 1.0, 2.0, 3.0 }, { 2.0, 3.0, 1.5 }, { 3.0, 1.0, 4.0 }, { 4.0, 2.5, 2.0 }, { 1.5, 4.0, 1.0 }, { 5.0, 1.5, 3.5 }, { 2.5, 3.5, 2.5 }, { 3.5, 2.0, 1.0 }, { 1.0, 5.0, 4.5 }, { 4.5, 3.0, 2.0 }, }; std::vector<vec_col_t> vec_md_x { { 1.0, 2.0, 3.0 }, { 2.0, 3.0, 1.5 }, { 3.0, 1.0, 4.0 }, { 4.0, 2.5, 2.0 }, { 1.5, 4.0, 1.0 }, { 5.0, 1.5, 3.5 }, { 2.5, 3.5, 2.5 }, { 3.5, 2.0, 1.0 }, { 1.0, 5.0, 4.5 }, { 4.5, 3.0, 2.0 }, }; df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans); df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans); std::vector<double> ground_truth { 2.0 * std::pow(1.0,1.5) * std::pow(2.0,0.8) * std::pow(3.0,0.5) + 0.05, 2.0 * std::pow(2.0,1.5) * std::pow(3.0,0.8) * std::pow(1.5,0.5) - 0.08, 2.0 * std::pow(3.0,1.5) * std::pow(1.0,0.8) * std::pow(4.0,0.5) + 0.03, 2.0 * std::pow(4.0,1.5) * std::pow(2.5,0.8) * std::pow(2.0,0.5) - 0.06, 2.0 * std::pow(1.5,1.5) * std::pow(4.0,0.8) * std::pow(1.0,0.5) + 0.02, 2.0 * std::pow(5.0,1.5) * std::pow(1.5,0.8) * std::pow(3.5,0.5) - 0.10, 2.0 * std::pow(2.5,1.5) * std::pow(3.5,0.8) * std::pow(2.5,0.5) + 0.07, 2.0 * std::pow(3.5,1.5) * std::pow(2.0,0.8) * std::pow(1.0,0.5) - 0.04, 2.0 * std::pow(1.0,1.5) * std::pow(5.0,0.8) * std::pow(4.5,0.5) + 0.01, 2.0 * std::pow(4.5,1.5) * std::pow(3.0,0.8) * std::pow(2.0,0.5) - 0.09, }; df.load_column<double>("MD Y", std::move(ground_truth), nan_policy::dont_pad_with_nans); PowerFitVisitor<ary_col_t> ary_pow; PowerFitVisitor<vec_col_t> vec_pow; df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_pow); df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_pow); // Y-Fits // assert(ary_pow.get_result().size() == 10); assert(vec_pow.get_result().size() == 10); assert(std::fabs(ary_pow.get_result()[0] - 6.062) < 0.001); assert(std::fabs(ary_pow.get_result()[5] - 57.8006) < 0.0001); assert(std::fabs(ary_pow.get_result()[9] - 64.829) < 0.001); assert(std::fabs(vec_pow.get_result()[0] - 6.062) < 0.001); assert(std::fabs(vec_pow.get_result()[5] - 57.8006) < 0.0001); assert(std::fabs(vec_pow.get_result()[9] - 64.829) < 0.001); assert(ary_pow.get_slope().size() == 3); assert(vec_pow.get_slope().size() == 3); assert(std::fabs(ary_pow.get_slope()[0] - 1.49562) < 0.00001); assert(std::fabs(ary_pow.get_slope()[2] - 0.50078) < 0.00001); assert(std::fabs(vec_pow.get_slope()[0] - 1.49562) < 0.00001); assert(std::fabs(vec_pow.get_slope()[2] - 0.50078) < 0.00001); assert(std::fabs(ary_pow.get_residual() - 0.0312596) < 0.0000001); assert(std::fabs(ary_pow.get_intercept() - 0.699298) < 0.000001); } // ----------------------------------------------------------------------------- static void test_QuadraticFitVisitor() { using MyDataFrame = StdDataFrame<unsigned long>; std::cout << "\nTesting QuadraticFitVisitor{ } ..." << std::endl; std::vector<unsigned long> idx = { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, }; MyDataFrame df; df.load_index(std::move(idx)); df.load_column<double>("X1", { 10, 15, 20, 24, 30, 34, 40, 45, 48, 50, 58 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y1", { 115.6, 157.2, 189.2, 220.8, 253.8, 269.2, 284.8, 285, 277.4, 269.2, 244.2 }, nan_policy::dont_pad_with_nans); QuadraticFitVisitor<double, unsigned long, 64> qud_v; df.single_act_visit<double, double>("X1", "Y1", qud_v); assert(std::fabs(qud_v.get_residual() - 188.2975) < 0.0001); assert(std::fabs(qud_v.get_slope() - -0.1561) < 0.0001); assert(std::fabs(qud_v.get_intercept() - 13.4522) < 0.0001); assert(std::fabs(qud_v.get_constant() - -9.0735) < 0.0001); const auto actual = std::vector<double> { 109.8382, 157.5867, 197.5302, 223.8654, 254.0023, 267.8496, 279.2546, 280.1733, 276.9781, 273.287, 246.0347 }; for (size_t i = 0; i < qud_v.get_result().size(); ++i) assert(fabs(qud_v.get_result()[i] - actual[i]) < 0.0001); // Now multidimensional data // constexpr std::size_t dim { 3 }; using ary_col_t = std::array<double, dim>; using vec_col_t = std::vector<double>; std::vector<ary_col_t> ary_md_x { { 0.0, 0.0, 0.0 }, { 1.0, 0.0, 0.0 }, { 0.0, 1.0, 0.0 }, { 0.0, 0.0, 1.0 }, {-1.0, 1.0, 0.0 }, { 1.0, -1.0, 1.0 }, { 2.0, 1.0, -1.0 }, {-1.0, -1.0, 2.0 }, { 1.0, 2.0, 1.0 }, { 2.0, -1.0, 2.0 } }; std::vector<vec_col_t> vec_md_x { { 0.0, 0.0, 0.0 }, { 1.0, 0.0, 0.0 }, { 0.0, 1.0, 0.0 }, { 0.0, 0.0, 1.0 }, {-1.0, 1.0, 0.0 }, { 1.0, -1.0, 1.0 }, { 2.0, 1.0, -1.0 }, {-1.0, -1.0, 2.0 }, { 1.0, 2.0, 1.0 }, { 2.0, -1.0, 2.0 } }; df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans); df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans); // True model: y = 2 + x0 - x1 + 2*x2 + x0^2 + 2*x0*x1 + 3*x1^2 - x2^2 // (x0*x2 and x1*x2 terms are intentionally absent → coeffs should be ~0) // auto ground_truth = [](double x0, double x1, double x2) { return (2.0 + x0 - x1 + 2.0 * x2 + x0 * x0 + 2.0 * x0 * x1 + 3.0 * x1 * x1 - x2 * x2); }; std::vector<double> md_y; const auto &md_x_col = df.get_column<ary_col_t>("ARY MD X"); md_y.reserve(md_x_col.size()); for (const auto &p : md_x_col) md_y.push_back(ground_truth(p[0], p[1], p[2])); df.load_column<double>("MD Y", std::move(md_y), nan_policy::dont_pad_with_nans); QuadraticFitVisitor<ary_col_t> ary_qud; QuadraticFitVisitor<vec_col_t> vec_qud; df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_qud); df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_qud); // Y-Fits // assert(ary_qud.get_result().size() == 10); assert(vec_qud.get_result().size() == 10); assert(std::fabs(ary_qud.get_result()[0] - 2.0) < 0.01); assert(std::fabs(ary_qud.get_result()[5] - 7.0) < 0.01); assert(std::fabs(ary_qud.get_result()[9] - 8.0) < 0.01); assert(std::fabs(vec_qud.get_result()[0] - 2.0) < 0.01); assert(std::fabs(vec_qud.get_result()[5] - 7.0) < 0.01); assert(std::fabs(vec_qud.get_result()[9] - 8.0) < 0.01); assert(ary_qud.get_coeffs().size() == 10); assert(vec_qud.get_coeffs().size() == 10); assert(std::fabs(ary_qud.get_coeffs()[0] - 2.0) < 0.01); assert(std::fabs(ary_qud.get_coeffs()[5] - -9.09495e-13) < 0.0000000001); assert(std::fabs(ary_qud.get_coeffs()[9] - 1.0) < 0.01); assert(std::fabs(vec_qud.get_coeffs()[0] - 2.0) < 0.01); assert(std::fabs(vec_qud.get_coeffs()[5] - -9.09495e-13) < 0.0000000001); assert(std::fabs(vec_qud.get_coeffs()[9] - 1.0) < 0.01); assert(ary_qud.get_residual() < 1.0e-21); assert(vec_qud.get_residual() < 1.0e-21); } // ----------------------------------------------------------------------------- static void test_LinearFitVisitor() { using MyDataFrame = StdDataFrame<unsigned long>; std::cout << "\nTesting LinearFitVisitor{ } ..." << std::endl; std::vector<unsigned long> idx = { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, }; MyDataFrame df; df.load_index(std::move(idx)); df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans); df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans); LinearFitVisitor<double> lin_v1; auto &result1 = df.single_act_visit<double, double>("X1", "Y1", lin_v1).get_result(); const auto actual1 = std::vector<double> { 7.4, 7, 6.6, 6.2, 5.8 }; assert(std::fabs(lin_v1.get_residual() - 19.6) < 0.01); for (size_t i = 0; i < result1.size(); ++i) assert(fabs(result1[i] - actual1[i]) < 0.0001); linfit_v<double> lin_v2; auto result2 = df.single_act_visit<double, double>("X2", "Y2", lin_v2).get_result(); auto actual2 = std::vector<double> { 1.73171, 2.37805, 3.67073, 4.96341, 6.2561 }; assert(std::fabs(lin_v2.get_residual() - 1.097561) < 0.00001); for (size_t i = 0; i < result2.size(); ++i) assert(fabs(result2[i] - actual2[i]) < 0.0001); // Now multidimensional data // constexpr std::size_t dim { 2 }; using ary_col_t = std::array<double, dim>; using vec_col_t = std::vector<double>; std::vector<ary_col_t> ary_md_x { { 1.0, 2.0 }, { 2.0, 1.0 }, { 3.0, 3.0 }, { 4.0, 0.5 }, { 1.5, 2.5 }, { 5.0, 1.5 }, { 2.5, 4.0 }, { 3.5, 0.0 }, { 0.5, 3.5 }, { 4.5, 2.0 } }; std::vector<vec_col_t> vec_md_x { { 1.0, 2.0 }, { 2.0, 1.0 }, { 3.0, 3.0 }, { 4.0, 0.5 }, { 1.5, 2.5 }, { 5.0, 1.5 }, { 2.5, 4.0 }, { 3.5, 0.0 }, { 0.5, 3.5 }, { 4.5, 2.0 } }; df.load_column<ary_col_t>("ARY MD X", std::move(ary_md_x), nan_policy::dont_pad_with_nans); df.load_column<vec_col_t>("VEC MD X", std::move(vec_md_x), nan_policy::dont_pad_with_nans); std::vector<double> ground_truth { 0.5, // 3 + 1.5(1.0) - 2.0(2.0) 4.0, // 3 + 1.5(2.0) - 2.0(1.0) 1.5, // 3 + 1.5(3.0) - 2.0(3.0) 8.0, // 3 + 1.5(4.0) - 2.0(0.5) 0.25, // 3 + 1.5(1.5) - 2.0(2.5) 7.5, // 3 + 1.5(5.0) - 2.0(1.5) -1.25, // 3 + 1.5(2.5) - 2.0(4.0) 8.25, // 3 + 1.5(3.5) - 2.0(0.0) -3.25, // 3 + 1.5(0.5) - 2.0(3.5) 5.75, // 3 + 1.5(4.5) - 2.0(2.0) }; df.load_column<double>("MD Y", std::move(ground_truth), nan_policy::dont_pad_with_nans); LinearFitVisitor<ary_col_t> ary_lin { false }; LinearFitVisitor<vec_col_t> vec_lin; df.single_act_visit<ary_col_t, double>("ARY MD X", "MD Y", ary_lin); df.single_act_visit<vec_col_t, double>("VEC MD X", "MD Y", vec_lin); // Y-Fits // assert(ary_lin.get_result().size() == 10); assert(vec_lin.get_result().size() == 10); assert(std::fabs(ary_lin.get_result()[0] - 0.5) < 0.01); assert(std::fabs(ary_lin.get_result()[5] - 7.5) < 0.01); assert(std::fabs(ary_lin.get_result()[9] - 5.75) < 0.01); assert(std::fabs(vec_lin.get_result()[0] - 0.5) < 0.01); assert(std::fabs(vec_lin.get_result()[5] - 7.5) < 0.01); assert(std::fabs(vec_lin.get_result()[9] - 5.75) < 0.01); assert(ary_lin.get_slope().size() == dim); assert(vec_lin.get_slope().size() == dim); assert(std::fabs(ary_lin.get_slope()[0] - 1.5) < 0.01); assert(std::fabs(ary_lin.get_slope()[1] - -2.0) < 0.01); assert(std::fabs(vec_lin.get_slope()[0] - 1.5) < 0.01); assert(std::fabs(vec_lin.get_slope()[1] - -2.0) < 0.01); assert(ary_lin.get_residual() < 1.0e-29); assert(vec_lin.get_residual() < 1.0e-29); assert(std::fabs(ary_lin.get_intercept() - 3.0) < 0.01); assert(std::fabs(vec_lin.get_intercept() - 3.0) < 0.01); }
// ----------------------------------------------------------------------------- static void test_CubicSplineFitVisitor() { using MyDataFrame = StdDataFrame<unsigned long>; std::cout << "\nTesting CubicSplineFitVisitor{ } ..." << std::endl; std::vector<unsigned long> idx = { 123450, 123451, 123452, 123453, 123454, 123455, 123456, 123457, 123458, 123459, 123460, 123461, 123462, 123466, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, 123467, 123468, 123469, 123470, 123471, 123472, 123473, }; MyDataFrame df; df.load_index(std::move(idx)); df.load_column<double>("X1", { 1, 2, 3, 4, 5 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y1", { 6, 7, 8, 9, 3 }, nan_policy::dont_pad_with_nans); df.load_column<double>("X2", { 1, 2, 4, 6, 8 }, nan_policy::dont_pad_with_nans); df.load_column<double>("Y2", { 1, 3, 4, 5, 6 }, nan_policy::dont_pad_with_nans); CubicSplineFitVisitor<double> csp_v1; df.single_act_visit<double, double>("X1", "Y1", csp_v1); const auto &result1 = csp_v1.get_result(); const auto &c_vec = csp_v1.get_c_vec(); const auto &d_vec = csp_v1.get_d_vec(); assert(result1.size() == 4); assert(std::fabs(result1[0] - 1.125) < 0.001); assert(std::fabs(result1[3] - -2.25) < 0.001); assert(c_vec.size() == 5); assert(std::fabs(c_vec[0] - 0) < 0.001); assert(std::fabs(c_vec[2] - 1.5) < 0.001); assert(std::fabs(c_vec[4] - 0) < 0.001); assert(d_vec.size() == 4); assert(std::fabs(d_vec[0] - -0.125) < 0.001); assert(std::fabs(d_vec[3] - 1.875) < 0.001); // Now multidimensional data // constexpr std::size_t dim { 3 }; constexpr std::size_t n { 10 }; using ary_col_t = std::array<double, dim>; using vec_col_t = std::vector<double>; std::vector<double> x_col(n); std::vector<ary_col_t> ary_y_col(n); std::vector<vec_col_t> vec_y_col(n); for (std::size_t i { 0 }; i < n; ++i) { x_col[i] = double(i) * (2.0 * M_PI / double(n - 1)); ary_y_col[i] = { std::cos(x_col[i]), std::sin(x_col[i]), x_col[i] / 5.0 }; vec_y_col[i] = { std::cos(x_col[i]), std::sin(x_col[i]), x_col[i] / 5.0 }; } df.load_column<double>("MD X", std::move(x_col), nan_policy::dont_pad_with_nans); df.load_column<ary_col_t>("ARY MD Y", std::move(ary_y_col), nan_policy::dont_pad_with_nans); df.load_column<vec_col_t>("VEC MD Y", std::move(vec_y_col), nan_policy::dont_pad_with_nans); CubicSplineFitVisitor<ary_col_t> ary_cpv; CubicSplineFitVisitor<vec_col_t> vec_cpv; df.single_act_visit<double, ary_col_t>("MD X", "ARY MD Y", ary_cpv); df.single_act_visit<double, vec_col_t>("MD X", "VEC MD Y", vec_cpv); // B vector // assert(vec_cpv.get_result().size() == dim); for (const auto &vec : vec_cpv.get_result()) assert(vec.size() == 9); assert(std::fabs(vec_cpv.get_result()[0][0] - -0.209847) < 0.000001); assert(std::fabs(vec_cpv.get_result()[1][7] - 0.173405) < 0.000001); assert(std::fabs(vec_cpv.get_result()[2][4] - 0.2) < 0.01); assert(ary_cpv.get_result().size() == dim); for (const auto &ary : ary_cpv.get_result()) assert(ary.size() == 9); assert(std::fabs(ary_cpv.get_result()[0][0] - -0.209847) < 0.000001); assert(std::fabs(ary_cpv.get_result()[1][7] - 0.173405) < 0.000001); assert(std::fabs(ary_cpv.get_result()[2][4] - 0.2) < 0.01); assert(vec_cpv.get_c_vec().size() == dim); for (const auto &vec : vec_cpv.get_c_vec()) assert(vec.size() == 10); assert(std::fabs(vec_cpv.get_c_vec()[0][1] - -0.538305) < 0.000001); assert(std::fabs(vec_cpv.get_c_vec()[1][7] - 0.51271) < 0.00001); assert(std::fabs(vec_cpv.get_c_vec()[2][4] - -1.58565e-16) < 0.00000001); assert(ary_cpv.get_c_vec().size() == dim); for (const auto &ary : ary_cpv.get_c_vec()) assert(ary.size() == 10); assert(std::fabs(ary_cpv.get_c_vec()[0][1] - -0.538305) < 0.000001); assert(std::fabs(ary_cpv.get_c_vec()[1][7] - 0.51271) < 0.00001); assert(std::fabs(ary_cpv.get_c_vec()[2][4] - -1.58565e-16) < 0.00000001); assert(vec_cpv.get_d_vec().size() == dim); for (const auto &vec : vec_cpv.get_d_vec()) assert(vec.size() == 9); assert(std::fabs(vec_cpv.get_d_vec()[0][0] - -0.257022) < 0.000001); assert(std::fabs(vec_cpv.get_d_vec()[1][7] - -0.085019) < 0.000001); assert(std::fabs(vec_cpv.get_c_vec()[2][4] - 1.80644e-16) < 0.00000001); assert(ary_cpv.get_d_vec().size() == dim); for (const auto &ary : ary_cpv.get_d_vec()) assert(ary.size() == 9); assert(std::fabs(ary_cpv.get_d_vec()[0][0] - -0.257022) < 0.000001); assert(std::fabs(ary_cpv.get_d_vec()[1][7] - -0.085019) < 0.000001); assert(std::fabs(ary_cpv.get_c_vec()[2][4] - 1.80644e-16) < 0.00000001); }