| Signature | Description | Parameters |
|---|---|---|
| `template<arithmetic T, typename ... Ts> std::vector<DataFrame> get_data_by_mshift(const char *col_name, double kernel_bandwidth, double max_distance, mean_shift_kernel kernel = mean_shift_kernel::gaussian, std::function<double(const T &x, const T &y)> &&dfunc = [](const T &x, const T &y) -> double { return ((x - y) * (x - y)); }, size_type num_of_iter = 50) const;` | Uses the Mean-Shift algorithm to divide the named column into clusters. It returns a vector of DataFrames, each containing one cluster of data based on the named column. Unlike K-Means clustering, you do not have to specify the number of clusters. Self is unchanged. NOTE: Type T must support arithmetic operations. | `T`: Type of the named column<br>`Ts`: The list of types for all columns. A type should be specified only once<br>`col_name`: Name of the data column<br>`kernel_bandwidth`: The width or spread of the kernel function used<br>`max_distance`: Maximum distance between two data points in the same cluster<br>`kernel`: Kernel type used<br>`dfunc`: A function to calculate the distance between two data points in the named column<br>`num_of_iter`: Maximum number of iterations for the Mean-Shift algorithm to converge |
| `template<arithmetic T, typename ... Ts> std::vector<PtrView> get_view_by_mshift(const char *col_name, double kernel_bandwidth, double max_distance, mean_shift_kernel kernel = mean_shift_kernel::gaussian, std::function<double(const T &x, const T &y)> &&dfunc = [](const T &x, const T &y) -> double { return ((x - y) * (x - y)); }, size_type num_of_iter = 50);` | Identical to get_data_by_mshift() above, but it returns a vector of views instead of a vector of DataFrames. | `T`: Type of the named column<br>`Ts`: The list of types for all columns. A type should be specified only once<br>`col_name`: Name of the data column<br>`kernel_bandwidth`: The width or spread of the kernel function used<br>`max_distance`: Maximum distance between two data points in the same cluster<br>`kernel`: Kernel type used<br>`dfunc`: A function to calculate the distance between two data points in the named column<br>`num_of_iter`: Maximum number of iterations for the Mean-Shift algorithm to converge |
| `template<arithmetic T, typename ... Ts> std::vector<ConstPtrView> get_view_by_mshift(const char *col_name, double kernel_bandwidth, double max_distance, mean_shift_kernel kernel = mean_shift_kernel::gaussian, std::function<double(const T &x, const T &y)> &&dfunc = [](const T &x, const T &y) -> double { return ((x - y) * (x - y)); }, size_type num_of_iter = 50) const;` | Same as the view version above, but it returns a std::vector of const views. You cannot change data through const views. However, if the data is changed in the original DataFrame or through another view, the change is reflected in the const view. | `T`: Type of the named column<br>`Ts`: The list of types for all columns. A type should be specified only once<br>`col_name`: Name of the data column<br>`kernel_bandwidth`: The width or spread of the kernel function used<br>`max_distance`: Maximum distance between two data points in the same cluster<br>`kernel`: Kernel type used<br>`dfunc`: A function to calculate the distance between two data points in the named column<br>`num_of_iter`: Maximum number of iterations for the Mean-Shift algorithm to converge |

    void test_get_data_by_mshift()  {

        std::cout << "\nTesting get_data_by_mshift( ) ..." << std::endl;

        typedef StdDataFrame64<std::string> StrDataFrame;

        StrDataFrame    df;

        try  {
            df.read("SHORT_IBM.dat", io_format::binary);
        }
        catch (const DataFrameError &ex)  {
            std::cout << ex.what() << std::endl;
        }

        StrDataFrame    df2 = df;

        auto    lbd =
            [](const std::string &, const double &) -> bool { return (true); };
        auto    view =
            df2.get_view_by_sel<double, decltype(lbd), double, long>("IBM_Open", lbd);

        // I am using both views and dataframes to make sure both work
        //
        auto    views =
            view.get_view_by_mshift<double, double, long>(
                "IBM_Close", 1, 4, mean_shift_kernel::gaussian,
                [](const double &x, const double &y) -> double  {
                    return (std::fabs(x - y));
                });
        auto    dfs =
            df.get_data_by_mshift<double, double, long>(
                "IBM_Close", 1, 4, mean_shift_kernel::gaussian,
                [](const double &x, const double &y) -> double  {
                    return (std::fabs(x - y));
                });

        assert(views.size() == 19);
        assert(dfs.size() == 19);

        assert(views[0].get_index().size() == 106);
        assert(dfs[0].get_index().size() == 106);
        assert(views[4].get_index().size() == 19);
        assert(views[6].get_index().size() == 274);
        assert(views[10].get_index().size() == 180);
        assert(views[14].get_index().size() == 29);
        assert(views[18].get_index().size() == 2);
        assert(dfs[18].get_index().size() == 2);

        assert((std::fabs(views[0].get_column<double>("IBM_Close")[7] - 185.92) < 0.001));
        assert((std::fabs(dfs[5].get_column<double>("IBM_Open")[15] - 163.7) < 0.001));
        assert((std::fabs(views[16].get_column<double>("IBM_High")[3] - 106.04) < 0.001));
        assert(dfs[18].get_column<long>("IBM_Volume")[0] == 10546500);
        assert(views[18].get_index()[1] == "2020-03-23");
    }
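
To give a feel for how `kernel_bandwidth`, `max_distance`, `dfunc`, and `num_of_iter` interact, here is a minimal, standalone sketch of one-dimensional Mean-Shift with a Gaussian kernel. It is not the library's implementation: the function `mean_shift_1d`, the convergence tolerance, and the way `dfunc` is fed into the kernel are all assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, NOT part of the DataFrame API.
// Moves every point toward the Gaussian-weighted mean of the data set, then
// groups the shifted points that end up within max_distance of each other.
// Returns one cluster label per input value.
std::vector<std::size_t>
mean_shift_1d(const std::vector<double> &data,
              double kernel_bandwidth,
              double max_distance,
              const std::function<double(const double &,
                                         const double &)> &dfunc,
              std::size_t num_of_iter = 50)
{
    std::vector<double> shifted = data;
    const double        bw2 = kernel_bandwidth * kernel_bandwidth;

    for (std::size_t iter = 0; iter < num_of_iter; ++iter) {
        double  max_move = 0.0;

        for (double &p : shifted) {
            double  weighted_sum = 0.0;
            double  weight_total = 0.0;

            for (const double x : data) {
                const double    d = dfunc(p, x);
                // Gaussian kernel: weight decays with distance / bandwidth
                const double    w = std::exp(-0.5 * d * d / bw2);

                weighted_sum += w * x;
                weight_total += w;
            }
            if (weight_total > 0.0) {
                const double    next = weighted_sum / weight_total;

                max_move = std::max(max_move, std::fabs(next - p));
                p = next;
            }
        }
        // Assumed tolerance: stop early once no point moves noticeably
        if (max_move < 1e-6) break;
    }

    // Points whose shifted positions are within max_distance of an
    // existing mode share that mode's label; otherwise start a new cluster
    std::vector<double>         modes;
    std::vector<std::size_t>    labels(data.size());

    for (std::size_t i = 0; i < shifted.size(); ++i) {
        std::size_t c = 0;

        while (c < modes.size() && dfunc(shifted[i], modes[c]) > max_distance)
            ++c;
        if (c == modes.size()) modes.push_back(shifted[i]);
        labels[i] = c;
    }
    return labels;
}
```

In this sketch, a smaller `kernel_bandwidth` makes each point respond only to its close neighbours, which tends to produce more clusters, while a larger `max_distance` merges nearby modes into fewer clusters. The number of clusters is never passed in; it falls out of those two parameters and the data.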