| Signature | Description |
|---|---|
enum class mean_shift_kernel : unsigned char { // if d <= 1 then 1 else 0 // uniform = 1, // if d <= 1 then 1 - abs(d) else 0 // triangular = 2, // if d <= 1 then 1 - d * d else 0 // parabolic = 3, // x = 1 - d * d // if d <= 1 then x * x else 0 // biweight = 4, // x = 1 - d * d // if d <= 1 then x * x * x else 0 // triweight = 5, // x = 1 - d * d * d // if d <= 1 then x * x * x else 0 // tricube = 6, // e-0.5 * d * d // gaussian = 7, // if d <= 1 then cos(M_PI_2 * d) else 0 // cosin = 8, // 1 / (2 + ed + e-d) // logistic = 9, // 1.0 / (ed + e-d) // sigmoid = 10, // x = M_SQRT1_2 * abs(d) // e-x * sin(x + M_PI_4) // silverman = 11, }; |
Kernal is a fancy mathematical name for a weight assigned to a distance between datapoints |
| Signature | Description | Parameters |
|---|---|---|
#include <DataFrame/DataFrameMLVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct MeanShiftVisitor; |
This is a single action visitor, meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. Mean-Shift is falling under the category of a clustering algorithm in contrast of Unsupervised learning that assigns the data points to the clusters iteratively by shifting points towards the mode (mode is the highest density of data points in the region, in the context of the Mean-Shift). As such, it is also known as the Mode-seeking algorithm. Runtime complexity is O(I * n2) where I is number of iterations. NOTE: Type T must have arithmetic operators and default constructor well defined The constructor takes 5 parameters
MeanShiftVisitor(double kernel_bandwidth,
double max_dist,
mean_shift_kernel kernel = mean_shift_kernel::gaussian,
distance_func &&f =
[](const value_type &x, const value_type &y) -> double {
return ((x - y) * (x - y));
},
size_type max_iteration = 50)
get_results() Returns a vector of vectors containing datapoint values of each cluster.get_clusters_idxs() Returns a vector of vectors containing indices to datapoints of each cluster. |
T: Column data type I: Index type A: Memory alignment boundary for vectors. Default is system default alignment |
static void test_MeanShiftVisitor() { std::cout << "\nTesting MeanShiftVisitor{ } ..." << std::endl; typedef StdDataFrame64<std::string> StrDataFrame; StrDataFrame df; try { df.read("SHORT_IBM.csv", io_format::csv2); } catch (const DataFrameError &ex) { std::cout << ex.what() << std::endl; } MeanShiftVisitor<double, std::string, 64> mshift(1.0, 4, mean_shift_kernel::gaussian, // mean_shift_kernel::triweight, [](const double &x, const double &y) { return (std::fabs(x - y)); }); df.single_act_visit<double>("IBM_Close", mshift); assert(mshift.get_result().size() == 19); assert(mshift.get_result()[0].size() == 106); assert(mshift.get_result()[4].size() == 19); assert(mshift.get_result()[6].size() == 274); assert(mshift.get_result()[10].size() == 180); assert(mshift.get_result()[14].size() == 29); assert(mshift.get_result()[18].size() == 2); assert(std::fabs(mshift.get_result()[0][6] - 184.16) < 0.001); assert(std::fabs(mshift.get_result()[4][18] - 194.0) < 0.001); assert(std::fabs(mshift.get_result()[6][273] - 154.31) < 0.001); assert(std::fabs(mshift.get_result()[10][135] - 137.61) < 0.001); assert(std::fabs(mshift.get_result()[18][1] - 94.77) < 0.001); }