
Signature Description
enum class  mean_shift_kernel : unsigned char  {

    // if d <= 1 then 1 else 0
    //
    uniform = 1,

    // if d <= 1 then 1 - abs(d) else 0
    //
    triangular = 2,

    // if d <= 1 then 1 - d * d else 0
    //
    parabolic = 3,

    // x = 1 - d * d
    // if d <= 1 then x * x else 0
    //
    biweight = 4,

    // x = 1 - d * d
    // if d <= 1 then x * x * x else 0
    //
    triweight = 5,

    // x = 1 - d * d * d
    // if d <= 1 then x * x * x else 0
    //
    tricube = 6,

    // e^(-0.5 * d * d)
    //
    gaussian = 7,

    // if d <= 1 then cos(M_PI_2 * d) else 0
    //
    cosin = 8,

    // 1 / (2 + e^d + e^(-d))
    //
    logistic = 9,

    // 1.0 / (e^d + e^(-d))
    //
    sigmoid = 10,

    // x = M_SQRT1_2 * abs(d)
    // e^(-x) * sin(x + M_PI_4)
    //
    silverman = 11,
};
Kernel is a fancy mathematical name for a weight assigned to a distance between data points.
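As a rough illustration of the formulas in the comments above, the sketch below evaluates four of the kernels as plain functions. The demo_kernel enum and the kernel_weight() function are hypothetical names introduced only for this example; they are not part of the library. The sketch assumes d is a non-negative distance and does not cover how the library derives d from the raw distance and the bandwidth.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a few mean_shift_kernel values (illustration only).
enum class demo_kernel : unsigned char  {
    uniform,
    triangular,
    parabolic,
    gaussian,
};

// Returns the weight assigned to a non-negative distance d,
// following the formulas given in the enum comments.
inline double kernel_weight(demo_kernel k, double d)  {

    assert(d >= 0.0);

    switch (k)  {
    case demo_kernel::uniform:
        return (d <= 1.0 ? 1.0 : 0.0);
    case demo_kernel::triangular:
        return (d <= 1.0 ? 1.0 - std::fabs(d) : 0.0);
    case demo_kernel::parabolic:
        return (d <= 1.0 ? 1.0 - d * d : 0.0);
    case demo_kernel::gaussian:
        return (std::exp(-0.5 * d * d));
    }
    return (0.0);
}
```

The first three kernels give zero weight to any distance greater than 1, while the gaussian kernel gives every distance a positive, rapidly decaying weight.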

Signature Description Parameters
#include <DataFrame/DataFrameMLVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct MeanShiftVisitor;
This is a single action visitor, meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

Mean-Shift is an unsupervised clustering algorithm that assigns data points to clusters iteratively by shifting points towards the mode (in the context of Mean-Shift, the mode is the region with the highest density of data points). As such, it is also known as the mode-seeking algorithm.
Runtime complexity is O(I * n^2), where I is the number of iterations and n is the number of data points.
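The mode-seeking loop and its quadratic cost per iteration can be sketched in one dimension as follows. This is a minimal sketch under stated assumptions, not the library's implementation: mean_shift_1d() is a hypothetical name, the kernel is fixed to the gaussian formula, the distance is divided by the bandwidth, and the convergence tolerance tol is an invented parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shifts every point towards the kernel-weighted mean of the data until
// no point moves more than tol, or max_iter iterations have run.
// Returns the final position (estimated mode) of each input point.
inline std::vector<double>
mean_shift_1d(const std::vector<double> &data,
              double bandwidth,
              std::size_t max_iter = 50,
              double tol = 1e-6)  {

    assert(bandwidth > 0.0);

    std::vector<double> pts = data;

    for (std::size_t iter = 0; iter < max_iter; ++iter)  {  // I iterations
        double  max_move = 0.0;

        for (double &p : pts)  {                            // n points
            double  num = 0.0;
            double  den = 0.0;

            for (const double x : data)  {                  // n neighbours
                const double    d = (x - p) / bandwidth;
                const double    w = std::exp(-0.5 * d * d); // gaussian weight

                num += w * x;
                den += w;
            }
            if (den > 0.0)  {  // skip if all weights underflowed to zero
                const double    shifted = num / den;

                max_move = std::max(max_move, std::fabs(shifted - p));
                p = shifted;
            }
        }
        if (max_move < tol)  break;  // every point has settled on a mode
    }
    return (pts);
}
```

Points that end up at (nearly) the same position belong to the same cluster. The two nested loops over the data inside the iteration loop are where the O(I * n^2) cost comes from.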

NOTE: Type T must have well-defined arithmetic operators and a default constructor

The constructor takes 5 parameters
  1. Kernel bandwidth: the width or spread of the kernel function used in mean-shift clustering
  2. The distance used to determine whether a data point is in the same area as other data points
  3. Kernel method, one of the mean_shift_kernel values specified above. Kernel is a fancy mathematical name for a weight assigned to a distance between data points
  4. A function to calculate the distance between two data points of type T (defaults to the squared difference)
  5. Maximum number of iterations allowed for the algorithm to converge
  MeanShiftVisitor(double kernel_bandwidth,
                   double max_dist,
                   mean_shift_kernel kernel = mean_shift_kernel::gaussian,
                   distance_func &&f =
                       [](const value_type &x, const value_type &y) -> double  {
                           return ((x - y) * (x - y));
                       },
                   size_type max_iteration = 50)
        
get_result() Returns a vector of vectors containing data point values of each cluster.

get_clusters_idxs() Returns a vector of vectors containing indices to datapoints of each cluster.
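The relationship between the two accessors can be pictured with the sketch below, which rebuilds per-cluster values from per-cluster indices. It is illustrative only: gather_clusters() is a hypothetical helper, not a library function, and it assumes each index is a zero-based position into the visited column.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Builds a vector of per-cluster values from a column and a vector of
// per-cluster position lists (assumed to be zero-based column positions).
inline std::vector<std::vector<double>>
gather_clusters(const std::vector<double> &column,
                const std::vector<std::vector<std::size_t>> &cluster_idxs)  {

    std::vector<std::vector<double>>    values;

    values.reserve(cluster_idxs.size());
    for (const auto &idxs : cluster_idxs)  {
        std::vector<double> cluster;

        cluster.reserve(idxs.size());
        for (const std::size_t i : idxs)  {
            assert(i < column.size());
            cluster.push_back(column[i]);
        }
        values.push_back(std::move(cluster));
    }
    return (values);
}
```

Under that assumption, the indices are the more general output, because they can also be used to look up other columns of the same rows.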
T: Column data type
I: Index type
A: Memory alignment boundary for vectors. Default is system default alignment
static void test_MeanShiftVisitor()  {

    std::cout << "\nTesting MeanShiftVisitor{ } ..." << std::endl;

    typedef StdDataFrame64<std::string> StrDataFrame;

    StrDataFrame    df;

    try  {
        df.read("SHORT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
    }

    MeanShiftVisitor<double, std::string, 64>   mshift(1.0,
                                                       4,
                                                       mean_shift_kernel::gaussian,
                                                       // mean_shift_kernel::triweight,
                                                       [](const double &x, const double &y)  { return (std::fabs(x - y)); });

    df.single_act_visit<double>("IBM_Close", mshift);

    assert(mshift.get_result().size() == 19);
    assert(mshift.get_result()[0].size() == 106);
    assert(mshift.get_result()[4].size() == 19);
    assert(mshift.get_result()[6].size() == 274);
    assert(mshift.get_result()[10].size() == 180);
    assert(mshift.get_result()[14].size() == 29);
    assert(mshift.get_result()[18].size() == 2);
    assert(std::fabs(mshift.get_result()[0][6] - 184.16) < 0.001);
    assert(std::fabs(mshift.get_result()[4][18] - 194.0) < 0.001);
    assert(std::fabs(mshift.get_result()[6][273] - 154.31) < 0.001);
    assert(std::fabs(mshift.get_result()[10][135] - 137.61) < 0.001);
    assert(std::fabs(mshift.get_result()[18][1] - 94.77) < 0.001);
}
