| Signature | Description | Parameters |
|---|---|---|
#include <DataFrame/DataFrameMLVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct DBSCANVisitor; |
This is a single-action visitor: it is passed the whole data vector in one call, and you must use the single_act_visit() interface. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu in 1996. It is a non-parametric, density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (those whose nearest neighbors are too far away). DBSCAN is one of the most commonly used and cited clustering algorithms. The constructor takes 3 parameters:
DBSCANVisitor(long min_mems,
              double max_dist,
              distance_func f = [](const T &x, const T &y) -> double {
                  return ((x - y) * (x - y));
              })
get_result() returns a vector of vectors containing the datapoint values of each cluster. get_clusters_idxs() returns a vector of vectors containing the indices of the datapoints in each cluster. get_noisey_idxs() returns a vector containing the indices of datapoints that could not be placed in any cluster. Ideally you want this to be empty. |
T: Column data type. I: Index type. A: Memory alignment boundary for vectors; the default is the system default alignment. |
static void test_DBSCANVisitor()  {

    std::cout << "\nTesting DBSCANVisitor{ } ..." << std::endl;

    typedef StdDataFrame64<std::string> StrDataFrame;

    StrDataFrame    df;

    try  {
        df.read("SHORT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
    }

    // The selector always returns true, so the view covers every row.
    auto    lbd =
        [](const std::string &, const double &) -> bool { return (true); };
    auto    view =
        df.get_view_by_sel<double, decltype(lbd), double, long>("IBM_Open", lbd);

    // min_mems = 10, max_dist = 4, absolute difference as the distance function.
    DBSCANVisitor<double, std::string, 64>  dbscan(
        10,
        4,
        [](const double &x, const double &y)  {
            return (std::fabs(x - y));
        });

    // Single-action visitors are run through single_act_visit().
    view.single_act_visit<double>("IBM_Close", dbscan);

    assert(dbscan.get_noisey_idxs().size() == 2);
    assert(dbscan.get_noisey_idxs()[0] == 1564);
    assert(dbscan.get_noisey_idxs()[1] == 1565);

    assert(dbscan.get_result().size() == 19);
    assert(dbscan.get_result()[0].size() == 11);
    assert(dbscan.get_result()[4].size() == 31);
    assert(dbscan.get_result()[10].size() == 294);
    assert(dbscan.get_result()[14].size() == 82);
    assert(dbscan.get_result()[18].size() == 10);
    assert(dbscan.get_result()[0][6] == 185.679993);
    assert(dbscan.get_result()[4][18] == 167.330002);
    assert(dbscan.get_result()[10][135] == 145.160004);
    assert(dbscan.get_result()[18][3] == 103.550003);
}
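
To make the density-based grouping described above more concrete, here is a minimal, standard-library-only sketch of textbook DBSCAN over scalar data. It is not the implementation behind DBSCANVisitor. The names `dbscan_1d` and `DbscanResult` are hypothetical, and two details are assumptions made for illustration: a point counts itself among its neighbors, and a neighbor is any point whose distance is less than or equal to `max_dist`. The library may differ on both.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: index lists per cluster, plus the noise indices.
struct DbscanResult {
    std::vector<std::vector<std::size_t>> clusters;
    std::vector<std::size_t>              noise;
};

// Textbook DBSCAN on scalars. A point is a "core" point when at least
// min_mems points (itself included, by assumption) lie within max_dist.
DbscanResult
dbscan_1d(const std::vector<double> &data,
          std::size_t min_mems,
          double max_dist,
          const std::function<double(double, double)> &dist) {
    assert(min_mems > 0);

    constexpr long UNVISITED = -2;
    constexpr long NOISE = -1;
    std::vector<long> label(data.size(), UNVISITED);

    // Brute-force neighborhood query: O(n) per point, O(n^2) overall.
    auto neighbors = [&](std::size_t p) {
        std::vector<std::size_t> out;
        for (std::size_t q = 0; q < data.size(); ++q)
            if (dist(data[p], data[q]) <= max_dist)
                out.push_back(q);
        return out;
    };

    DbscanResult result;
    for (std::size_t p = 0; p < data.size(); ++p) {
        if (label[p] != UNVISITED) continue;

        std::vector<std::size_t> seeds = neighbors(p);
        if (seeds.size() < min_mems) {
            label[p] = NOISE;  // may still become a border point later
            continue;
        }

        // p is a core point: start a new cluster and grow it from the seeds.
        const long id = static_cast<long>(result.clusters.size());
        result.clusters.emplace_back();
        auto &members = result.clusters.back();
        label[p] = id;
        members.push_back(p);

        for (std::size_t i = 0; i < seeds.size(); ++i) {
            const std::size_t q = seeds[i];
            if (label[q] == NOISE) {  // border point: reachable but not core
                label[q] = id;
                members.push_back(q);
                continue;
            }
            if (label[q] != UNVISITED) continue;  // already in a cluster

            label[q] = id;
            members.push_back(q);
            const std::vector<std::size_t> more = neighbors(q);
            if (more.size() >= min_mems)  // q is core too: keep expanding
                seeds.insert(seeds.end(), more.begin(), more.end());
        }
    }

    // Anything still labeled NOISE was never reached from a core point.
    for (std::size_t p = 0; p < data.size(); ++p)
        if (label[p] == NOISE)
            result.noise.push_back(p);
    return result;
}
```

For example, calling `dbscan_1d` on `{ 1.0, 1.5, 2.0, 2.5, 10.0, 10.5, 11.0, 50.0 }` with `min_mems = 3`, `max_dist = 1.0`, and absolute difference as the distance yields two clusters (indices 0 to 3 and 4 to 6) and reports index 7 as noise. In this sketch, switching to a squared-difference distance like the constructor's default means `max_dist` is compared against a squared gap, so the threshold needs to be scaled accordingly.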