| Signature | Description |
|---|---|
| `enum class hampel_type : unsigned char { mean = 1, median = 2 };` | The supported Hampel filter types, for use with `HampelFilterVisitor`. `mean` uses the mean absolute deviation; `median` uses the median absolute deviation. |
| Signature | Description | Parameters |
|---|---|---|
| `#include <DataFrame/DataFrameMLVisitors.h>`<br><br>`template<typename T, typename I = unsigned long, std::size_t A = 0>`<br>`struct HampelFilterVisitor;`<br><br>`template<typename T, typename I = unsigned long>`<br>`using hamf_v = HampelFilterVisitor<T, I>;` | This is a "single action visitor": it is passed the whole data vector in one call, so you must use the `single_act_visit()` interface.<br><br>This functor class applies a Hampel filter to weed out outliers, using either the mean absolute deviation or the median absolute deviation (MAD). The Hampel filter detects anomalies in data with a time-series structure. It slides a window of configurable size over the data and, for each window, compares each observation with the window's mean or median absolute deviation. An observation is considered an outlier when its deviation from the window's mean or median exceeds the MAD multiplied by `num_of_std` × 1.4826.<br><br>`explicit HampelFilterVisitor(std::size_t window_size, hampel_type ht = hampel_type::median, T num_of_std = 3);`<br><br>`get_result()` returns a `std::vector` of indices into the original data that were deemed outliers. | `T`: Column data type.<br>`I`: Index type.<br>`A`: Memory alignment boundary for vectors. The default is the system default alignment. |
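
For reference, the textbook form of the median-based criterion described above can be written as follows. The notation is introduced here for illustration and does not come from the library: $W_i$ is the window associated with observation $x_i$, $m_i$ is the window median, and $n$ corresponds to `num_of_std`. How the library aligns its window is not stated in this page, so treat this as a sketch of the general technique rather than a description of the implementation.

$$
m_i = \operatorname{median}_{x_j \in W_i}(x_j), \qquad
\mathrm{MAD}_i = \operatorname{median}_{x_j \in W_i}\left(\lvert x_j - m_i \rvert\right), \qquad
x_i \text{ is an outlier if } \lvert x_i - m_i \rvert > n \cdot 1.4826 \cdot \mathrm{MAD}_i
$$

The constant $1.4826 \approx 1 / 0.6745$ scales the MAD so that it estimates the standard deviation when the data are normally distributed, which is why $n$ can be read as a number of standard deviations.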
```cpp
static void test_HampelFilterVisitor()  {

    std::cout << "\nTesting HampelFilterVisitor{ } ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
          19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        };
    std::vector<double>         d1 =
        { 2.5, 2.45, -1.65, -0.1, -1.1, 1.87, 0.98, 0.34, 1.56, -12.34, 2.3,
          -0.34, -1.9, 0.387, 0.123, 1.06, -0.65, 2.03, 0.4, -1.0, 0.59,
          0.125, 1.9, -0.68, 2.0045, 50.8, -1.0, 0.78, 0.48, 1.99, -0.97,
          1.03, 8.678, -1.4, 1.59,
        };
    MyDataFrame                 df;

    df.load_data(std::move(idx), std::make_pair("dbl_col", d1));

    hamf_v<double>  hf_v1(7, hampel_type::mean, 2);
    const auto      &result1 =
        df.single_act_visit<double>("dbl_col", hf_v1).get_result();
    const std::vector<std::size_t>  compare1 = { 9, 25 };

    assert(result1 == compare1);

    hamf_v<double>  hf_v2(6, hampel_type::median, 2);
    const auto      &result2 =
        df.single_act_visit<double>("dbl_col", hf_v2).get_result();
    const std::vector<std::size_t>  compare2 = { 9, 25, 32 };

    assert(result2 == compare2);
}
```
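
To make the filtering rule concrete without depending on the library, here is a minimal standalone sketch of a median-based Hampel detector. The function names `median_of` and `hampel_outliers`, the `half_width` parameter, and the centered window that is truncated at the edges are all assumptions made for this illustration. They are not part of the DataFrame API, and `HampelFilterVisitor` may align its window differently and therefore flag different indices.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a sample. The vector is taken by value so that sorting
// does not disturb the caller's data.
static double median_of(std::vector<double> v)
{
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Hypothetical helper (not a DataFrame function).
// Returns the indices i for which
//     |x[i] - median(window)| > n_sigmas * 1.4826 * MAD(window).
// Assumption: the window is centered on i, spans half_width points on each
// side, and is truncated at the ends of the data.
std::vector<std::size_t>
hampel_outliers(const std::vector<double> &x,
                std::size_t half_width,
                double n_sigmas = 3.0)
{
    constexpr double scale = 1.4826;  // MAD -> std. deviation for normal data
    std::vector<std::size_t> outliers;

    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t lo = (i >= half_width) ? i - half_width : 0;
        const std::size_t hi = std::min(x.size(), i + half_width + 1);

        // Copy the window, then turn it into absolute deviations
        // from the window median.
        std::vector<double> window(x.begin() + lo, x.begin() + hi);
        const double med = median_of(window);
        for (double &w : window)
            w = std::fabs(w - med);
        const double mad = median_of(window);

        // Strict comparison: a constant window (MAD == 0) flags nothing
        // unless the observation itself differs from the median.
        if (std::fabs(x[i] - med) > n_sigmas * scale * mad)
            outliers.push_back(i);
    }
    return outliers;
}
```

Calling `hampel_outliers({ 1.0, 1.1, 0.9, 1.0, 50.0, 1.1, 0.9, 1.0, 1.05 }, 3)` flags only index 4, the spike at 50.0. Recomputing the median for every window keeps the sketch short at the cost of an O(n · w log w) running time; a rolling-median structure would be the natural optimization for large inputs.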