| Signature | Description |
|---|---|
enum class decompose_type : unsigned char { // Yt = Trend + Seasonal + Residual // additive = 1, // Yt = Trend * Seasonal * Residual // multiplicative = 2, }; |
The additive decomposition is the most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series. When the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, then a multiplicative decomposition is more appropriate. Multiplicative decompositions are common with economic time series. An alternative to using a multiplicative decomposition is to first transform the data until the variation in the series appears to be stable over time, then use an additive decomposition. When a log transformation has been used, this is equivalent to using a multiplicative decomposition because: Yt = T * S * R is equivalent to log(Yt) = log(T) + logt(S) + log(R) |
| Signature | Description | Parameters |
|---|---|---|
#include <DataFrame/DataFrameStatsVisitors.h> template<typename T, typename I = unsigned long, std::size_t A = 0> struct DecomposeVisitor; // ------------------------------------- template<typename T, typename I = unsigned long, std::size_t A = 0> using decom_v = DecomposeVisitor<T, I, A>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface. This visitor creates a seasonal-trend (with LOWESS aka "STL") decomposition of observed time series data. This implementation is modeled after the statsmodels.tsa.seasonal_decompose method but substitutes a Lowess regression for a convolution in its trend estimation. This works with both scalar and multidimensional (MD). In case of MD data, it's strictly per-dimension (per-channel) decomposition. Each dimension is treated as an independent scalar time series that happens to share the same time axis. The dimensions never "talk to each other" during any step of the decomposition.
get_seasonal() returns the Seasonal vector. This a vector of vectors in case of MD data. get_residual(). They return the Residual vector. This a vector of vectors in case of MD data.
DecomposeVisitor(std::size_t s_period,
T frac,
T delta,
decompose_type type = decompose_type::additive);
s_period: Length of a season in untis of one observation. There must be at least two seasons in the datafrac: Between 0 and 1. The fraction of the data used when estimating each y-value. delta: Distance within which to use linear-interpolation instead of weighted regression. type: Model type. See above. |
T: Column data type. I: Index type. A: Memory alignment boundary for vectors. Default is system default alignment |
static void test_DecomposeVisitor() { using MyDataFrame = StdDataFrame<unsigned long>; std::cout << "\nTesting DecomposeVisitor{ } ..." << std::endl; std::vector<double> y_vec = { 131.157, 131.367, 132.215, 132.725, 132.648, 130.585, 130.701, 129.631, 129.168, 129.554, 129.467, 129.670, 128.397, 129.014, 129.496, 131.067, 130.219, 128.947, 129.602, 128.118, 127.356, 127.231, 127.154, 128.417, 129.091, 129.082, 128.937, 130.441, 129.371, 129.294, 129.381, 129.564, 129.708, 130.701, 130.663, 130.113, 130.046, 130.393, 128.026, 129.204, 130.530, 129.499, 129.266, 129.357, 130.431, 131.810, 131.761, 131.675, 130.923, 131.694, 133.005, 133.323, 134.152, 138.702, 137.719, 135.492, 133.622, 134.518, 132.725, 131.839, 138.548, 140.996, 143.734, 150.693, 151.108, 149.423, 150.416, 149.491, 151.273, 150.299, 146.783, 147.173, 146.939, 147.290, 145.946, 142.624, 138.027, 136.118, 129.650, 126.767, 130.809, 125.550, 130.732, 126.183, 124.410, 114.748, 121.527, 114.904, 100.138, 105.144, }; MyDataFrame df; df.load_data(MyDataFrame::gen_sequence_index(0, y_vec.size(), 1), std::make_pair("IBM_closes", y_vec)); DecomposeVisitor<double> d_v (7, 0.6, 0.01); df.single_act_visit<double>("IBM_closes", d_v); auto actual_trends = std::vector<double> { 130.613, 130.55, 130.489, 130.43, 130.372, 130.317, 130.263, 130.211, 130.161, 130.111, 130.064, 130.017, 129.972, 129.928, 129.885, 129.842, 129.801, 129.76, 129.72, 129.681, 129.642, 129.603, 129.564, 129.526, 129.49, 129.458, 129.435, 129.459, 129.496, 129.546, 129.61, 129.688, 129.78, 129.885, 130.002, 130.129, 130.267, 130.414, 130.568, 130.73, 130.898, 131.071, 131.248, 131.429, 131.613, 131.801, 131.994, 132.193, 132.4, 132.618, 132.847, 133.089, 133.343, 133.594, 133.814, 133.958, 133.982, 133.867, 133.633, 133.329, 133.013, 132.715, 132.449, 132.214, 131.88, 131.594, 131.336, 131.094, 130.862, 130.636, 130.412, 130.191, 129.97, 129.749, 129.527, 129.305, 129.082, 128.858, 128.633, 128.406, 128.178, 127.949, 127.717, 127.484, 127.249, 127.012, 126.773, 126.531, 126.288, 126.043 }; for (size_t idx = 0; idx < actual_trends.size(); ++idx) assert(fabs(d_v.get_trend()[idx] - actual_trends[idx]) < 0.001); auto actual_seasonals = std::vector<double> { 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695 }; for (size_t idx = 0; idx < actual_seasonals.size(); ++idx) assert(fabs(d_v.get_seasonal()[idx] - actual_seasonals[idx]) < 0.00001); auto actual_residuals = std::vector<double> { 0.0450645, 1.17948, 1.74866, 2.43421, 3.04989, 0.420796, -0.514129, -1.07918, -0.630027, -0.534809, -0.457752, 0.427013, -1.42233, -1.86582, -0.887751, 1.58716, 0.440736, -0.674248, 0.656095, -1.41002, -3.2376, -2.87093, -2.04782, -1.08684, -0.260342, 0.397977, -0.345524, 0.0298508, -0.623855, 0.110697, -0.206247, 0.014974, 0.702412, 0.968872, -0.290624, -0.515527, 0.141332, 0.0017837, -2.40348, -0.751754, -0.215182, -2.52399, -2.48155, -1.7098, -1.15976, 0.147774, 0.541487, -0.36517, -2.4292, -1.42281, 0.520609, 0.256637, 0.94848, 5.88205, 4.05781, 0.58219, -0.858726, 1.01365, -0.88524, -1.35148, 6.30971, 8.4335, 10.3326, 17.98, 19.5905, 17.8516, 19.2189, 19.171, 20.5634, 18.7112, 15.8714, 17.3448, 16.992, 17.6804, 17.1932, 13.4718, 7.99313, 6.76106, 1.3799, -1.61643, 2.7699, -1.62422, 3.16745, -2.25312, -3.33817, -11.9014, -5.22295, -11.4883, -25.3757, -20.7463 }; for (size_t idx = 0; idx < actual_residuals.size(); ++idx) assert(fabs(d_v.get_residual()[idx] - actual_residuals[idx]) < 0.0001); // Now multidimensional data // constexpr std::size_t dim { 3 }; constexpr std::size_t n { 24 }; // 6 full periods constexpr std::size_t period { 4 }; using ary_col_t = std::array<double, dim>; using vec_col_t = std::vector<double>; const std::array<double, period> seasonal_pattern_0 { 1.0, -1.0, 0.5, -0.5 }; const std::array<double, period> seasonal_pattern_1 { 2.0, -2.0, 1.0, -1.0 }; const std::array<double, period> seasonal_pattern_2 { 0.5, -0.5, 0.3, -0.3 }; const std::array<double, n> noise_0 { 0.10, -0.05, 0.08, -0.03, 0.06, -0.09, 0.04, 0.07, -0.06, 0.02, -0.04, 0.05, -0.07, 0.03, 0.09, -0.02, 0.05, -0.08, 0.01, 0.06, -0.03, 0.04, -0.01, 0.07 }; const std::array<double, n> noise_1 { -0.05, 0.08, -0.03, 0.06, -0.09, 0.04, 0.07, -0.06, 0.02, -0.04, 0.05, -0.07, 0.03, 0.09, -0.02, 0.05, -0.08, 0.01, 0.06, -0.03, 0.04, -0.01, 0.07, -0.02 }; const std::array<double, n> noise_2 { 0.03, -0.02, 0.05, -0.01, 0.04, -0.03, -0.01, 0.02, -0.04, 0.06, -0.02, 0.03, 0.01, -0.05, 0.02, 0.04, -0.01, 0.03, -0.02, 0.01, 0.05, -0.03, 0.02, -0.04 }; std::vector<vec_col_t> vec_y(n, vec_col_t(dim)); std::vector<ary_col_t> ary_y(n); for (std::size_t t { 0 }; t < n; ++t) { vec_y[t][0] = 1.0 * t + seasonal_pattern_0[t % period] + noise_0[t]; vec_y[t][1] = 0.5 * t + seasonal_pattern_1[t % period] + noise_1[t]; vec_y[t][2] = 2.0 * t + seasonal_pattern_2[t % period] + noise_2[t]; ary_y[t][0] = vec_y[t][0]; ary_y[t][1] = vec_y[t][1]; ary_y[t][2] = vec_y[t][2]; } df.load_column<ary_col_t>("ARY MD Y", std::move(ary_y), nan_policy::dont_pad_with_nans); df.load_column<vec_col_t>("VEC MD Y", std::move(vec_y), nan_policy::dont_pad_with_nans); DecomposeVisitor<vec_col_t> vec_md_dcom { period, 0.75, // frac 0.01, // delta decompose_type::additive }; DecomposeVisitor<ary_col_t> ary_md_dcom { period, 0.75, // frac 0.01, // delta decompose_type::additive }; df.single_act_visit<ary_col_t>("ARY MD Y", ary_md_dcom); df.single_act_visit<vec_col_t>("VEC MD Y", vec_md_dcom); const auto &trend { ary_md_dcom.get_trend() }; const auto &seasonal { ary_md_dcom.get_seasonal() }; const auto &residual { ary_md_dcom.get_residual() }; assert(trend.size() == n); assert(seasonal.size() == n); assert(residual.size() == n); for (std::size_t i { 0 }; i < n; ++i) { assert(trend[i].size() == dim); assert(seasonal[i].size() == dim); assert(residual[i].size() == dim); } // Trend checks // LOWESS with frac=0.75 on a clean linear signal should track it closely // in the middle of the series (endpoints have more bias). Tolerance ~1.0. // for (std::size_t i { 4 }; i < n - 4; ++i) { assert(std::fabs(trend[i][0] - 1.0 * i) < 1.0); assert(std::fabs(trend[i][1] - 0.5 * i) < 1.0); assert(std::fabs(trend[i][2] - 2.0 * i) < 1.0); } // Seasonal periodicity checks // After tiling, seasonal[i] == seasonal[i % period] for i >= period // for (std::size_t i { period }; i < n; ++i) { assert(std::fabs(seasonal[i][0] - seasonal[i % period][0]) < 1e-10); assert(std::fabs(seasonal[i][1] - seasonal[i % period][1]) < 1e-10); assert(std::fabs(seasonal[i][2] - seasonal[i % period][2]) < 1e-10); } // Seasonal centering check // Additive: sum of one period's seasonal values should be ~0 per channel // double sum_0 { 0 }, sum_1 { 0 }, sum_2 { 0 }; for (std::size_t t { 0 }; t < period; ++t) { sum_0 += seasonal[t][0]; sum_1 += seasonal[t][1]; sum_2 += seasonal[t][2]; } assert(std::fabs(sum_0) < 0.5); assert(std::fabs(sum_1) < 0.5); assert(std::fabs(sum_2) < 0.5); // Residual magnitude check // With clean synthetic data residuals should be small // for (std::size_t t { 2 }; t < n - 2; ++t) { assert(std::fabs(residual[t][0]) < 1.5); assert(std::fabs(residual[t][1]) < 1.5); assert(std::fabs(residual[t][2]) < 1.5); } }
// ----------------------------------------------------------------------------- static void test_IBM_data() { std::cout << "\nTesting IBM_data( ) ..." << std::endl; typedef StdDataFrame<std::string> StrDataFrame; StrDataFrame df; df.read("IBM.csv", io_format::csv2); DecomposeVisitor<double, std::string> d_v(178, 2.0 / 3.0, 0, decompose_type::multiplicative); // decompose_type::additive); df.single_act_visit<double>("IBM_Adj_Close", d_v); const auto &ibm_closes = df.get_column<double>("IBM_Adj_Close"); for (std::size_t i = 0; i < ibm_closes.size(); ++i) std::cout << ibm_closes[i] << "," << d_v.get_trend()[i] << "," << d_v.get_seasonal()[i] << "," << d_v.get_residual()[i] << std::endl; }