Back to Documentations

Signature Description
enum class decompose_type : unsigned char  {
    // Yt = Trend + Seasonal + Residual
    //
    additive = 1,

    // Yt = Trend * Seasonal * Residual
    //
    multiplicative = 2,
};
The additive decomposition is the most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series. When the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, then a multiplicative decomposition is more appropriate. Multiplicative decompositions are common with economic time series.
An alternative to using a multiplicative decomposition is to first transform the data until the variation in the series appears to be stable over time, then use an additive decomposition. When a log transformation has been used, this is equivalent to using a multiplicative decomposition because:
Yt = T * S * R is equivalent to log(Yt) = log(T) + logt(S) + log(R)

Signature Description Parameters
#include <DataFrame/DataFrameStatsVisitors.h>

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
struct DecomposeVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long,
         std::size_t A = 0>
using decom_v = DecomposeVisitor<T, I, A>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.

This visitor creates a seasonal-trend (with LOWESS aka "STL") decomposition of observed time series data. This implementation is modeled after the statsmodels.tsa.seasonal_decompose method but substitutes a Lowess regression for a convolution in its trend estimation.

This works with both scalar and multidimensional (MD). In case of MD data, it's strictly per-dimension (per-channel) decomposition. Each dimension is treated as an independent scalar time series that happens to share the same time axis. The dimensions never "talk to each other" during any step of the decomposition.
  • Independence: the trend, seasonal pattern, and noise of each channel are unrelated to those of other channels. Knowing channel 1's residual tells you nothing about channel 2's.
  • Shared time axis only: the only thing dimensions share is the timestamps — not the period, not the trend shape, not the noise structure.
  • Identical period across channels: your single s_period_ is applied to all dimensions. If channel 0 has a weekly cycle and channel 1 has a monthly cycle, the decomposition will be wrong for at least one of them.
  • Additive or multiplicative structure holds per channel independently: you can't mix models across channels.
get_result() returns the Trend vector. This a vector of vectors in case of MD data.
get_seasonal() returns the Seasonal vector. This a vector of vectors in case of MD data.
get_residual(). They return the Residual vector. This a vector of vectors in case of MD data.
DecomposeVisitor(std::size_t s_period,
                 T frac,
                 T delta,
                 decompose_type type = decompose_type::additive);
        
s_period: Length of a season in untis of one observation. There must be at least two seasons in the data
frac: Between 0 and 1. The fraction of the data used when estimating each y-value.
delta: Distance within which to use linear-interpolation instead of weighted regression.
type: Model type. See above.
T: Column data type.
I: Index type.
A: Memory alignment boundary for vectors. Default is system default alignment
static void test_DecomposeVisitor()  {

    using MyDataFrame = StdDataFrame<unsigned long>;

    std::cout << "\nTesting DecomposeVisitor{  } ..." << std::endl;

    std::vector<double>         y_vec =
        { 131.157, 131.367, 132.215, 132.725, 132.648, 130.585, 130.701, 129.631, 129.168, 129.554, 129.467, 129.670, 128.397, 129.014, 129.496, 131.067, 130.219, 128.947, 129.602, 128.118, 127.356,
          127.231, 127.154, 128.417, 129.091, 129.082, 128.937, 130.441, 129.371, 129.294, 129.381, 129.564, 129.708, 130.701, 130.663, 130.113, 130.046, 130.393, 128.026, 129.204, 130.530, 129.499,
          129.266, 129.357, 130.431, 131.810, 131.761, 131.675, 130.923, 131.694, 133.005, 133.323, 134.152, 138.702, 137.719, 135.492, 133.622, 134.518, 132.725, 131.839, 138.548, 140.996, 143.734,
          150.693, 151.108, 149.423, 150.416, 149.491, 151.273, 150.299, 146.783, 147.173, 146.939, 147.290, 145.946, 142.624, 138.027, 136.118, 129.650, 126.767, 130.809, 125.550, 130.732, 126.183,
          124.410, 114.748, 121.527, 114.904, 100.138, 105.144,
        };
    MyDataFrame                 df;

    df.load_data(MyDataFrame::gen_sequence_index(0, y_vec.size(), 1),
                 std::make_pair("IBM_closes", y_vec));

    DecomposeVisitor<double>    d_v (7, 0.6, 0.01);

    df.single_act_visit<double>("IBM_closes", d_v);

    auto    actual_trends = std::vector<double>
        { 130.613, 130.55, 130.489, 130.43, 130.372, 130.317, 130.263, 130.211, 130.161, 130.111, 130.064, 130.017, 129.972, 129.928, 129.885, 129.842, 129.801, 129.76, 129.72, 129.681, 129.642, 129.603, 129.564,
          129.526, 129.49, 129.458, 129.435, 129.459, 129.496, 129.546, 129.61, 129.688, 129.78, 129.885, 130.002, 130.129, 130.267, 130.414, 130.568, 130.73, 130.898, 131.071, 131.248, 131.429, 131.613,
          131.801, 131.994, 132.193, 132.4, 132.618, 132.847, 133.089, 133.343, 133.594, 133.814, 133.958, 133.982, 133.867, 133.633, 133.329, 133.013, 132.715, 132.449, 132.214, 131.88, 131.594, 131.336,
          131.094, 130.862, 130.636, 130.412, 130.191, 129.97, 129.749, 129.527, 129.305, 129.082, 128.858, 128.633, 128.406, 128.178, 127.949, 127.717, 127.484, 127.249, 127.012, 126.773, 126.531, 126.288, 126.043
        };

    for (size_t idx = 0; idx < actual_trends.size(); ++idx)
        assert(fabs(d_v.get_trend()[idx] - actual_trends[idx]) < 0.001);

    auto    actual_seasonals = std::vector<double>
        { 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991,
          -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135,
          -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313,
          -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488,
          -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695, 0.951993, 0.499135, -0.362488, -0.0226401, -0.138991, -0.774313, -0.152695
        };

    for (size_t idx = 0; idx < actual_seasonals.size(); ++idx)
        assert(fabs(d_v.get_seasonal()[idx] - actual_seasonals[idx]) < 0.00001);

    auto    actual_residuals = std::vector<double>
        { 0.0450645, 1.17948, 1.74866, 2.43421, 3.04989, 0.420796, -0.514129, -1.07918, -0.630027, -0.534809, -0.457752, 0.427013, -1.42233, -1.86582, -0.887751, 1.58716, 0.440736, -0.674248, 0.656095,
          -1.41002, -3.2376, -2.87093, -2.04782, -1.08684, -0.260342, 0.397977, -0.345524, 0.0298508, -0.623855, 0.110697, -0.206247, 0.014974, 0.702412, 0.968872, -0.290624, -0.515527, 0.141332, 0.0017837,
          -2.40348, -0.751754, -0.215182, -2.52399, -2.48155, -1.7098, -1.15976, 0.147774, 0.541487, -0.36517, -2.4292, -1.42281, 0.520609, 0.256637, 0.94848, 5.88205, 4.05781, 0.58219, -0.858726, 1.01365,
          -0.88524, -1.35148, 6.30971, 8.4335, 10.3326, 17.98, 19.5905, 17.8516, 19.2189, 19.171, 20.5634, 18.7112, 15.8714, 17.3448, 16.992, 17.6804, 17.1932, 13.4718, 7.99313, 6.76106, 1.3799,
          -1.61643, 2.7699, -1.62422, 3.16745, -2.25312, -3.33817, -11.9014, -5.22295, -11.4883, -25.3757, -20.7463
        };

    for (size_t idx = 0; idx < actual_residuals.size(); ++idx)
        assert(fabs(d_v.get_residual()[idx] - actual_residuals[idx]) < 0.0001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 3 };
    constexpr std::size_t   n { 24 };   // 6 full periods
    constexpr std::size_t   period { 4 };

    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    const std::array<double, period>    seasonal_pattern_0 { 1.0, -1.0,  0.5, -0.5 };
    const std::array<double, period>    seasonal_pattern_1 { 2.0, -2.0,  1.0, -1.0 };
    const std::array<double, period>    seasonal_pattern_2 { 0.5, -0.5,  0.3, -0.3 };
    const std::array<double, n>         noise_0 { 0.10, -0.05,  0.08, -0.03,  0.06, -0.09, 0.04,  0.07, -0.06,  0.02, -0.04,  0.05, -0.07,  0.03,  0.09, -0.02,  0.05, -0.08, 0.01,  0.06, -0.03,  0.04, -0.01,  0.07 };
    const std::array<double, n>         noise_1 { -0.05,  0.08, -0.03,  0.06, -0.09,  0.04, 0.07, -0.06,  0.02, -0.04,  0.05, -0.07, 0.03,  0.09, -0.02,  0.05, -0.08,  0.01, 0.06, -0.03,  0.04, -0.01,  0.07, -0.02 };
    const std::array<double, n>         noise_2 { 0.03, -0.02,  0.05, -0.01,  0.04, -0.03, -0.01,  0.02, -0.04,  0.06, -0.02,  0.03, 0.01, -0.05,  0.02,  0.04, -0.01,  0.03, -0.02,  0.01,  0.05, -0.03,  0.02, -0.04 };

    std::vector<vec_col_t>  vec_y(n, vec_col_t(dim));
    std::vector<ary_col_t>  ary_y(n);

    for (std::size_t t { 0 }; t < n; ++t)  {
        vec_y[t][0] = 1.0 * t + seasonal_pattern_0[t % period] + noise_0[t];
        vec_y[t][1] = 0.5 * t + seasonal_pattern_1[t % period] + noise_1[t];
        vec_y[t][2] = 2.0 * t + seasonal_pattern_2[t % period] + noise_2[t];
        ary_y[t][0] = vec_y[t][0];
        ary_y[t][1] = vec_y[t][1];
        ary_y[t][2] = vec_y[t][2];
    }
    df.load_column<ary_col_t>("ARY MD Y", std::move(ary_y), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("VEC MD Y", std::move(vec_y), nan_policy::dont_pad_with_nans);

    DecomposeVisitor<vec_col_t> vec_md_dcom {
        period,
        0.75,   // frac
        0.01,   // delta
        decompose_type::additive
    };
    DecomposeVisitor<ary_col_t> ary_md_dcom {
        period,
        0.75,   // frac
        0.01,   // delta
        decompose_type::additive
    };

    df.single_act_visit<ary_col_t>("ARY MD Y", ary_md_dcom);
    df.single_act_visit<vec_col_t>("VEC MD Y", vec_md_dcom);

    const auto  &trend { ary_md_dcom.get_trend() };
    const auto  &seasonal { ary_md_dcom.get_seasonal() };
    const auto  &residual { ary_md_dcom.get_residual() };

    assert(trend.size() == n);
    assert(seasonal.size() == n);
    assert(residual.size() == n);

    for (std::size_t i { 0 }; i < n; ++i)  {
        assert(trend[i].size() == dim);
        assert(seasonal[i].size() == dim);
        assert(residual[i].size() == dim);
    }

    // Trend checks
    // LOWESS with frac=0.75 on a clean linear signal should track it closely
    // in the middle of the series (endpoints have more bias). Tolerance ~1.0.
    //
    for (std::size_t i { 4 }; i < n - 4; ++i)  {
        assert(std::fabs(trend[i][0] - 1.0 * i) < 1.0);
        assert(std::fabs(trend[i][1] - 0.5 * i) < 1.0);
        assert(std::fabs(trend[i][2] - 2.0 * i) < 1.0);
    }

    // Seasonal periodicity checks
    // After tiling, seasonal[i] == seasonal[i % period] for i >= period
    //
    for (std::size_t i { period }; i < n; ++i)  {
        assert(std::fabs(seasonal[i][0] - seasonal[i % period][0]) < 1e-10);
        assert(std::fabs(seasonal[i][1] - seasonal[i % period][1]) < 1e-10);
        assert(std::fabs(seasonal[i][2] - seasonal[i % period][2]) < 1e-10);
    }

    // Seasonal centering check
    // Additive: sum of one period's seasonal values should be ~0 per channel
    //
    double  sum_0 { 0 }, sum_1 { 0 }, sum_2 { 0 };

    for (std::size_t t { 0 }; t < period; ++t)  {
        sum_0 += seasonal[t][0];
        sum_1 += seasonal[t][1];
        sum_2 += seasonal[t][2];
    }
    assert(std::fabs(sum_0) < 0.5);
    assert(std::fabs(sum_1) < 0.5);
    assert(std::fabs(sum_2) < 0.5);

    // Residual magnitude check
    // With clean synthetic data residuals should be small
    //
    for (std::size_t t { 2 }; t < n - 2; ++t)  {
        assert(std::fabs(residual[t][0]) < 1.5);
        assert(std::fabs(residual[t][1]) < 1.5);
        assert(std::fabs(residual[t][2]) < 1.5);
    }
}
// -----------------------------------------------------------------------------

static void test_IBM_data()  {

    std::cout << "\nTesting IBM_data(  ) ..." << std::endl;

    typedef StdDataFrame<std::string> StrDataFrame;

    StrDataFrame    df;

    df.read("IBM.csv", io_format::csv2);

    DecomposeVisitor<double, std::string> d_v(178, 2.0 / 3.0, 0, decompose_type::multiplicative);
    // decompose_type::additive);

    df.single_act_visit<double>("IBM_Adj_Close", d_v);

    const auto  &ibm_closes = df.get_column<double>("IBM_Adj_Close");

    for (std::size_t i = 0; i < ibm_closes.size(); ++i)
        std::cout << ibm_closes[i] << ","
                  << d_v.get_trend()[i] << ","
                  << d_v.get_seasonal()[i] << ","
                  << d_v.get_residual()[i] << std::endl;
}

C++ DataFrame