
Signature Description
template<typename T>
struct  SeasonalityParams  {

    bool        detrend { true };          // Remove trend
    bool        de_serial_corr { false };  // Remove serial correlation by differencing

    // Parameters to generate trend using LOWESS
    // The two parameters below must be adjusted carefully for each dataset, sometimes by trial and error.
    // The defaults are suitable for financial market data
    //
    std::size_t num_loops { 3 };      // Number of loops
    T           frac { 0.08 };        // The fraction of the data used when estimating each y-value.
    T           delta { 0.0001 };     // Distance within which to use linear interpolation instead of regression

    std::size_t sampling_rate { 1 };  // Assume the time series is per 1 unit of time
};
Parameter to the SeasonalPeriodVisitor constructor
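As a minimal sketch of how the defaults might be adjusted, the block below fills in a parameter set for a series that is assumed to have serial correlation and a slowly varying trend. The struct is a local copy of the one shown above so that the sketch compiles without the library. The helper name make_params_for_smooth_series and the value 0.2 for frac are illustrative assumptions, not recommendations from the library.

```cpp
#include <cassert>
#include <cstddef>

// Local copy of the struct documented above, so this sketch is stand-alone.
template<typename T>
struct SeasonalityParams  {

    bool        detrend { true };
    bool        de_serial_corr { false };
    std::size_t num_loops { 3 };
    T           frac { 0.08 };
    T           delta { 0.0001 };
    std::size_t sampling_rate { 1 };
};

// Hypothetical helper (not part of the library): start from the defaults
// and change only what the dataset is assumed to need.
inline SeasonalityParams<double> make_params_for_smooth_series()  {

    SeasonalityParams<double>   params;  // All defaults

    params.de_serial_corr = true;  // Difference the series before the FFT
    params.frac = 0.2;             // Illustrative value: wider LOWESS window

    // frac is a fraction of the data, so it should stay in (0, 1]
    assert(params.frac > 0.0 && params.frac <= 1.0);
    return (params);
}
```

With C++20 the same overrides can be written with designated initializers, as the test code on this page does with { .de_serial_corr = true }.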

Signature Description Parameters
#include <DataFrame/DataFrameMLVisitors.h>

template<arithmetic T, typename I = unsigned long>
struct SeasonalPeriodVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long>
using ssp_v = SeasonalPeriodVisitor<T, I>;
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.
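To illustrate why some visitors need the whole column at once, here is a toy sketch that does not use the library. ToyMeanVisitor and ToyMedianVisitor are hypothetical types with simplified call signatures; they are not the DataFrame visitor interface. A mean can be accumulated one value at a time, whereas a median (like an FFT-based seasonality estimate) needs every value before it can produce a result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy streaming visitor: fed one value per call, keeps only running state.
struct  ToyMeanVisitor  {

    void operator()(double value)  { sum_ += value; ++count_; }
    double get_result() const  {
        return (count_ ? sum_ / static_cast<double>(count_) : 0.0);
    }

private:

    double      sum_ { 0.0 };
    std::size_t count_ { 0 };
};

// Toy single-action visitor: receives the whole column in one call,
// because the median cannot be computed from a running state.
struct  ToyMedianVisitor  {

    using iter_t = std::vector<double>::const_iterator;

    void operator()(iter_t begin, iter_t end)  {
        std::vector<double> sorted(begin, end);

        assert(! sorted.empty());
        std::sort(sorted.begin(), sorted.end());

        const std::size_t   mid = sorted.size() / 2;

        result_ = (sorted.size() % 2)
                      ? sorted[mid]
                      : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
    double get_result() const  { return (result_); }

private:

    double  result_ { 0.0 };
};
```

The streaming visitor would be driven by a loop over the column, while the single-action visitor is invoked exactly once with the full range.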

This visitor estimates the seasonality period of the given column (time series). A season is any repeating pattern in the data; it does not have to correspond to a natural season. To use this visitor well you must know your data. If the data has no seasonality, the method below can still report a period, which would be misleading. You must also know whether the data has a trend and whether it is serially correlated, as described below. The visitor goes through these steps:
  1. Optionally detrend the data. You must know whether your data has a trend. If a trend is left in, it appears as a strong signal in the frequency domain and skews the analysis. There are a few ways to detrend: you can fit a polynomial curve through the data (you must know the degree), or you can use a method such as LOWESS, which in essence behaves like a polynomial curve of dynamically varying degree. This visitor uses LOWESS, configured through SeasonalityParams. In either case the trend is subtracted from the data.
  2. Optionally take serial correlation out by differencing. Again, you must know this about your data. If serial correlation is left in, it shows up in the frequency domain as leakage and spreads the dominant frequencies.
  3. The data is now prepared for the final analysis. The next step is to convert the time series from the time domain to the frequency domain. Mr. Joseph Fourier has a solution for that: the Fast Fourier Transform (FFT), an efficient algorithm for computing the Discrete Fourier Transform (DFT). The FFT gives you a vector of complex values that represents the frequency spectrum. Each value encodes the amplitude and phase of one frequency component.
  4. Take the absolute values of the FFT result. These form the magnitude spectrum, which shows the strength of each frequency within the data.
  5. Search the magnitude spectrum for the dominant frequency and convert it to the seasonality period.
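The steps above can be sketched in stand-alone C++ without the library. This is a simplification under stated assumptions, not the visitor's implementation: a least-squares straight line stands in for LOWESS, a naive O(n^2) DFT stands in for the FFT, and all function names are hypothetical. The mapping from bin to frequency (bin * sampling_rate / n) is the usual DFT convention and may differ from what the visitor does internally.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <numeric>
#include <vector>

// Step 1 (stand-in for LOWESS): subtract a least-squares straight line.
std::vector<double> detrend_linear(const std::vector<double> &y)  {
    if (y.empty())  return { };

    const double    n = static_cast<double>(y.size());
    const double    x_mean = (n - 1.0) / 2.0;
    const double    y_mean = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double          sxy = 0.0;
    double          sxx = 0.0;

    for (std::size_t i = 0; i < y.size(); ++i)  {
        const double    dx = static_cast<double>(i) - x_mean;

        sxy += dx * (y[i] - y_mean);
        sxx += dx * dx;
    }

    const double        slope = sxx > 0.0 ? sxy / sxx : 0.0;
    std::vector<double> out(y.size());

    for (std::size_t i = 0; i < y.size(); ++i)
        out[i] = y[i] - (y_mean + slope * (static_cast<double>(i) - x_mean));
    return (out);
}

// Step 2: first-order differencing, d[i - 1] = y[i] - y[i - 1].
std::vector<double> difference(const std::vector<double> &y)  {
    std::vector<double> d;

    for (std::size_t i = 1; i < y.size(); ++i)
        d.push_back(y[i] - y[i - 1]);
    return (d);
}

// Steps 3 and 4: naive DFT, returning the magnitudes of bins 0 .. n / 2.
std::vector<double> magnitude_spectrum(const std::vector<double> &y)  {
    const std::size_t   n = y.size();
    const double        two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> mags(n / 2 + 1);

    for (std::size_t k = 0; k < mags.size(); ++k)  {
        std::complex<double>    sum(0.0, 0.0);

        for (std::size_t t = 0; t < n; ++t)  {
            const double    angle = -two_pi * static_cast<double>(k) *
                                    static_cast<double>(t) /
                                    static_cast<double>(n);

            sum += y[t] * std::complex<double>(std::cos(angle),
                                               std::sin(angle));
        }
        mags[k] = std::abs(sum);
    }
    return (mags);
}

// Step 5: pick the strongest non-zero bin and convert it to a period.
double estimate_period(const std::vector<double> &y,
                       double sampling_rate = 1.0)  {
    assert(y.size() >= 4);

    const std::vector<double>   mags = magnitude_spectrum(y);
    std::size_t                 best = 1;  // Skip bin 0, the mean

    for (std::size_t k = 2; k < mags.size(); ++k)
        if (mags[k] > mags[best])  best = k;

    const double    freq = static_cast<double>(best) * sampling_rate /
                           static_cast<double>(y.size());

    return (1.0 / freq);
}
```

For a series with a linear trend plus a sine wave that repeats every 12 samples, calling detrend_linear() followed by estimate_period() should recover a period close to 12. The series length matters: the estimate is only as fine as the spacing between frequency bins.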

This visitor has the following methods to get results:
get_result(): Returns the length of seasons (the same value as get_period()).
get_period(): Returns the length of seasons.
get_max_magnitude(): Returns the maximum frequency magnitude.
get_dominant_frequency(): Returns the dominant frequency.
get_dominant_index(): Returns the index of the column corresponding to the dominant frequency.
    explicit
    SeasonalPeriodVisitor(const SeasonalityParams<T> params = { });
        
params: Necessary parameters as explained above.
T: Column data type.
I: Index type.
static void test_SeasonalPeriodVisitor()  {

    std::cout << "\nTesting SeasonalPeriodVisitor{ } ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("IcecreamProduction.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
    }

    ssp_v<double, DateTime> ssp({ .de_serial_corr = true});

    df.single_act_visit<double>("IceCreamProduction", ssp);

    assert(std::fabs(ssp.get_max_magnitude() - 4073.55) < 0.01);
    assert(ssp.get_dominant_index() == 53);
    assert(std::fabs(ssp.get_dominant_frequency() - 0.08346) < 0.00001);
    assert(std::fabs(ssp.get_period() - 11.9811) < 0.0001);
    assert(ssp.get_period() == ssp.get_result());
}

C++ DataFrame