| Signature | Description |
|---|---|
enum class stationary_test : unsigned char {

    // Kwiatkowski–Phillips–Schmidt–Shin (KPSS)
    // In econometrics, Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests are
    // used for testing a null hypothesis that an observable time series is stationary around a deterministic
    // trend (i.e. trend-stationary) against the alternative of a unit root. Contrary to most unit root tests,
    // the presence of a unit root is not the null hypothesis but the alternative. Additionally, in the KPSS
    // test, the absence of a unit root is not a proof of stationarity but, by design, of trend-stationarity.
    // This is an important distinction since it is possible for a time series to be non-stationary, have no
    // unit root yet be trend-stationary.
    //
    // In a KPSS test, a higher test statistic value (meaning a larger calculated KPSS statistic) indicates a
    // greater likelihood that the time series is not stationary around a deterministic trend, while a lower
    // value suggests stationarity; essentially, you want a low KPSS test value to conclude stationarity.
    //
    kpss = 1,

    // Augmented Dickey–Fuller (ADF)
    // In statistics, an augmented Dickey–Fuller test (ADF) tests the null hypothesis that a unit root is
    // present in a time series sample. The alternative hypothesis depends on which version of the test is
    // used, but is usually stationarity or trend-stationarity. It is an augmented version of the
    // Dickey–Fuller test for a larger and more complicated set of time series models. The augmented
    // Dickey–Fuller (ADF) statistic, used in the test, is a negative number. The more negative it is, the
    // stronger the rejection of the hypothesis that there is a unit root at some level of confidence.
    //
    // To interpret an ADF test statistic, compare its value to the critical value at a chosen significance
    // level (usually 0.05): if the test statistic is less than the critical value, you reject the null
    // hypothesis and conclude that the time series is stationary; if it's greater than the critical value,
    // you fail to reject the null hypothesis, indicating non-stationarity; a more negative ADF statistic
    // signifies stronger evidence against the null hypothesis (i.e., more likely stationary).
    //
    adf = 2,
};

struct StationaryTestParams {

    // Only considered for KPSS test
    //
    double      critical_values[4] { 0.347, 0.463, 0.574, 0.739 };

    // Only considered for ADF test
    //
    std::size_t adf_lag { 1 };
    bool        adf_with_trend { false };
}; |
Methods to test whether a time series is stationary, plus a struct that holds the parameters passed to the StationaryCheckVisitor constructor. |
| Signature | Description | Parameters |
|---|---|---|
#include <DataFrame/DataFrameStatsVisitors.h>

template<arithmetic T, typename I = unsigned long>
struct StationaryCheckVisitor;

// -------------------------------------

template<typename T, typename I = unsigned long>
using stac_v = StationaryCheckVisitor<T, I>; |
This is a "single action visitor", meaning it is passed the whole data vector in one call and you must use the single_act_visit() interface.
This visitor uses the specified method to test whether the given time series (i.e., column) is stationary. It works with both scalar and multidimensional (i.e., vector or array) datasets. For multidimensional data, the analysis is done per dimension (channel).
The visitor provides the following methods to get results:
get_kpss_value(): Returns the calculated KPSS value. For multidimensional data, it returns a vector with one entry per data dimension.
get_kpss_statistic(): Returns the percentage calculated by comparing the KPSS value against critical_values. For multidimensional data, it returns a vector with one entry per data dimension.
get_adf_statistic(): Returns the calculated ADF statistic. For multidimensional data, it returns a vector with one entry per data dimension.
explicit
StationaryCheckVisitor(stationary_test method, const StationaryTestParams params = { })
method: One of the above methods. params: Necessary parameters, depending on which method is being used. |
T: Column data type. I: Index type. |
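
To illustrate the kind of quantity an ADF-style statistic is, the following is a minimal sketch of the plain (unaugmented) Dickey–Fuller t-statistic with a constant term. It is a simplification: it uses no lagged differences (roughly what `adf_lag` would add) and no trend term (`adf_with_trend`), and it is not the library's implementation. The function name `dickey_fuller_t` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of the DataFrame library.
// Fits  dy[t] = alpha + gamma * y[t-1] + e[t]  by ordinary least squares
// and returns the t-statistic gamma_hat / se(gamma_hat).
// Returns NaN when the statistic cannot be computed.
double dickey_fuller_t(const std::vector<double> &y)
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (y.size() < 4) return nan;

    const std::size_t m = y.size() - 1;  // number of regression rows
    const double dm = static_cast<double>(m);

    // Means of the regressor y[t-1] and the response dy[t].
    double mean_x = 0.0, mean_d = 0.0;
    for (std::size_t t = 1; t <= m; ++t) {
        mean_x += y[t - 1];
        mean_d += y[t] - y[t - 1];
    }
    mean_x /= dm;
    mean_d /= dm;

    // Centered sums of squares and cross products.
    double sxx = 0.0, sxd = 0.0;
    for (std::size_t t = 1; t <= m; ++t) {
        const double dx = y[t - 1] - mean_x;
        sxx += dx * dx;
        sxd += dx * ((y[t] - y[t - 1]) - mean_d);
    }
    if (sxx == 0.0) return nan;

    const double gamma = sxd / sxx;
    const double alpha = mean_d - gamma * mean_x;

    // Residual variance with two estimated coefficients.
    double sse = 0.0;
    for (std::size_t t = 1; t <= m; ++t) {
        const double r = (y[t] - y[t - 1]) - alpha - gamma * y[t - 1];
        sse += r * r;
    }
    const double sigma2 = sse / (dm - 2.0);
    if (sigma2 == 0.0) return nan;  // perfect fit: t-statistic undefined

    return gamma / std::sqrt(sigma2 / sxx);
}
```

A more negative return value is stronger evidence against a unit root. The value has to be compared against Dickey–Fuller critical values, which depend on the sample size and on whether a constant or trend is included, not against the usual Student-t table.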
static void test_StationaryCheckVisitor()  {

    std::cout << "\nTesting StationaryCheckVisitor{ } ..." << std::endl;

    StrDataFrame    df;

    try  {
        df.read("IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }

    RandGenParams<double>   p;

    p.mean = 0;
    p.std = 1;
    p.seed = 123;
    df.load_column("normal_col", gen_normal_dist<double>(df.get_index().size(), p));
    p.max_value = 1000;
    p.min_value = -1000;
    df.load_column("uniform col", gen_uniform_real_dist<double>(df.get_index().size(), p));

    std::vector<double> log_close;

    log_close.reserve(df.get_index().size());
    for (const auto val : df.get_column<double>("IBM_Close"))
        log_close.push_back(std::log(val));
    df.load_column("log close", std::move(log_close));

    DecomposeVisitor<double, std::string>   d_v (280, 0.6, 0.01);

    df.single_act_visit<double>("IBM_Close", d_v);
    df.load_column("residual close", std::move(d_v.get_residual()));

    // KPSS tests
    //
    StationaryCheckVisitor<double, std::string> sc { stationary_test::kpss };

    df.single_act_visit<double>("IBM_Close", sc);
    assert(std::fabs(sc.get_kpss_value() - 63.5831) < 0.0001);
    assert(sc.get_kpss_statistic() == 0);

    df.single_act_visit<double>("normal_col", sc);
    assert(sc.get_kpss_value() < 0.078);
    assert(sc.get_kpss_statistic() == 0.1);

    df.single_act_visit<double>("uniform col", sc);
    assert(sc.get_kpss_value() < 0.08);
    assert(sc.get_kpss_statistic() == 0.1);

    df.single_act_visit<double>("log close", sc);
    assert(sc.get_kpss_value() < 62.7013);
    assert(sc.get_kpss_statistic() == 0);

    df.single_act_visit<double>("residual close", sc);
    assert(sc.get_kpss_value() < 46.41);
    assert(sc.get_kpss_statistic() == 0);

    // ADF tests
    //
    StationaryCheckVisitor<double, std::string> sc2
        { stationary_test::adf, { .adf_lag = 10, .adf_with_trend = false } };

    df.single_act_visit<double>("IBM_Close", sc2);
    assert(std::fabs(sc2.get_adf_statistic() - 0.989687) < 0.00001);

    StationaryCheckVisitor<double, std::string> sc3
        { stationary_test::adf, { .adf_lag = 25, .adf_with_trend = false } };

    df.single_act_visit<double>("IBM_Close", sc3);
    assert(std::fabs(sc3.get_adf_statistic() - 0.974531) < 0.0000001);

    StationaryCheckVisitor<double, std::string> sc4
        { stationary_test::adf, { .adf_lag = 10, .adf_with_trend = false } };

    df.single_act_visit<double>("normal_col", sc4);
    assert(std::fabs(sc4.get_adf_statistic() - 0.0289613) < 0.0000001);

    StationaryCheckVisitor<double, std::string> sc5
        { stationary_test::adf, { .adf_lag = 25, .adf_with_trend = false } };

    df.single_act_visit<double>("normal_col", sc5);
    assert(std::fabs(sc5.get_adf_statistic() - 0.0208191) < 0.0000001);

    // ADF tests with trend
    //
    StationaryCheckVisitor<double, std::string> sc6
        { stationary_test::adf, { .adf_lag = 10, .adf_with_trend = true } };

    df.single_act_visit<double>("IBM_Close", sc6);
    assert(std::fabs(sc6.get_adf_statistic() - 0.977705) < 0.000001);

    StationaryCheckVisitor<double, std::string> sc7
        { stationary_test::adf, { .adf_lag = 25, .adf_with_trend = true } };

    df.single_act_visit<double>("IBM_Close", sc7);
    assert(std::fabs(sc7.get_adf_statistic() - 0.946614) < 0.000001);

    StationaryCheckVisitor<double, std::string> sc8
        { stationary_test::adf, { .adf_lag = 10, .adf_with_trend = true } };

    df.single_act_visit<double>("normal_col", sc8);
    assert(std::fabs(sc8.get_adf_statistic() - 0.0289582) < 0.0000001);

    StationaryCheckVisitor<double, std::string> sc9
        { stationary_test::adf, { .adf_lag = 25, .adf_with_trend = true } };

    df.single_act_visit<double>("normal_col", sc9);
    assert(std::fabs(sc9.get_adf_statistic() - 0.020812) < 0.0000001);

    StationaryCheckVisitor<double, std::string> sc10
        { stationary_test::adf, { .adf_lag = 10, .adf_with_trend = true } };

    df.single_act_visit<double>("log close", sc10);
    assert(std::fabs(sc10.get_adf_statistic() - 0.972062) < 0.000001);

    StationaryCheckVisitor<double, std::string> sc11
        { stationary_test::adf, { .adf_lag = 10, .adf_with_trend = true } };

    df.single_act_visit<double>("residual close", sc11);
    assert(std::fabs(sc11.get_adf_statistic() - 0.679027) < 0.000001);

    // Now multidimensional data
    //
    constexpr std::size_t   dim { 3 };
    using ary_col_t = std::array<double, dim>;
    using vec_col_t = std::vector<double>;

    std::vector<ary_col_t>  stationary_ary {
        {10.1, 5.0, -2.0}, { 9.9, 5.2, -2.1}, {10.2, 4.9, -1.9},
        {10.0, 5.1, -2.0}, { 9.8, 5.0, -2.2}, {10.1, 5.3, -1.8},
        {10.0, 4.8, -2.1}, {10.2, 5.1, -2.0}, { 9.9, 5.0, -2.0},
        {10.1, 5.2, -1.9}, {10.0, 4.9, -2.1}, { 9.8, 5.1, -2.0},
        {10.2, 5.0, -2.1}
    };
    std::vector<vec_col_t>  stationary_vec {
        {10.1, 5.0, -2.0}, { 9.9, 5.2, -2.1}, {10.2, 4.9, -1.9},
        {10.0, 5.1, -2.0}, { 9.8, 5.0, -2.2}, {10.1, 5.3, -1.8},
        {10.0, 4.8, -2.1}, {10.2, 5.1, -2.0}, { 9.9, 5.0, -2.0},
        {10.1, 5.2, -1.9}, {10.0, 4.9, -2.1}, { 9.8, 5.1, -2.0},
        {10.2, 5.0, -2.1}
    };
    std::vector<vec_col_t>  non_stationary_vec {
        { 1.0, 10.0, -5.0}, { 1.5, 10.5, -4.8}, { 2.1, 11.0, -4.5},
        { 2.8, 11.8, -4.0}, { 3.6, 12.5, -3.5}, { 4.5, 13.3, -3.0},
        { 5.5, 14.2, -2.4}, { 6.6, 15.0, -1.8}, { 7.8, 16.1, -1.0},
        { 9.1, 17.3, -0.2}, {10.5, 18.6, 0.8}, {12.0, 20.0, 1.9},
        {13.6, 21.5, 3.0}
    };
    std::vector<ary_col_t>  non_stationary_ary {
        { 1.0, 10.0, -5.0}, { 1.5, 10.5, -4.8}, { 2.1, 11.0, -4.5},
        { 2.8, 11.8, -4.0}, { 3.6, 12.5, -3.5}, { 4.5, 13.3, -3.0},
        { 5.5, 14.2, -2.4}, { 6.6, 15.0, -1.8}, { 7.8, 16.1, -1.0},
        { 9.1, 17.3, -0.2}, {10.5, 18.6, 0.8}, {12.0, 20.0, 1.9},
        {13.6, 21.5, 3.0}
    };

    df.load_column<vec_col_t>("STATION VEC", std::move(stationary_vec), nan_policy::dont_pad_with_nans);
    df.load_column<ary_col_t>("STATION ARY", std::move(stationary_ary), nan_policy::dont_pad_with_nans);
    df.load_column<vec_col_t>("NON STATION VEC", std::move(non_stationary_vec), nan_policy::dont_pad_with_nans);
    df.load_column<ary_col_t>("NON STATION ARY", std::move(non_stationary_ary), nan_policy::dont_pad_with_nans);

    stac_v<vec_col_t, std::string>  adf_vec_v { stationary_test::adf };
    stac_v<ary_col_t, std::string>  adf_ary_v { stationary_test::adf };
    stac_v<vec_col_t, std::string>  kpss_vec_v { stationary_test::kpss };
    stac_v<ary_col_t, std::string>  kpss_ary_v { stationary_test::kpss };

    df.single_act_visit<vec_col_t>("STATION VEC", adf_vec_v);
    df.single_act_visit<ary_col_t>("STATION ARY", adf_ary_v);
    df.single_act_visit<vec_col_t>("STATION VEC", kpss_vec_v);
    df.single_act_visit<ary_col_t>("STATION ARY", kpss_ary_v);

    assert(adf_vec_v.get_adf_statistic().size() == dim);
    assert(std::abs(adf_vec_v.get_adf_statistic()[0] - -0.586018) < 0.000001);
    assert(std::abs(adf_vec_v.get_adf_statistic()[2] - -0.705822) < 0.000001);

    assert(kpss_vec_v.get_kpss_statistic().size() == dim);
    assert(std::abs(kpss_vec_v.get_kpss_statistic()[0] - 0.0) < 0.00000001);
    assert(std::abs(kpss_vec_v.get_kpss_statistic()[2] - 0.0) < 0.00000001);

    assert(kpss_vec_v.get_kpss_value().size() == dim);
    assert(std::abs(kpss_vec_v.get_kpss_value()[0] - 326.728) < 0.001);
    assert(std::abs(kpss_vec_v.get_kpss_value()[2] - 2239.29) < 0.01);

    df.single_act_visit<vec_col_t>("NON STATION VEC", adf_vec_v);
    df.single_act_visit<ary_col_t>("NON STATION ARY", adf_ary_v);
    df.single_act_visit<vec_col_t>("NON STATION VEC", kpss_vec_v);
    df.single_act_visit<ary_col_t>("NON STATION ARY", kpss_ary_v);

    assert(adf_vec_v.get_adf_statistic().size() == dim);
    assert(std::abs(adf_vec_v.get_adf_statistic()[0] - 0.904666) < 0.000001);
    assert(std::abs(adf_vec_v.get_adf_statistic()[2] - 0.896825) < 0.000001);

    assert(kpss_vec_v.get_kpss_statistic().size() == dim);
    assert(std::abs(kpss_vec_v.get_kpss_statistic()[0] - 0.0) < 0.00000001);
    assert(std::abs(kpss_vec_v.get_kpss_statistic()[2] - 0.0) < 0.00000001);

    assert(kpss_vec_v.get_kpss_value().size() == dim);
    assert(std::abs(kpss_vec_v.get_kpss_value()[0] - 10.76) < 0.01);
    assert(std::abs(kpss_vec_v.get_kpss_value()[2] - 189.563) < 0.001);
}