
Signature Description
template<typename T>
struct  CanonCorrResult  {

    // These values represent the strength of the linear relationship between
    // each pair of canonical variates. They range from 0 to 1, with higher
    // values signifying a stronger association.
    //
    std::vector<T>  coeffs { };     // Canonical correlation coefficients

    // The Redundancy Index measures how much of the variance in one set of
    // variables is explained by the canonical variates (linear combinations)
    // of the other set. It was proposed by Stewart and Love (1968).
    //
    T               x_red_idx { };  // Redundancy index for X
    T               y_red_idx { };  // Redundancy index for Y
};
Result of Canonical Correlation Analysis, as returned by the canon_corr() interface
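For reference, the Stewart and Love (1968) redundancy index is usually written as below, with p columns in X, q columns in Y, canonical variates U_k of X, canonical correlations r_k, and loadings l_jk = corr(x_j, U_k). The bracketed term is the share of X's variance captured by U_k; multiplying it by r_k^2 gives the share explained by the paired variate of Y. This page does not state how canon_corr() computes x_red_idx and y_red_idx, so treat the exact summation (and which direction "for X" refers to) as an assumption; the Y index is the analogous expression with Y's own variates and loadings.

```latex
\mathrm{Rd}_{X} = \sum_{k=1}^{m} \left( \frac{1}{p} \sum_{j=1}^{p} \ell_{jk}^{2} \right) r_{k}^{2},
\qquad \ell_{jk} = \operatorname{corr}(x_{j}, U_{k}),
\qquad m = \min(p, q)
```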

Signature Description Parameters
template<typename T>
CanonCorrResult<T>
canon_corr(std::vector<const char *> &&X_col_names,
           std::vector<const char *> &&Y_col_names) const;
This performs Canonical Correlation Analysis (CCA) between two sets of columns, X and Y, and returns the result in the CanonCorrResult struct defined above.
CCA is a statistical method for examining and measuring the correlations between two sets of variables. It looks for linear combinations of the variables within each set, called canonical variates, such that the correlation between each paired combination is maximized. Each subsequent pair is chosen to be uncorrelated with the earlier pairs, which yields one canonical correlation coefficient per pair. The main objective is to find relationships and patterns of association between the two groups.
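To illustrate what CCA computes, here is a minimal sketch for the smallest case, two columns per set, as in the IBM example on this page. It is not the library's implementation. It relies on the standard result that the squared canonical correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, where the S terms are the sample covariance matrices within and between the sets. The names canon_corr_2x2(), sample_cov(), mat_mul() and mat_inv() are hypothetical and chosen for this example. A general version would need a full eigen solver, and the sketch assumes C++17 (for std::clamp).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Sample covariance of two equally sized columns (n - 1 denominator)
double sample_cov(const std::vector<double> &a, const std::vector<double> &b)  {
    const std::size_t n = a.size();
    double ma = 0.0, mb = 0.0;
    for (std::size_t i = 0; i < n; ++i)  { ma += a[i]; mb += b[i]; }
    ma /= static_cast<double>(n);
    mb /= static_cast<double>(n);
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)  s += (a[i] - ma) * (b[i] - mb);
    return s / static_cast<double>(n - 1);
}

// 2x2 matrix product
Mat2 mat_mul(const Mat2 &A, const Mat2 &B)  {
    Mat2 C { };
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)  C[i][j] += A[i][k] * B[k][j];
    return C;
}

// 2x2 matrix inverse; throws if the two columns of a set are collinear
Mat2 mat_inv(const Mat2 &A)  {
    const double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
    if (det == 0.0)  throw std::runtime_error("singular covariance matrix");
    return {{ { A[1][1] / det, -A[0][1] / det },
              { -A[1][0] / det, A[0][0] / det } }};
}

// Canonical correlations of X = {x1, x2} and Y = {y1, y2}, largest first.
// Their squares are the eigenvalues of M = Sxx^-1 Sxy Syy^-1 Syx.
std::array<double, 2>
canon_corr_2x2(const std::array<std::vector<double>, 2> &X,
               const std::array<std::vector<double>, 2> &Y)  {
    const std::size_t n = X[0].size();
    assert(n > 1 && X[1].size() == n && Y[0].size() == n && Y[1].size() == n);

    Mat2 Sxx { }, Syy { }, Sxy { }, Syx { };
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)  {
            Sxx[i][j] = sample_cov(X[i], X[j]);
            Syy[i][j] = sample_cov(Y[i], Y[j]);
            Sxy[i][j] = sample_cov(X[i], Y[j]);
            Syx[i][j] = sample_cov(Y[i], X[j]);
        }

    const Mat2 M = mat_mul(mat_mul(mat_inv(Sxx), Sxy), mat_mul(mat_inv(Syy), Syx));

    // Eigenvalues of a 2x2 matrix from its trace and determinant. They are
    // real and lie in [0, 1]; clamping guards against rounding error.
    const double tr = M[0][0] + M[1][1];
    const double det = M[0][0] * M[1][1] - M[0][1] * M[1][0];
    const double half_gap = std::sqrt(std::max(0.0, tr * tr / 4.0 - det));
    const double l1 = std::clamp(tr / 2.0 + half_gap, 0.0, 1.0);
    const double l2 = std::clamp(tr / 2.0 - half_gap, 0.0, 1.0);
    return { std::sqrt(l1), std::sqrt(l2) };
}
```

For instance, with x1 = {1, -1, 1, -1}, x2 = {1, 1, -1, -1}, y1 = x1 and y2 = {1, -1, -1, 1}, the first pair of variates is perfectly correlated and the second is uncorrelated, so the sketch returns coefficients close to 1 and 0. Inverting 2x2 matrices in closed form keeps the example dependency-free. For more columns, a linear algebra library's symmetric eigen solver would be the usual choice.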

NOTE: The number of columns in each set must be the same.
T: Type of the named columns
X_col_names: Names of the first set of columns
Y_col_names: Names of the second set of columns
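#include <DataFrame/DataFrame.h>  // C++ DataFrame

#include <cassert>
#include <cmath>
#include <iostream>
#include <string>

using namespace hmdf;

// DataFrame indexed by strings, as aliased in the library's test programs
using StrDataFrame = StdDataFrame<std::string>;

static void test_canon_corr()  {

    std::cout << "\nTesting canon_corr( ) ..." << std::endl;

    StrDataFrame    df;

    try  {
        df.read("IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        return;  // Without the data there is nothing to analyze
    }

    // X = { IBM_Close, IBM_Open }, Y = { IBM_High, IBM_Low }
    const auto  result = df.canon_corr<double>({ "IBM_Close", "IBM_Open" }, { "IBM_High", "IBM_Low" });

    // Two columns per set yield two canonical correlation coefficients
    assert(result.coeffs.size() == 2);
    assert(std::fabs(result.coeffs[0] - 0.999944) < 0.000001);
    assert(std::fabs(result.coeffs[1] - 0.262927) < 0.000001);
    assert(std::fabs(result.x_red_idx - 0.534073) < 0.000001);
    assert(std::fabs(result.y_red_idx - 0.535897) < 0.000001);
}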

C++ DataFrame