
Signature Description Parameters
template<StringOnly T, typename ... Ts>
DataFrame
get_data_by_like(const char *name,
                 const char *pattern,
                 bool case_insensitive = false,
                 char esc_char = '\\') const;
This method does basic glob-like pattern matching (similar to an SQL LIKE clause) to filter data in the named column, and returns a new DataFrame. Each element of the named column is checked against the glob-like pattern, and only the rows whose element matches are included in the result.

Globbing rules:
 '*' Matches any sequence of zero or more characters.
 '?' Matches exactly one character.
 [...] Matches one character from the enclosed list of characters.
 [^...] Matches one character not in the enclosed list.

With the [...] and [^...] matching, a ']' character can be included in the list by making it the first character after '[' or '^'. A range of characters can be specified using '-'. Example: "[a-z]" matches any single lower-case letter. To match a '-', make it the last character in the list.

Hint: To match a literal '*' or '?', put it in "[]". For example, abc[*]xyz matches only "abc*xyz".
NOTE: In some cases, matching can be quadratic (n-squared). It is fast with moderately sized strings, but I have not tested it with very large strings.
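The globbing rules above can be sketched as a small standalone matcher. This is a minimal sketch, not the library's implementation: the names glob_match, match_token, in_range and chars_equal are hypothetical, and two behaviors are assumptions (esc_char makes the next pattern character match literally, and an unterminated '[' is treated as a literal character). It handles '*' with the classic single-backtrack-point approach, which needs on the order of len(string) × len(pattern) character comparisons in the worst case, one way the quadratic behavior mentioned in the NOTE above can arise.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Compares two characters, optionally ignoring ASCII case.
static bool chars_equal(char a, char b, bool ci) {
    if (!ci) return a == b;
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// True if c lies in [lo, hi]; with ci, either case of c may fall in the range.
static bool in_range(char c, char lo, char hi, bool ci) {
    const int l = static_cast<unsigned char>(lo);
    const int h = static_cast<unsigned char>(hi);
    auto within = [l, h](int x) { return x >= l && x <= h; };
    const int u = static_cast<unsigned char>(c);
    if (within(u)) return true;
    return ci && (within(std::tolower(u)) || within(std::toupper(u)));
}

// Matches one single-character token at p[pi] ('?', [...], [^...], an escaped
// character, or a literal) against c, and sets next to the index past the token.
static bool match_token(const std::string &p, std::size_t pi, char c,
                        bool ci, char esc, std::size_t &next) {
    const char pc = p[pi];
    if (pc == '?') { next = pi + 1; return true; }
    if (pc == esc && pi + 1 < p.size()) {  // assumption: escaped char is literal
        next = pi + 2;
        return chars_equal(p[pi + 1], c, ci);
    }
    if (pc == '[') {
        std::size_t j = pi + 1;
        const bool negate = j < p.size() && p[j] == '^';
        if (negate) ++j;
        bool found = false;
        // A ']' right after '[' or '^' belongs to the list.
        for (bool first = true; j < p.size() && (first || p[j] != ']'); first = false) {
            if (j + 2 < p.size() && p[j + 1] == '-' && p[j + 2] != ']') {
                found = found || in_range(c, p[j], p[j + 2], ci);  // range such as a-z
                j += 3;
            } else {
                found = found || chars_equal(p[j], c, ci);  // single char, incl. a trailing '-'
                j += 1;
            }
        }
        if (j < p.size()) {  // closing ']' found
            next = j + 1;
            return found != negate;
        }
        // Assumption: an unterminated '[' falls through and is matched literally.
    }
    next = pi + 1;
    return chars_equal(pc, c, ci);
}

// Returns true if the whole string s matches the glob-like pattern p.
bool glob_match(const std::string &s, const std::string &p,
                bool case_insensitive = false, char esc_char = '\\') {
    std::size_t si = 0, pi = 0;
    std::size_t star_pi = std::string::npos;  // pattern index just after the last '*'
    std::size_t star_si = 0;                  // string index where that '*' currently starts
    while (si < s.size()) {
        std::size_t next = 0;
        if (pi < p.size() && p[pi] == '*') {
            star_pi = ++pi;  // '*' first tries to match zero characters
            star_si = si;
        } else if (pi < p.size() &&
                   match_token(p, pi, s[si], case_insensitive, esc_char, next)) {
            pi = next;
            ++si;
        } else if (star_pi != std::string::npos) {
            pi = star_pi;  // backtrack: let the last '*' absorb one more character
            si = ++star_si;
        } else {
            return false;
        }
    }
    while (pi < p.size() && p[pi] == '*') ++pi;  // trailing '*'s match the empty string
    return pi == p.size();
}
```

For example, glob_match("abc*xyz", "abc[*]xyz") and glob_match("M", "[a-z]", true) both return true, while glob_match("M", "[a-z]") returns false.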
T: Type of the named column. Based on the StringOnly concept, it can only be one of these types: std::string, VirtualString, const char *, char *
Ts: List of the types of all data columns. Each type should be specified only once.
name: Name of the data column
pattern: Glob-like pattern used to match strings
case_insensitive: If true, matching ignores case
esc_char: Escape character (defaults to backslash)
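The filtering that get_data_by_like() performs can be modeled with plain standard containers. This is only an illustration under assumptions: MiniFrame, like() and filter_by_like() are hypothetical names, the naive recursive matcher handles only '*' and '?' to stay short, and the result is built as an independent copy, mirroring the statement that the method returns a new DataFrame.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DataFrame: an index plus two data columns.
struct MiniFrame {
    std::vector<unsigned long> index;
    std::vector<std::string>   str_col;
    std::vector<int>           int_col;
};

// Naive recursive matcher supporting only '*' and '?'.
static bool like(const char *s, const char *p) {
    if (*p == '\0') return *s == '\0';
    if (*p == '*') return like(s, p + 1) || (*s != '\0' && like(s + 1, p));
    if (*s == '\0') return false;
    return (*p == '?' || *p == *s) && like(s + 1, p + 1);
}

// Copies every row whose str_col value matches pattern into a new frame.
MiniFrame filter_by_like(const MiniFrame &df, const std::string &pattern) {
    MiniFrame result;
    for (std::size_t i = 0; i < df.str_col.size(); ++i) {
        if (like(df.str_col[i].c_str(), pattern.c_str())) {
            result.index.push_back(df.index[i]);
            result.str_col.push_back(df.str_col[i]);
            result.int_col.push_back(df.int_col[i]);
        }
    }
    return result;
}
```

Because the result owns copies of the matched rows, modifying it leaves the original untouched; that is the key difference from the view variants that follow.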
template<StringOnly T, typename ... Ts>
PtrView
get_view_by_like(const char *name,
                 const char *pattern,
                 bool case_insensitive = false,
                 char esc_char = '\\');
This is identical to get_data_by_like() above, except:
  1. The result is a view.
  2. Since the result is a view, you cannot call make_consistent() on it.
NOTE: Some operations are not possible on a view. For example, you cannot add or delete columns.
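The difference between a copy and a view can be illustrated with standard containers. In this sketch, StrPtrView and view_where_contains() are hypothetical names, and a plain substring test stands in for the glob matcher to keep the focus on view semantics. The view stores pointers to the matching elements, so a write through the view lands in the original column, which is the behavior the test at the end of this page checks for get_view_by_like().

```cpp
#include <string>
#include <vector>

// Hypothetical model of a pointer view: it holds addresses of the matching
// elements instead of copies, so writes through it reach the original.
struct StrPtrView {
    std::vector<std::string *> col;
};

// Stand-in predicate: a plain substring test instead of the glob matcher.
StrPtrView view_where_contains(std::vector<std::string> &column,
                               const std::string &needle) {
    StrPtrView v;
    for (std::string &s : column)
        if (s.find(needle) != std::string::npos) v.col.push_back(&s);
    return v;
}
```

In this model the pointers stay valid only while the original vector is not resized, which gives one intuition for why structural changes are restricted on views.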
T: Type of the named column. Based on the StringOnly concept, it can only be one of these types: std::string, VirtualString, const char *, char *
Ts: List of the types of all data columns. Each type should be specified only once.
name: Name of the data column
pattern: Glob-like pattern used to match strings
case_insensitive: If true, matching ignores case
esc_char: Escape character (defaults to backslash)
template<StringOnly T, typename ... Ts>
ConstPtrView
get_view_by_like(const char *name,
                 const char *pattern,
                 bool case_insensitive = false,
                 char esc_char = '\\') const;
Same as the view above, but it returns a const view. You cannot change data through a const view, but if the data is changed in the original DataFrame or through another view, the change is reflected in the const view.
T: Type of the named column. Based on the StringOnly concept, it can only be one of these types: std::string, VirtualString, const char *, char *
Ts: List of the types of all data columns. Each type should be specified only once.
name: Name of the data column
pattern: Glob-like pattern used to match strings
case_insensitive: If true, matching ignores case
esc_char: Escape character (defaults to backslash)
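The const-view behavior described above (read-only access that still tracks later changes to the original data) can be modeled with pointers-to-const. ConstStrPtrView and const_view_where_contains() are hypothetical names, and a plain substring test again stands in for the glob matcher; this is a sketch of the semantics, not the library's ConstPtrView.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a const view: pointers-to-const into the original
// column. Reads follow the original data; writes through it do not compile.
struct ConstStrPtrView {
    std::vector<const std::string *> col;
};

// Stand-in predicate: a plain substring test instead of the glob matcher.
ConstStrPtrView const_view_where_contains(const std::vector<std::string> &column,
                                          const std::string &needle) {
    ConstStrPtrView v;
    for (const std::string &s : column)
        if (s.find(needle) != std::string::npos) v.col.push_back(&s);
    return v;
}
```

Assigning through an element, as in *v.col[0] = "x";, is rejected at compile time, yet a change made to the original vector afterwards is visible through v.col.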
template<StringOnly T, typename ... Ts>
DataFrame
get_data_by_like(const char *name1,
                 const char *name2,
                 const char *pattern1,
                 const char *pattern2,
                 bool case_insensitive = false,
                 char esc_char = '\\') const;
This does the same as get_data_by_like() above, but operates on two columns. A row is selected only if the first column matches pattern1 and the second column matches pattern2.
T: Type of both named columns. Based on the StringOnly concept, it can only be one of these types: std::string, VirtualString, const char *, char *
Ts: List of the types of all data columns. Each type should be specified only once.
name1: Name of the first data column
name2: Name of the second data column
pattern1: Glob-like pattern used to match strings in the first column
pattern2: Glob-like pattern used to match strings in the second column
case_insensitive: If true, matching ignores case
esc_char: Escape character (defaults to backslash)
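The two-column selection can be sketched the same way. Based on the test at the end of this page, where only rows whose entries match in both columns are returned, this sketch combines the two per-column checks with a logical AND. rows_matching_both() and like() are hypothetical names, and because the naive matcher supports only '*' and '?', the example uses the pattern "?*0987?*" in place of the bracketed digit pattern from the test.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Naive recursive matcher supporting only '*' and '?'.
static bool like(const char *s, const char *p) {
    if (*p == '\0') return *s == '\0';
    if (*p == '*') return like(s, p + 1) || (*s != '\0' && like(s + 1, p));
    if (*s == '\0') return false;
    return (*p == '?' || *p == *s) && like(s + 1, p + 1);
}

// Returns the positions of rows where col1 matches pattern1 AND col2 matches pattern2.
std::vector<std::size_t> rows_matching_both(const std::vector<std::string> &col1,
                                            const std::vector<std::string> &col2,
                                            const std::string &pattern1,
                                            const std::string &pattern2) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < col1.size() && i < col2.size(); ++i)
        if (like(col1[i].c_str(), pattern1.c_str()) &&
            like(col2[i].c_str(), pattern2.c_str()))
            rows.push_back(i);
    return rows;
}
```

A row where only one of the two columns matches, such as one pairing "!@#$0987^HGTtiff\"" with "ABFDTiy", is not selected.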
template<StringOnly T, typename ... Ts>
PtrView
get_view_by_like(const char *name1,
                 const char *name2,
                 const char *pattern1,
                 const char *pattern2,
                 bool case_insensitive = false,
                 char esc_char = '\\');
This is identical to the two-column get_data_by_like() above, except:
  1. The result is a view.
  2. Since the result is a view, you cannot call make_consistent() on it.
NOTE: Some operations are not possible on a view. For example, you cannot add or delete columns.
T: Type of both named columns. Based on the StringOnly concept, it can only be one of these types: std::string, VirtualString, const char *, char *
Ts: List of the types of all data columns. Each type should be specified only once.
name1: Name of the first data column
name2: Name of the second data column
pattern1: Glob-like pattern used to match strings in the first column
pattern2: Glob-like pattern used to match strings in the second column
case_insensitive: If true, matching ignores case
esc_char: Escape character (defaults to backslash)
template<StringOnly T, typename ... Ts>
ConstPtrView
get_view_by_like(const char *name1,
                 const char *name2,
                 const char *pattern1,
                 const char *pattern2,
                 bool case_insensitive = false,
                 char esc_char = '\\') const;
Same as the view above, but it returns a const view. You cannot change data through a const view, but if the data is changed in the original DataFrame or through another view, the change is reflected in the const view.
T: Type of both named columns. Based on the StringOnly concept, it can only be one of these types: std::string, VirtualString, const char *, char *
Ts: List of the types of all data columns. Each type should be specified only once.
name1: Name of the first data column
name2: Name of the second data column
pattern1: Glob-like pattern used to match strings in the first column
pattern2: Glob-like pattern used to match strings in the second column
case_insensitive: If true, matching ignores case
esc_char: Escape character (defaults to backslash)
static void test_get_data_by_like()  {

    std::cout << "\nTesting get_data_by_like( ) ..." << std::endl;

    StlVecType<unsigned long>  idxvec = { 1UL, 2UL, 3UL, 10UL, 5UL, 7UL, 8UL, 12UL, 9UL, 12UL, 10UL, 13UL, 10UL, 15UL, 14UL };
    StlVecType<std::string>    strvec1 =
        { "345&%$abcM", "!@#$0987^HGTtiff\"", "ABFDTiy", "345&%$abcM", "!@#$0987^HGTtiff\"", "!@#$0987^HGTtiff\"", "345&%$abcM",
          "!@#$0987^HGTtiff\"", "ABFDTiy", "345&%$abcM", "!@#$0987^HGTtiff\"", "ABFDTiy", "345&%$abcM", "ABFDTiy", "ABFDTiy" };
    StlVecType<std::string>    strvec2 =
        { "ABFDTiy", "!@#$0987^HGTtiff\"", "ABFDTiy", "345&%$abcM", "!@#$0987^HGTtiff\"", "ABFDTiy", "!@#$0987^HGTtiff\"",
          "!@#$0987^HGTtiff\"", "ABFDTiy", "345&%$abcM", "!@#$0987^HGTtiff\"", "ABFDTiy", "345&%$abcM", "ABFDTiy", "ABFDTiy" };
    StlVecType<int>            intvec = { 1, 2, 3, 10, 5, 7, 8, 12, 9, 12, 10, 13, 10, 15, 14 };
    MyDataFrame                df;

    df.load_data(std::move(idxvec),
                 std::make_pair("str column 1", strvec1),
                 std::make_pair("str column 2", strvec2),
                 std::make_pair("int column", intvec));

    // Rows where BOTH string columns contain four consecutive digits
    auto    df_like2 =
        df.get_data_by_like<std::string, std::string, int>("str column 1",
                                                           "str column 2",
                                                           "?*[0-9][0-9][0-9][0-9]?*",
                                                           "?*[0-9][0-9][0-9][0-9]?*");

    assert(df_like2.get_index().size() == 4);
    assert(df_like2.get_index()[2] == 12);
    assert(df_like2.get_column<int>("int column")[2] == 12);
    assert(df_like2.get_column<std::string>("str column 1").size() == 4);
    assert(df_like2.get_column<std::string>("str column 2").size() == 4);
    assert((df_like2.get_column<std::string>("str column 1")[0] == "!@#$0987^HGTtiff\""));
    assert((df_like2.get_column<std::string>("str column 1")[2] == "!@#$0987^HGTtiff\""));
    assert((df_like2.get_column<std::string>("str column 2")[0] == "!@#$0987^HGTtiff\""));
    assert((df_like2.get_column<std::string>("str column 2")[2] == "!@#$0987^HGTtiff\""));

    auto    dfv_like2 =
        df.get_view_by_like<std::string, std::string, int>("str column 1",
                                                           "str column 2",
                                                           "?*[0-9][0-9][0-9][0-9]?*",
                                                           "?*[0-9][0-9][0-9][0-9]?*");

    assert(dfv_like2.get_index().size() == 4);
    assert(dfv_like2.get_index()[2] == 12);
    assert(dfv_like2.get_column<int>("int column")[2] == 12);
    assert(dfv_like2.get_column<std::string>("str column 1").size() == 4);
    assert(dfv_like2.get_column<std::string>("str column 2").size() == 4);
    assert((dfv_like2.get_column<std::string>("str column 1")[0] == "!@#$0987^HGTtiff\""));
    assert((dfv_like2.get_column<std::string>("str column 1")[2] == "!@#$0987^HGTtiff\""));
    assert((dfv_like2.get_column<std::string>("str column 2")[0] == "!@#$0987^HGTtiff\""));
    assert((dfv_like2.get_column<std::string>("str column 2")[2] == "!@#$0987^HGTtiff\""));

    // The view points into df, so writing through it changes df itself
    dfv_like2.get_column<std::string>("str column 2")[3] = "ABC";
    assert(dfv_like2.get_column<std::string>("str column 2")[3] == "ABC");
    assert(df.get_column<std::string>("str column 2")[10] == "ABC");

    // Single column: "&%" with at least one character before and after it
    auto    df_like1 =
        df.get_data_by_like<std::string, std::string, int>("str column 1", "?*&%?*");

    assert(df_like1.get_index().size() == 5);
    assert(df_like1.get_index()[2] == 8);
    assert(df_like1.get_column<int>("int column")[2] == 8);
    assert(df_like1.get_column<std::string>("str column 1").size() == 5);
    assert(df_like1.get_column<std::string>("str column 2").size() == 5);
    assert((df_like1.get_column<std::string>("str column 1")[0] == "345&%$abcM"));
    assert((df_like1.get_column<std::string>("str column 1")[2] == "345&%$abcM"));
    assert((df_like1.get_column<std::string>("str column 2")[0] == "ABFDTiy"));
    assert((df_like1.get_column<std::string>("str column 2")[2] == "!@#$0987^HGTtiff\""));
}

C++ DataFrame