| Signature | Description |
|---|---|
| `template<DT_ALLOWABLE_FORMATS DT_F = DT_FORMAT> struct WriteParams;` | Parameters to the write() function of DataFrame. Its definition is: |

```cpp
template<DT_ALLOWABLE_FORMATS DT_F = DT_FORMAT>
struct  WriteParams  {

    // Floating-point values precision when written to a file
    //
    std::streamsize precision { 12 };

    // If true, it only writes the data columns and skips the index column
    //
    bool            columns_only { false };

    // Max number of rows to write
    //
    long            max_recs { std::numeric_limits<long>::max() };

    // This specifies in what format DateTime columns are written into a csv
    // file. This only applies to csv2 format.
    // The only permitted formats are DT_PRECISE, ISO_DT_TM, AMR_DT_TM,
    // EUR_DT_TM, ISO_DT, AMR_DT, and EUR_DT.
    // See DateTime docs (DT_FORMAT) for more info.
    // The default format is seconds since Epoch.nanoseconds.
    //
    DT_F            dt_format { DT_FORMAT::DT_PRECISE };

    // This only applies to csv and csv2 formats. It specifies the delimiting
    // (separating) character.
    //
    char            delim { ',' };
};
```
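Because WriteParams is an aggregate, callers can fill in only the fields they care about with C++20 designated initializers, as the test code on this page does with `{ .precision = 6, .columns_only = true }`. The sketch below uses a hypothetical stand-in, `WriteParamsSketch` (with a hypothetical `DtFormatSketch` enum in place of DT_FORMAT), that mirrors the member names, order, and defaults shown above; it is not the library's type. It illustrates two rules of the language: designators must follow declaration order, and omitted members keep their defaults.

```cpp
#include <ios>      // std::streamsize
#include <limits>   // std::numeric_limits

// Hypothetical stand-in for DT_FORMAT, limited to the formats WriteParams permits.
enum class DtFormatSketch  {
    DT_PRECISE, ISO_DT_TM, AMR_DT_TM, EUR_DT_TM, ISO_DT, AMR_DT, EUR_DT
};

// Hypothetical mirror of WriteParams: same member names, order, and defaults.
struct  WriteParamsSketch  {
    std::streamsize precision { 12 };
    bool            columns_only { false };
    long            max_recs { std::numeric_limits<long>::max() };
    DtFormatSketch  dt_format { DtFormatSketch::DT_PRECISE };
    char            delim { ',' };
};

// C++20 designated initializers must name members in declaration order;
// members that are not named keep their default member initializers.
inline WriteParamsSketch last_five_iso()  {
    return { .max_recs = -5, .dt_format = DtFormatSketch::ISO_DT };
}

// This would NOT compile, because dt_format is declared after max_recs:
// WriteParamsSketch bad { .dt_format = DtFormatSketch::ISO_DT, .max_recs = 5 };
```

The same ordering rule explains why the tests write `{ .precision = 2, .max_recs = 5, .dt_format = ... }` and never put `dt_format` before `max_recs`.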
| Signature | Description | Parameters |
|---|---|---|
`template<typename S, typename ... Ts> bool write(S &o, io_format iof = io_format::csv, const WriteParams<> params = { }) const;` |
It writes the content of the DataFrame into the stream o, in the format selected by iof. The CSV file format is:
INDEX:<Number of data points>:<Comma delimited list of values>
<Col1 name>:<Number of data points>:<Col1 type>:<Comma delimited list of values>
<Col2 name>:<Number of data points>:<Col2 type>:<Comma delimited list of values>
.
.
.
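To make the csv column layout concrete, here is a minimal sketch that renders one column line as `<name>:<count>:<type>:<values>`. The function `csv_column_line` is hypothetical and not part of DataFrame. The caller supplies the type token as plain text (the exact spelling the library writes for each type is an assumption left to the caller), and precision is applied here as significant digits with the default stream float format, which may differ from how the library applies it.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of DataFrame: renders one csv column line as
// "<name>:<count>:<type>:<v1>,<v2>,...". The type token is supplied by the caller.
template<typename T>
std::string csv_column_line(const std::string &name,
                            const std::string &type_token,
                            const std::vector<T> &values,
                            std::streamsize precision = 12)  {
    std::ostringstream  os;

    os.precision(precision);  // significant digits (default stream float format)
    os << name << ':' << values.size() << ':' << type_token << ':';
    for (std::size_t i = 0; i < values.size(); ++i)  {
        if (i > 0)  os << ',';  // values are comma delimited
        os << values[i];
    }
    return os.str();
}
```

For example, `csv_column_line("col_1", "double", std::vector<double>{ 1, 2, 3.456 })` yields `col_1:3:double:1,2,3.456`.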
All empty lines or lines starting with # will be skipped. For examples, see the files in the test directory.
The CSV2 file format is as follows (it is similar to the Pandas csv format):
INDEX:<Number of data points>:<Index type>,<Column1 name>:<Number of data points>:<Column1 type>,<Column2 name>:<Number of data points>:<Column2 type>, . . .
Comma delimited rows of values
.
.
.
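The csv2 header line packs each column's metadata into a `<name>:<count>:<type>` triple, separated by the delimiter. The following sketch splits such a header into its parts. `ColumnHeader` and `parse_csv2_header` are hypothetical names, and the `delim` argument mirrors the `delim` field of WriteParams; the type token is kept verbatim, so it works whether or not the file decorates type names.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one "<name>:<count>:<type>" header field.
struct  ColumnHeader  {
    std::string name;
    std::size_t count { 0 };
    std::string type;
};

// Hypothetical helper: splits a csv2 header line on `delim`, then each field on ':'.
inline std::vector<ColumnHeader>
parse_csv2_header(const std::string &line, char delim = ',')  {
    std::vector<ColumnHeader>   result;
    std::istringstream          fields(line);
    std::string                 field;

    while (std::getline(fields, field, delim))  {
        const std::size_t   first = field.find(':');
        const std::size_t   second = (first == std::string::npos)
                                         ? std::string::npos
                                         : field.find(':', first + 1);

        if (second == std::string::npos)
            throw std::invalid_argument("malformed header field: " + field);

        ColumnHeader    col;

        col.name = field.substr(0, first);
        col.count = std::stoul(field.substr(first + 1, second - first - 1));
        col.type = field.substr(second + 1);  // kept verbatim
        result.push_back(col);
    }
    return result;
}
```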
All empty lines or lines starting with # will be skipped. For examples, see the IBM and FORD files in the test directory.
The JSON file format looks like this:
{
"INDEX":{"N":3,"T":"ulong","D":[123450,123451,123452]},
"col_3":{"N":3,"T":"double","D":[15.2,16.34,17.764]},
"col_4":{"N":3,"T":"int","D":[22,23,24]},
"col_str":{"N":3,"T":"string","D":["11","22","33"]},
"col_2":{"N":3,"T":"double","D":[8,9.001,10]},
"col_1":{"N":3,"T":"double","D":[1,2,3.456]}
}
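Each column in the sample above is a dictionary whose fields appear in N, T, D order. A minimal sketch that produces one such entry follows; `json_column` is a hypothetical helper, the type token is passed in by the caller, strings are quoted without any escaping, and numbers use default stream formatting rather than the library's precision handling.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: builds one DataFrame-style json column entry,
// "<name>":{"N":<count>,"T":"<type>","D":[...]}, with fields in N, T, D order.
template<typename T>
std::string json_column(const std::string &name,
                        const std::string &type_token,
                        const std::vector<T> &values)  {
    std::ostringstream  os;

    os << '"' << name << "\":{\"N\":" << values.size()
       << ",\"T\":\"" << type_token << "\",\"D\":[";
    for (std::size_t i = 0; i < values.size(); ++i)  {
        if (i > 0)  os << ',';
        if constexpr (std::is_same_v<T, std::string>)
            os << '"' << values[i] << '"';  // quoted, no escaping in this sketch
        else
            os << values[i];
    }
    os << "]}";
    return os.str();
}
```

Called as `json_column("col_3", "double", std::vector<double>{ 15.2, 16.34, 17.764 })`, it reproduces the `col_3` line of the sample.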
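Please note that DataFrame json does not follow the json spec 100%. In json, there is no particular order among dictionary fields, but in DataFrame json the INDEX column, when present, must be the first column, and the fields of each column dictionary must appear in N (size), T (type), D (data) order.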
Binary format is a proprietary format that is optimized for compression algorithms. It also takes care of different endianness: the file is always written with the endianness of the writing host, and it is adjusted accordingly when it is read on a host with a different endianness. Binary format is, by far, the fastest way to read and write large files.
In all formats the following data types are supported:
float -- float
double -- double
longdouble -- long double
short -- short int
ushort -- unsigned short int
int -- int
uint -- unsigned int
long -- long int
longlong -- long long int
ulong -- unsigned long int
ulonglong -- unsigned long long int
char -- char
uchar -- unsigned char
string -- std::string
string -- const char *
string -- char *
vstr32 -- Fixed-size string of 31 char length
vstr64 -- Fixed-size string of 63 char length
vstr128 -- Fixed-size string of 127 char length
vstr512 -- Fixed-size string of 511 char length
vstr1K -- Fixed-size string of 1023 char length
vstr2K -- Fixed-size string of 2047 char length
bool -- bool
DateTime -- DateTime data in format of
<Epoch seconds>.<nanoseconds>
(1516179600.874123908)
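The default DateTime text form is the Epoch seconds, a dot, and the nanoseconds. The detail that is easy to get wrong is that the nanosecond part must be zero-padded to nine digits. The sketch below shows that; `epoch_dot_nanos` is a hypothetical helper, and it does not address timestamps before the Epoch.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders "<Epoch seconds>.<nanoseconds>", padding the
// nanosecond part to nine digits so that 5 ns prints as ".000000005".
inline std::string epoch_dot_nanos(std::int64_t seconds, std::uint32_t nanos)  {
    assert(nanos < 1000000000U);  // nanoseconds must be below one second

    std::ostringstream  os;

    os << seconds << '.' << std::setw(9) << std::setfill('0') << nanos;
    return os.str();
}
```

`epoch_dot_nanos(1516179600, 874123908)` gives the example value `1516179600.874123908`.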
In the case of csv2, csv, binary, and pretty_prt, the following additional types are also supported:
str_dbl_pair -- std::pair<std::string, double>.
The pair is printed as "<s:d>,<s:d>, ..."
where s's are strings and d's are doubles.
str_str_pair -- std::pair<std::string, std::string>.
The pair is printed as "<s1:s2>,<s1:s2>, ..."
where s's are strings.
dbl_dbl_pair -- std::pair<double, double>.
The pair is printed as "<d1:d2>,<d1:d2>, ..."
where d's are doubles.
dbl_vec -- std::vector<double>.
The vector is printed as "s[d1|d2|...]"
where s is the size of the vector and
d's are the double values.
str_vec -- std::vector<std::string>.
The vector is printed as "s[str1|str2|...]"
where s is the size of the vector
and str's are the strings.
dbl_set -- std::set<double>.
The set is printed as "s[d1|d2|...]"
where s is the size of the set
and d's are the double values.
str_set -- std::set<std::string>.
The set is printed as "s[str1|str2|...]"
where s is the size of the set
and str's are the strings.
str_dbl_map -- std::map<std::string, double>.
The map is printed as "s{k1:v1|k2:v2|...}"
where s is the size of the map
and k's and v's are keys and values.
str_dbl_unomap -- std::unordered_map<std::string, double>.
The map is printed as "s{k1:v1|k2:v2|...}"
where s is the size of the map and k's
and v's are keys and values.
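The vector, set, and map notations above share one pattern: the element count, then the elements inside brackets or braces, separated by `|`. A minimal sketch of both shapes follows. `bracket_list` and `brace_map` are hypothetical helpers, numbers use default stream formatting instead of the precision parameter, and for std::unordered_map the entry order follows the container's iteration order.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: "s[e1|e2|...]" for std::vector or std::set,
// where s is the number of elements.
template<typename Container>
std::string bracket_list(const Container &c)  {
    std::ostringstream  os;
    bool                first = true;

    os << c.size() << '[';
    for (const auto &e : c)  {
        if (!first)  os << '|';
        os << e;
        first = false;
    }
    os << ']';
    return os.str();
}

// Hypothetical helper: "s{k1:v1|k2:v2|...}" for std::map or std::unordered_map.
template<typename Map>
std::string brace_map(const Map &m)  {
    std::ostringstream  os;
    bool                first = true;

    os << m.size() << '{';
    for (const auto &[key, value] : m)  {
        if (!first)  os << '|';
        os << key << ':' << value;
        first = false;
    }
    os << '}';
    return os.str();
}
```

For instance, a `std::vector<double>{ 1.5, 2.0, 3.25 }` prints as `3[1.5|2|3.25]`.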
In the case of csv2, the following additional types are also supported:
DateTimeAME -- American style (MM/DD/YYYY HH:MM:SS.mmm)
DateTimeEUR -- European style (YYYY/MM/DD HH:MM:SS.mmm)
DateTimeISO -- ISO style (YYYY-MM-DD HH:MM:SS.mmm)
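The three csv2 styles differ only in field order and separator. The sketch below renders them with `std::strftime` plus a millisecond suffix. `DtStyleSketch` and `format_dt` are hypothetical names, and the sketch formats in UTC, which is an assumption: this page does not say which time zone the library uses.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical names for the three csv2 DateTime styles.
enum class DtStyleSketch  { AME, EUR, ISO };

// Hypothetical helper: formats epoch seconds plus milliseconds in UTC as
//   AME: MM/DD/YYYY HH:MM:SS.mmm
//   EUR: YYYY/MM/DD HH:MM:SS.mmm
//   ISO: YYYY-MM-DD HH:MM:SS.mmm
inline std::string format_dt(std::time_t seconds, int millis, DtStyleSketch style)  {
    assert(millis >= 0 && millis < 1000);

    const std::tm   *tm = std::gmtime(&seconds);  // UTC; std::gmtime is not thread-safe

    if (tm == nullptr)  return { };

    const char  *pattern =
        style == DtStyleSketch::AME ? "%m/%d/%Y %H:%M:%S" :
        style == DtStyleSketch::EUR ? "%Y/%m/%d %H:%M:%S" :
                                      "%Y-%m-%d %H:%M:%S";
    char        date_time[32];

    if (std::strftime(date_time, sizeof(date_time), pattern, tm) == 0)  return { };

    char    result[48];

    std::snprintf(result, sizeof(result), "%s.%03d", date_time, millis);
    return result;
}
```

With the Epoch value used elsewhere on this page, `format_dt(1516179600, 874, DtStyleSketch::AME)` yields `01/17/2018 09:00:00.874`.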
|
S: Output stream type. Ts: The list of types for all columns; a type should be specified only once. o: Reference to a streamable object (e.g. cout, a file, ...). iof: Specifies the I/O format; the default is CSV. params: A WriteParams object with these fields. precision: Specifies the precision for floating-point numbers. columns_only: If true, the index column is not written into the stream. max_recs: Max number of rows to write; if it is positive, the first max_recs rows of the DataFrame are written, and if it is negative, the last |max_recs| rows are written. dt_format: The format in which DateTime columns are written. delim: The delimiting character for the csv and csv2 formats. |
`template<typename ... Ts> bool write(const char *file_name, io_format iof = io_format::csv, const WriteParams<> params = { }) const;` |
Same as write() above, but it takes a file name. NOTE: This version of write() can be substantially faster, especially for larger files, than opening the file yourself and using the write() version above. |
|
`template<typename S, typename ... Ts> std::future<bool> write_async(S &o, io_format iof = io_format::csv, const WriteParams<> params = { }) const;` |
Same as write() above, but executed asynchronously | |
`template<typename ... Ts> std::future<bool> write_async(const char *file_name, io_format iof = io_format::csv, const WriteParams<> params = { }) const;` |
Same as write_async() above, but it takes a file name |
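The max_recs rule in the parameter description (positive counts from the beginning, negative from the end) amounts to choosing a half-open row range. The sketch below computes that range. `rows_to_write` is a hypothetical helper, not a DataFrame function, and treating max_recs == 0 as "no rows" is an assumption, since the description does not cover that case.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper: maps max_recs to the half-open row range [first, last)
// that would be written. Positive counts from the beginning, negative from the end.
inline std::pair<std::size_t, std::size_t>
rows_to_write(std::size_t n_rows, long max_recs)  {
    if (max_recs >= 0)  {
        const std::size_t   count =
            std::min(n_rows, static_cast<std::size_t>(max_recs));

        return { 0, count };
    }

    // Magnitude of a negative long, computed in unsigned arithmetic so that
    // the most negative long value does not overflow.
    const std::size_t   magnitude = std::size_t(0) - static_cast<std::size_t>(max_recs);
    const std::size_t   count = std::min(n_rows, magnitude);

    return { n_rows - count, n_rows };
}
```

With 10 rows, max_recs = -3 selects rows 7 through 9, and any magnitude larger than the row count simply selects every row.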
```cpp
static void test_write_json()  {

    std::cout << "\nTesting write(json) ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
          123457, 123458, 123459, 123460 };
    std::vector<double>         d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    std::vector<double>         d2 = { 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    std::vector<double>         d3 = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
    std::vector<double>         d4 = { 22, 23, 24, 25, 26, 27 };
    std::vector<std::string>    s1 =
        { "11", "22", "33", "aa", "bb", "cc", "dd", "tt", "uu", "ii", "88" };
    MyDataFrame                 df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_str", s1));
    df.load_column("col_4", std::move(d4), nan_policy::dont_pad_with_nans);

    std::cout << "Writing in JSON:" << std::endl;
    df.write<std::ostream, int, double, std::string>(std::cout, io_format::json);
}

// -----------------------------------------------------------------------------

static void test_io_format_csv2()  {

    std::cout << "\nTesting io_format_csv2( ) ..." << std::endl;

    std::vector<unsigned long>  ulgvec2 =
        { 123450, 123451, 123452, 123450, 123455, 123450, 123449,
          123450, 123451, 123450, 123452, 123450, 123455, 123450,
          123454, 123450, 123450, 123457, 123458, 123459, 123450,
          123441, 123442, 123432, 123450, 123450, 123435, 123450 };
    std::vector<unsigned long>  xulgvec2 = ulgvec2;
    std::vector<int>            intvec2 =
        { 1, 2, 3, 4, 5, 3, 7, 3, 9, 10, 3, 2, 3, 14,
          2, 2, 2, 3, 2, 3, 3, 3, 3, 3, 36, 2, 45, 2 };
    std::vector<double>         xdblvec2 =
        { 1.2345, 2.2345, 3.2345, 4.2345, 5.2345, 3.0, 0.9999,
          10.0, 4.25, 0.009, 8.0, 2.2222, 3.3333,
          11.0, 5.25, 1.009, 2.111, 9.0, 3.2222, 4.3333,
          12.0, 6.25, 2.009, 3.111, 10.0, 4.2222, 5.3333 };
    std::vector<double>         dblvec22 =
        { 0.998, 0.3456, 0.056, 0.15678, 0.00345, 0.923, 0.06743,
          0.1, 0.0056, 0.07865, 0.0111, 0.1002, -0.8888,
          0.14, 0.0456, 0.078654, -0.8999, 0.8002, -0.9888,
          0.2, 0.1056, 0.87865, -0.6999, 0.4111, 0.1902, -0.4888 };
    std::vector<std::string>    strvec2 =
        { "4% of something", "Description 4/5", "This is bad",
          "3.4% of GDP", "Market drops", "Market pulls back",
          "$15 increase", "Running fast", "C++14 development",
          "Some explanation", "More strings", "Bonds vs. Equities",
          "Almost done", "XXXX04",
          "XXXX2", "XXXX3", "XXXX4", "XXXX4", "XXXX5", "XXXX6",
          "XXXX7", "XXXX10", "XXXX11", "XXXX02", "XXXX03" };
    std::vector<bool>           boolvec = { true, true, true, false, true };
    MyDataFrame                 df;

    df.load_data(std::move(ulgvec2), std::make_pair("ul_col", xulgvec2));
    df.load_column("xint_col", std::move(intvec2), nan_policy::dont_pad_with_nans);
    df.load_column("str_col", std::move(strvec2), nan_policy::dont_pad_with_nans);
    df.load_column("dbl_col", std::move(xdblvec2), nan_policy::dont_pad_with_nans);
    df.load_column("dbl_col_2", std::move(dblvec22), nan_policy::dont_pad_with_nans);
    df.load_column("bool_col", std::move(boolvec), nan_policy::dont_pad_with_nans);

    df.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::csv2);

    MyDataFrame df_read;

    try  {
        df_read.read("csv2_format_data.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
    }
    df_read.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::csv2);
}

// -----------------------------------------------------------------------------

static void test_no_index_writes()  {

    std::cout << "\nTesting no_index_writes ..." << std::endl;

    StlVecType<unsigned long>   ulgvec2 =
        { 123450, 123451, 123452, 123450, 123455, 123450, 123449,
          123450, 123451, 123450, 123452, 123450, 123455, 123450,
          123454, 123450, 123450, 123457, 123458, 123459, 123450,
          123441, 123442, 123432, 123450, 123450, 123435, 123450 };
    StlVecType<unsigned long>   xulgvec2 = ulgvec2;
    StlVecType<int>             intvec2 =
        { 1, 2, 3, 4, 5, 3, 7, 3, 9, 10, 3, 2, 3, 14,
          2, 2, 2, 3, 2, 3, 3, 3, 3, 3, 36, 2, 45, 2 };
    StlVecType<double>          xdblvec2 =
        { 1.2345, 2.2345, 3.2345, 4.2345, 5.2345, 3.0, 0.9999,
          10.0, 4.25, 0.009, 8.0, 2.2222, 3.3333,
          11.0, 5.25, 1.009, 2.111, 9.0, 3.2222, 4.3333,
          12.0, 6.25, 2.009, 3.111, 10.0, 4.2222, 5.3333 };
    StlVecType<double>          dblvec22 =
        { 0.998, 0.3456, 0.056, 0.15678, 0.00345, 0.923, 0.06743,
          0.1, 0.0056, 0.07865, 0.0111, 0.1002, -0.8888,
          0.14, 0.0456, 0.078654, -0.8999, 0.8002, -0.9888,
          0.2, 0.1056, 0.87865, -0.6999, 0.4111, 0.1902, -0.4888 };
    StlVecType<std::string>     strvec2 =
        { "4% of something", "Description 4/5", "This is bad",
          "3.4% of GDP", "Market drops", "Market pulls back",
          "$15 increase", "Running fast", "C++14 development",
          "Some explanation", "More strings", "Bonds vs. Equities",
          "Almost done", "XXXX04",
          "XXXX2", "XXXX3", "XXXX4", "XXXX4", "XXXX5", "XXXX6",
          "XXXX7", "XXXX10", "XXXX11", "XXXX02", "XXXX03" };
    StlVecType<bool>            boolvec = { true, true, true, false, false, true };
    MyDataFrame                 df;

    df.load_data(std::move(ulgvec2), std::make_pair("ul_col", xulgvec2));
    df.load_column("xint_col", std::move(intvec2), nan_policy::dont_pad_with_nans);
    df.load_column("str_col", std::move(strvec2), nan_policy::dont_pad_with_nans);
    df.load_column("dbl_col", std::move(xdblvec2), nan_policy::dont_pad_with_nans);
    df.load_column("dbl_col_2", std::move(dblvec22), nan_policy::dont_pad_with_nans);
    df.load_column("bool_col", std::move(boolvec), nan_policy::dont_pad_with_nans);

    df.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::csv, { .precision = 6 });
    std::cout << std::endl;
    df.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::csv, { .precision = 6, .columns_only = true });
    std::cout << '\n' << std::endl;

    df.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::csv2, { .precision = 6 });
    std::cout << std::endl;
    df.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::csv2, { .columns_only = true });
    std::cout << '\n' << std::endl;

    df.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::json);
    std::cout << std::endl;
    df.write<std::ostream, int, unsigned long, double, bool, std::string>
        (std::cout, io_format::json, { .columns_only = true });
}
```
```cpp
// ----------------------------------------------------------------------------

static void test_DateTime_write()  {

    std::cout << "\nTesting test_DateTime_write( ) ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("DT_sample.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }

    df.write<std::ostream, double, long, DateTime>
        (std::cout, io_format::csv2, { .max_recs = 5 });
    std::cout << "\n\n\n";
    df.write<std::ostream, double, long, DateTime>
        (std::cout, io_format::csv2, { .max_recs = 5, .dt_format = DT_FORMAT::ISO_DT_TM });
    std::cout << "\n\n\n";
    df.write<std::ostream, double, long, DateTime>
        (std::cout, io_format::csv2, { .max_recs = 5, .dt_format = DT_FORMAT::AMR_DT_TM });
    std::cout << "\n\n\n";
    df.write<std::ostream, double, long, DateTime>
        (std::cout, io_format::csv2, { .max_recs = 5, .dt_format = DT_FORMAT::EUR_DT_TM });
    std::cout << "\n\n\n";
    df.write<std::ostream, double, long, DateTime>
        (std::cout, io_format::csv2, { .max_recs = 5, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";
    df.write<std::ostream, double, long, DateTime>
        (std::cout, io_format::csv2, { .max_recs = 5, .dt_format = DT_FORMAT::AMR_DT });
    std::cout << "\n\n\n";
    df.write<std::ostream, double, long, DateTime>
        (std::cout, io_format::csv2, { .max_recs = 5, .dt_format = DT_FORMAT::EUR_DT });
    std::cout << "\n\n\n";
}
```
```cpp
// ----------------------------------------------------------------------------

static void test_io_format_csv2_with_bars()  {

    std::cout << "\nTesting io_format_csv2_with_bars( ) ..." << std::endl;

    MyDataFrame df_read;

    try  {
        df_read.read("csv2_format_data_with_bars.csv", io_format::csv2, { .delim = '|' });
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }

    df_read.write<std::ostream, int, unsigned long, unsigned char, char, double, bool, std::string>
        (std::cout, io_format::csv2);
    std::cout << "\n\n";
    df_read.write<std::ostream, int, unsigned long, unsigned char, char, double, bool, std::string>
        (std::cout, io_format::csv2, { .delim = '|' });
}
```
```cpp
// ----------------------------------------------------------------------------

static void test_pretty_print()  {

    std::cout << "\nTesting pretty_print ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("DT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }
    df.write<std::ostream, double, long>
        (std::cout, io_format::pretty_prt,
         { .precision = 2, .max_recs = 5, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    DTDataFrame df2;

    try  {
        df2.read("AAPL_10dBucketWithMaps_small.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }
    df2.write<std::ostream, double, long,
              std::map<std::string, double>,
              std::unordered_map<std::string, double>,
              std::vector<std::string>,
              std::set<double>,
              std::set<std::string>>
        (std::cout, io_format::pretty_prt, { .precision = 6, .dt_format = DT_FORMAT::ISO_DT });
}
```
```cpp
// ----------------------------------------------------------------------------

static void test_markdown()  {

    std::cout << "\nTesting markdown ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("DT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }
    df.write<std::ostream, double, long>
        (std::cout, io_format::markdown,
         { .precision = 2, .max_recs = 10, .dt_format = DT_FORMAT::AMR_DT });
}
```
```cpp
// ----------------------------------------------------------------------------

static void test_latex()  {

    std::cout << "\nTesting latex ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("DT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }
    df.write<std::ostream, double, long>
        (std::cout, io_format::latex,
         { .precision = 4, .max_recs = 5, .dt_format = DT_FORMAT::ISO_DT });
}
```
```cpp
// ----------------------------------------------------------------------------

static void test_html()  {

    std::cout << "\nTesting html ..." << std::endl;

    DTDataFrame df;

    try  {
        df.read("DT_IBM.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }
    df.write<std::ostream, double, long>
        (std::cout, io_format::html,
         { .precision = 3, .max_recs = 5, .dt_format = DT_FORMAT::ISO_DT });
    std::cout << "\n\n\n";

    DTDataFrame df2;

    try  {
        df2.read("AAPL_10dBucketWithMaps_small.csv", io_format::csv2);
    }
    catch (const DataFrameError &ex)  {
        std::cout << ex.what() << std::endl;
        ::exit(-1);
    }
    df2.write<std::ostream, double, long,
              std::map<std::string, double>,
              std::unordered_map<std::string, double>,
              std::vector<std::string>,
              std::set<double>,
              std::set<std::string>>
        (std::cout, io_format::html, { .precision = 6, .dt_format = DT_FORMAT::ISO_DT_TM });
}
```