The fst package for R provides a fast, easy and flexible way to serialize data frames. With access speeds of multiple GB/s, fst is specifically designed to unlock the potential of the high-speed solid state disks found in most modern computers. Data frames stored in the fst format have full random access, in both columns and rows.

The figure below compares the read and write performance of the fst package to various alternatives.

[Figure: read and write performance of fst compared with alternative formats]

These benchmarks were performed on a laptop (i7 4710HQ) with a reasonably fast SSD (M.2 Samsung SM951) using a sample dataset. The Speed parameter was calculated by dividing the in-memory size of the data frame by the measured time. These results are also visualized in the following graph:

[Figure: benchmark results for each format]

As can be seen from the figure, the measured speeds for the fst package are very high and even exceed the maximum drive speed of the SSD used. The package accomplishes this through an effective combination of multi-threading and compression. The on-disk sizes of fst files are also much smaller than those of the other formats tested. This is an added benefit of fst's use of type-specific compressors on each stored column.

In addition to methods for data frame serialization, fst also provides methods for multi-threaded in-memory compression with the popular LZ4 and ZSTD compressors, and an extremely fast multi-threaded hasher.

For example, `write.fst(df, "dataset.fst", 100)` writes a data frame using maximum compression. Compression reduces the size of the fst file that holds your data. But because the (de-)compression is done on background threads, it can increase the total read and write speed as well. The graph below shows how the use of multiple threads enhances the read and write speed of our sample dataset.

[Figure: read and write speed against number of threads]

The csv format used by the fread and fwrite methods of package data.table is actually a human-readable text format and not a binary format. Normally, binary formats would be much faster than the csv format, because csv takes more space on disk, is row based, is uncompressed, and needs to be parsed into a computer-native format to have any meaning.
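The points above (compressed serialization, random access by column and row, thread control, and in-memory compression and hashing) can be tried out with a short sketch. It assumes a recent fst release, where `write_fst` and `read_fst` are the underscore-style names for the `write.fst` and `read.fst` calls mentioned in the text. The data frame `df` and its columns are made up for illustration and are not the benchmark dataset.

```r
library(fst)

# Hypothetical sample data; this is not the benchmark dataset from the post.
set.seed(42)
n <- 100000
df <- data.frame(
  id    = seq_len(n),
  value = rnorm(n),
  group = sample(c("a", "b", "c"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Use two threads for (de-)compression; adjust to the cores available.
threads_fst(2)

# Write the data frame with maximum compression (0 = none, 100 = maximum).
path <- tempfile(fileext = ".fst")
write_fst(df, path, compress = 100)

# Random access: read only two columns and rows 1001 to 2000.
part <- read_fst(path, columns = c("id", "value"), from = 1001, to = 2000)

# In-memory compression of a raw vector, followed by a round-trip check.
raw_in  <- serialize(df, NULL)
packed  <- compress_fst(raw_in, compressor = "ZSTD", compression = 50)
raw_out <- decompress_fst(packed)
round_trip_ok <- identical(raw_in, raw_out)

# Hash of the same raw vector, computed with the multi-threaded hasher.
digest <- hash_fst(raw_in)
```

Comparing `file.size(path)` with `object.size(df)` gives a rough idea of the compression achieved on this made-up data; actual ratios and speeds depend on the column types and the hardware.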