The benchmarks ran on a Lenovo P500 using Java 11, with the processor frequency fixed at 3 GHz.
Univocity is the benchmark with readInputOnSeparateThread set to false; ConcurrentUnivocity has that flag set to true. All the parallel tests apart from ConcurrentUnivocity use a ParallelReader.
Csv Parsing Unescaped/Escaped and Parallel
Library | Version |
---|---|
Jackson | 2.9.8 |
Sfm | 6.7.0 |
Univocity | 2.8.1 |
The CSV file parsed is 145 MB unescaped and 188 MB with quotes.
Why only those three? Because the others that I tested are pretty slow in comparison. If you think your CSV parser is worth benchmarking, open an issue.
Parsing an unescaped Csv
Parser | avgt ms | avgt MB/s |
---|---|---|
Sfm Raw | 747 | 194 |
Sfm Callback | 1040 | 139 |
Sfm Iterate | 1127 | 128 |
Univocity | 1256 | 115 |
Jackson | 1593 | 90 |
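
The avgt MB/s column in these tables can be reproduced from the file size and the avgt ms column. The following is a minimal sketch of that arithmetic, not the benchmark code itself: the class and method names are made up for illustration, and the file sizes are the rounded 145 MB and 188 MB quoted above, so the result may differ from the published figure by about one MB/s.

```java
// Illustrative helper (hypothetical name): converts an average parse time
// into throughput, using the rounded file sizes quoted in this document.
final class Throughput {
    private Throughput() {}

    /** Returns MB/s for a file of fileSizeMb megabytes parsed in avgTimeMs milliseconds. */
    static double mbPerSecond(double fileSizeMb, double avgTimeMs) {
        if (avgTimeMs <= 0) {
            throw new IllegalArgumentException("avgTimeMs must be positive");
        }
        return fileSizeMb / (avgTimeMs / 1000.0);
    }

    public static void main(String[] args) {
        // Sfm Raw on the unescaped file: 145 MB in 747 ms.
        System.out.printf("Sfm Raw unescaped: %.1f MB/s%n", mbPerSecond(145, 747));
        // Sfm Raw on the escaped file: 188 MB in 921 ms.
        System.out.printf("Sfm Raw escaped:   %.1f MB/s%n", mbPerSecond(188, 921));
    }
}
```

Note that the escaped file is larger, so a parser can take longer on it in ms and still show a higher MB/s.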
Parsing an escaped version of Csv
Parser | avgt ms | avgt MB/s |
---|---|---|
Sfm Raw | 921 | 204 |
Sfm Callback | 1103 | 170 |
Sfm Iterate | 1140 | 164 |
Univocity | 1491 | 126 |
Jackson | 1592 | 118 |
Parsing an unescaped Csv with ParallelReader
ConcurrentUnivocity uses readInputOnSeparateThread set to true and no ParallelReader.
Parser | avgt ms | avgt MB/s |
---|---|---|
Sfm Raw | 530 | 273 |
Sfm Callback | 740 | 195 |
Sfm Iterate | 759 | 190 |
ConcurrentUnivocity | 844 | 171 |
Univocity | 890 | 162 |
Jackson | 1243 | 116 |
Parsing an escaped version of Csv with ParallelReader
ConcurrentUnivocity uses readInputOnSeparateThread set to true and no ParallelReader.
Parser | avgt ms | avgt MB/s |
---|---|---|
Sfm Raw | 610 | 307 |
Sfm Callback | 812 | 231 |
Sfm Iterate | 826 | 227 |
ConcurrentUnivocity | 1054 | 178 |
Univocity | 1105 | 170 |
Jackson | 1342 | 140 |
Notes
The UTF-8 decoding performance varied quite a bit depending on C2 optimisations. That variability is not represented here, as I only display the average. I’m planning to investigate further and find out where it’s coming from.
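
One way to make that variability visible would be to report the spread of the per-run times next to the average. The sketch below computes the mean and the sample standard deviation of a set of timings. It is only an illustration: the class name is hypothetical and the timings in main are invented placeholders, not measurements from these benchmarks.

```java
// Illustrative helper (hypothetical name): summary statistics for a set of run times.
final class RunStats {
    private RunStats() {}

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double stdDev(double[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double m = mean(samples);
        double squaredDiffs = 0.0;
        for (double s : samples) {
            squaredDiffs += (s - m) * (s - m);
        }
        return Math.sqrt(squaredDiffs / (samples.length - 1));
    }

    public static void main(String[] args) {
        // Placeholder timings in ms, invented for this example only.
        double[] timingsMs = {700, 720, 910, 705, 715};
        System.out.printf("mean = %.1f ms, stddev = %.1f ms%n",
                mean(timingsMs), stdDev(timingsMs));
    }
}
```

A single slow run, such as one that hit a less favourable compilation, moves the average only a little but shows up clearly in the standard deviation.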