Benchmark Code

The benchmarks ran on a Lenovo P500 using Java 11, with the processor frequency fixed at 3 GHz.

Univocity is the benchmark with readInputOnSeparateThread set to false; ConcurrentUnivocity has that flag set to true. All the parallel tests apart from ConcurrentUnivocity use a ParallelReader.
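For readers unfamiliar with the idea behind reading input on a separate thread, here is a minimal sketch using only the JDK. It is not the Univocity or ParallelReader implementation; the class name BackgroundReadSketch, the method countRows and the queue capacity of 4 are all hypothetical. One thread fills character buffers from the Reader while the calling thread consumes them, so I/O and decoding can overlap with parsing. Counting newlines stands in for the actual parsing work.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackgroundReadSketch {
    // Sentinel chunk, compared by identity, that marks the end of the input.
    private static final char[] EOF = new char[0];

    // Reads `source` on a separate thread and counts '\n' characters on the
    // calling thread. Hypothetical helper, for illustration only.
    static int countRows(Reader source, int chunkSize) {
        BlockingQueue<char[]> queue = new ArrayBlockingQueue<>(4);
        Thread producer = new Thread(() -> {
            try {
                char[] buf = new char[chunkSize];
                int n;
                while ((n = source.read(buf)) != -1) {
                    if (n > 0) {
                        // Copy the chunk so the producer can reuse its buffer.
                        queue.put(Arrays.copyOf(buf, n));
                    }
                }
            } catch (IOException | InterruptedException e) {
                // Sketch only: real code would hand the failure to the consumer.
            } finally {
                try {
                    queue.put(EOF);
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        int rows = 0;
        try {
            char[] chunk;
            while ((chunk = queue.take()) != EOF) {
                for (char c : chunk) {
                    if (c == '\n') {
                        rows++;
                    }
                }
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(countRows(new StringReader("a,b\n1,2\n3,4\n"), 4));
    }
}
```

The bounded queue keeps the reading thread from running arbitrarily far ahead of the consumer, which caps memory use.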

Csv Parsing Unescaped/Escaped and Parallel

Library      Version
Jackson      2.9.8
Sfm          6.7.0
Univocity    2.8.1

The CSV file parsed is 145 MB unescaped and 188 MB with quotes.

Why only those three? Because the others I tested are considerably slower in comparison. If you think your CSV parser is worth benchmarking, open an issue.

Parsing an unescaped Csv

Parser          Avg time (ms)    MB/s
Sfm Raw         747              194
Sfm Callback    1040             139
Sfm Iterate     1127             128
Univocity       1256             115
Jackson         1593             90
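The MB/s column appears to be the file size divided by the average time. The sketch below assumes exactly that relationship; the class and method names are hypothetical and not part of the benchmark code. Plugging in the figures from the tables reproduces the published throughput to within a MB/s or two, with the small differences presumably down to rounding.

```java
public class Throughput {
    // Converts a file size in MB and an average run time in ms into MB/s.
    // Assumes throughput = size / time, which matches the tables approximately.
    static double megabytesPerSecond(double fileSizeMb, double avgMillis) {
        return fileSizeMb / (avgMillis / 1000.0);
    }

    public static void main(String[] args) {
        // 145 MB unescaped file, Sfm Raw average of 747 ms.
        System.out.println(megabytesPerSecond(145, 747));
        // 188 MB escaped file, Sfm Raw average of 921 ms.
        System.out.println(megabytesPerSecond(188, 921));
    }
}
```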

Parsing an escaped version of Csv

Parser          Avg time (ms)    MB/s
Sfm Raw         921              204
Sfm Callback    1103             170
Sfm Iterate     1140             164
Univocity       1491             126
Jackson         1592             118

Parsing an unescaped Csv with ParallelReader

ConcurrentUnivocity uses readInputOnSeparateThread set to true and no ParallelReader.

Parser                 Avg time (ms)    MB/s
Sfm Raw                530              273
Sfm Callback           740              195
Sfm Iterate            759              190
ConcurrentUnivocity    844              171
Univocity              890              162
Jackson                1243             116

Parsing an escaped version of Csv with ParallelReader

ConcurrentUnivocity uses readInputOnSeparateThread set to true and no ParallelReader.

Parser                 Avg time (ms)    MB/s
Sfm Raw                610              307
Sfm Callback           812              231
Sfm Iterate            826              227
ConcurrentUnivocity    1054             178
Univocity              1105             170
Jackson                1342             140

Notes

UTF-8 decoding performance varied quite a bit depending on C2 optimisations. That variability is not represented here, as I only display the average. I’m planning to investigate further and find out where it’s coming from.