Benchmark Code

The benchmarks ran on a P500 Lenovo, result might differ - and do, ran on old i5, Univocity is faster than sfm.

Univocity is the benchmark with readInputOnSeparateThread set to false, ConcurrentUnivocity has that flag set to true. All the parallel test appart from ConcurrentUnivocity uses a ParallelReader.

Csv Parsing Unescaped/Escaped and Parallel

The score are time in ms to parse the file, the lower the better.

Why only those 3? because the other that I tested are pretty slow in compaison. If you think your csv parser is worth benchmark Open an issue.

Parsing a unescaped Csv

Parser avgt 90th 95th 99th
Jackson 1237 1281 1300 1346
Sfm 854 858 859 865
Univocity 958 969 974 994

Parce csv no quotes

Parsing a escaped version of Csv

Parser avgt 90th 95th 99th
Jackson 1490 1726 1778 1810
Sfm 1267 1502 1538 1550
Univocity 1846 1883 1906 1963

Parce csv with quotes

Parsing a unescaped Csv with ParallelReader

ConcurrentUnivocity uses readInputOnSeparateThread set to true and no ParallelReader.

Parser avgt 90th 95th 99th
Jackson 1360 1545 1589 1875
Sfm 749 825 891 999
Univocity 852 895 965 1097
ConcurrentUnivocity 795 806 813 849

Parce csv, Parallel reader, no quotes

Parsing a escaped version of Csv with ParallelReader

ConcurrentUnivocity uses readInputOnSeparateThread set to true and no ParallelReader.

Parser avgt 90th 95th 99th
Jackson 1368 1493 1560 1699
Sfm 951 1048 1109 1230
Univocity 1378 1480 1545 1828
ConcurrentUnivocity 1302 1330 1365 1589

Parce csv, Parallel reader, with quotes

Summary

Sfm is wining on all measures appart from the 90th percentile on unescaped parallel - That is likely link to the implementation of the ParallelReader. The Sfm non parallel escaped reading is even faster than the ConcurrentUnivocity. and only 60ms behind on the non escaped content - might not worth wasting a cpu.

Jackson is better than Univocity on escaped content, Univocity is better than Jackson on unescaped. For some reason the advantages that Jackson possesses disappear in the parallel treatment.

Number of rows effect on parsing/mapping no quotes

Raw Data

All chart represent the time in us taken to read a file with the specified number of rows, the lower the better.

1 row

1 Row

10 rows

10 Row

1000 rows

1000 Row

100,000 rows

100,000 Row

1.5m rows

1.5m Row

Summary

Sfm and Jackson have similar performances with small size files, Univocity has a fix cost that show on those file make it not very suitable for those. Using parallel for small size file is a bad idea.

Jackson mapper underperform and lag behind from 1000 rows. Sfm mapper is clearly in front on most sizes from 1000.

Also surprisingly jackson perform slower with the parallel reader that would need more investigation to identify where the contention comes from.