Show HN: Time Series Benchmark: TurboPFor, TurboFloat, TurboFloat LzX, TurboGorilla

  • Gorilla [2] and Gorilla-based algorithms such as Chimp [3] are simply overrated.

    Storing these values as 32-bit floats instead of 64-bit doubles gives an instant 50% reduction, without any compression.

    This is valid for almost all time series data.
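    A minimal sketch of how to check this for a given series, assuming the data has a known number of decimal digits. Each value is stored as a 32-bit float (4 bytes instead of 8), and the original double is restored by rounding back to that many decimals. The helper names are hypothetical. The check only holds roughly while values have at most 6 significant digits (FLT_DIG).

```c
#include <stddef.h>

/* Round half away from zero (avoids a libm dependency).
   Assumes |x| fits in a long long. */
static long long round_away(double x) {
    return x < 0 ? -(long long)(-x + 0.5) : (long long)(x + 0.5);
}

/* Hypothetical helper: store each value as a 32-bit float and check that
   rounding it back to `digits` decimals restores the original double.
   Returns 1 if the whole series survives, 0 otherwise. */
static int float_roundtrip_ok(const double *v, size_t n, int digits) {
    double scale = 1.0;
    for (int d = 0; d < digits; d++) scale *= 10.0;   /* exact for small digits */
    for (size_t i = 0; i < n; i++) {
        float f = (float)v[i];                        /* 4 bytes instead of 8 */
        double back = (double)round_away((double)f * scale) / scale;
        if (back != v[i]) return 0;
    }
    return 1;
}
```

    For example, readings like 20.1, 21.3, -5.7 and 19.85 pass with `digits = 2`. A value such as 1234567.891 (10 significant digits) does not, because the nearest float is 1234567.875.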

    Most time series databases (e.g. DuckDB) store floating-point data as 64 bits.

    They report extraordinary compression ratios by using Gorilla/Chimp-like algorithms.
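    For reference, here is a simplified sketch of the value-compression half of the Gorilla XOR scheme described in [2]. It only counts the bits the encoder would emit, and it omits the timestamp (delta-of-delta) encoding. The function names are hypothetical, and this is not TurboGorilla's code. Each value is XORed with the previous one and encoded as follows:
    - a single `0` bit if the XOR is zero;
    - `10` plus the meaningful bits if they fit the previous leading/trailing-zero window;
    - otherwise `11`, a 5-bit leading-zero count, a 6-bit length, and the meaningful bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Leading / trailing zero counts of a NON-ZERO 64-bit word (portable loops). */
static int clz64(uint64_t x) { int n = 0; while (!(x & 0x8000000000000000ULL)) { x <<= 1; n++; } return n; }
static int ctz64(uint64_t x) { int n = 0; while (!(x & 1u)) { x >>= 1; n++; } return n; }

/* Hypothetical estimate: bits emitted by Gorilla-style XOR encoding of v[0..n-1]. */
static size_t gorilla_xor_bits(const double *v, size_t n) {
    if (n == 0) return 0;
    uint64_t prev;
    memcpy(&prev, &v[0], sizeof prev);   /* reinterpret the double's bits */
    size_t bits = 64;                     /* first value stored verbatim */
    int prev_lead = -1, prev_trail = -1;  /* no window established yet */
    for (size_t i = 1; i < n; i++) {
        uint64_t cur;
        memcpy(&cur, &v[i], sizeof cur);
        uint64_t x = cur ^ prev;
        prev = cur;
        if (x == 0) { bits += 1; continue; }            /* '0': same value */
        int lead = clz64(x), trail = ctz64(x);
        if (lead > 31) lead = 31;                       /* count stored in 5 bits */
        if (prev_lead >= 0 && lead >= prev_lead && trail >= prev_trail) {
            bits += 2 + (size_t)(64 - prev_lead - prev_trail);  /* '10' + old window */
        } else {
            bits += 2 + 5 + 6 + (size_t)(64 - lead - trail);    /* '11' + lead + len + bits */
            prev_lead = lead;
            prev_trail = trail;
        }
    }
    return bits;
}
```

    A constant series costs about 1 bit per repeated value, which explains the spectacular ratios on flat data. Noisy 64-bit doubles, by contrast, leave many meaningful bits in every XOR.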

    However, as shown in this benchmark, a lot of time series data (e.g. temperature, climate data, stocks, ...) has no more than 1 or 2 fixed decimal digits and can be stored losslessly as 16/32-bit integers.
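    A minimal sketch of that conversion, assuming the number of decimal digits is known in advance. Each value is scaled by 10^digits and rounded, and the conversion is accepted only if dividing back reproduces the original double exactly. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round half away from zero (avoids a libm dependency). */
static long long round_half_away(double x) {
    return x < 0 ? -(long long)(-x + 0.5) : (long long)(x + 0.5);
}

/* Hypothetical helper: convert values with at most `digits` fixed decimals
   into scaled 32-bit integers (21.37 -> 2137 for digits = 2).
   Returns 1 only if every value is recovered exactly as out[i] / 10^digits. */
static int to_scaled_int32(const double *v, size_t n, int digits, int32_t *out) {
    double scale = 1.0;
    for (int d = 0; d < digits; d++) scale *= 10.0;       /* exact for small digits */
    for (size_t i = 0; i < n; i++) {
        double s = v[i] * scale;
        if (!(s > -2147483648.5 && s < 2147483647.5)) return 0; /* outside int32 or NaN */
        long long q = round_half_away(s);
        if ((double)q / scale != v[i]) return 0;           /* more decimals than `digits` */
        out[i] = (int32_t)q;
    }
    return 1;
}
```

    Dividing by the exact power of ten (rather than multiplying by 0.01) matters here. It yields the double nearest to the decimal value, which is what a parsed input like 21.37 already is. If all scaled values fall within [-32768, 32767], an `int16_t` array halves the size again.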

    Integer compression algorithms [1] can then be applied, giving significantly better compression ratios and running several times faster than Gorilla-like algorithms.
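    To illustrate why scaled integers compress so well, here is a generic delta + ZigZag + fixed-width bit-packing size estimate. This is not TurboPFor's API; its codecs (PFor with exceptions, SIMD bit packing, etc.) are more elaborate, and the names below are hypothetical. Slowly varying readings produce tiny deltas, so each one needs only a few bits.

```c
#include <stddef.h>
#include <stdint.h>

/* ZigZag: map signed deltas to unsigned so small magnitudes stay small
   (0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4). */
static uint64_t zigzag64(int64_t x) {
    return x < 0 ? ((uint64_t)(-(x + 1)) << 1) | 1u : (uint64_t)x << 1;
}

/* Number of bits needed to represent x (0 for x == 0). */
static int bit_width(uint64_t x) {
    int b = 0;
    while (x) { b++; x >>= 1; }
    return b;
}

/* Hypothetical estimate: size in bits of v[0..n-1] stored as the first value
   verbatim (32 bits) followed by zigzag-encoded deltas, all bit-packed at the
   width of the largest one. */
static size_t delta_bitpack_bits(const int32_t *v, size_t n) {
    if (n == 0) return 0;
    int width = 0;
    for (size_t i = 1; i < n; i++) {
        int64_t delta = (int64_t)v[i] - v[i - 1];  /* int64 avoids overflow */
        int w = bit_width(zigzag64(delta));
        if (w > width) width = w;
    }
    return 32 + (n - 1) * (size_t)width;
}
```

    For the readings 21.37, 21.39, 21.38, 21.40 (scaled to 2137, 2139, 2138, 2140), the deltas 2, -1, 2 become 4, 1, 4 after ZigZag. They pack at 3 bits each, giving 41 bits versus 256 bits as four doubles.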

    TurboGorilla, the fastest Gorilla (or Chimp) based algorithm in C, cannot exceed 1 GB/s in decompression, whereas TurboPFor reaches on the order of 10 GB/s and TurboBitByte exceeds 100 GB/s.

    [1] TurboPFor-Integer-Compression: https://github.com/powturbo/TurboPFor-Integer-Compression

    [2] Gorilla, PVLDB vol. 8: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf

    [3] Chimp, PVLDB vol. 15: https://www.vldb.org/pvldb/vol15/p3058-liakos.pdf