Show HN: Rapidgzip – Truly Parallel Gzip Decompression with 10 GB/s

  • Does this work with files compressed using the usual gzip command line program?

    I thought you had to write "sync points" inside a gzip file to support reading from locations other than the start. If you somehow bypass that requirement, can you explain how?

  • Long time ago (probably around 2010 or something) I had a similar idea [1] and concluded that it is infeasible due to the LZ77 window but... wow, I never thought that starting a speculative parsing at every possible position can be good enough! Great job!

    Just in case, my own idea was that, due to the fact that there is exactly one end-of-block symbol per block and the longest codes in dynamic Huffman trees should be lexicographically last due to the canonical encoding. So if you have a long row of 1 bits there is a good chance that this codes the EOB symbol (because it should be the rarest symbol and any reasonable compressor will assign the longest code) and a new block follows. I don't know if this is better than the current brute-force approach.

    [1] https://news.ycombinator.com/item?id=29586865

  • Nice work! How much manual SIMD are you using?