Long time ago (probably around 2010 or something) I had a similar idea [1] and concluded that it is infeasible due to the LZ77 window but... wow, I never thought that starting a speculative parsing at every possible position can be good enough! Great job!
Just in case, my own idea was that, due to the fact that there is exactly one end-of-block symbol per block and the longest codes in dynamic Huffman trees should be lexicographically last due to the canonical encoding. So if you have a long row of 1 bits there is a good chance that this codes the EOB symbol (because it should be the rarest symbol and any reasonable compressor will assign the longest code) and a new block follows. I don't know if this is better than the current brute-force approach.
Nice work! How much manual SIMD are you using?
Does this work with files compressed using the usual gzip command line program?
I thought you had to write "sync points" inside a gzip file to support reading from locations other than the start. If you somehow bypass that requirement, can you explain how?