The BLAKE3 cryptographic hash function

  • It looks like the speedup is coming from two main changes.

    The first change is reducing the number of rounds from 10 to 7. Think of it like making a smoothie - you add bits of fruit to the drink (the input data), then pulse the blades to blend it up (making the output hash). This change basically runs the blades for 7 seconds instead of 10 seconds each time they add fruit. They cite evidence that the extra 3 seconds aren't doing much - once the fruit's fully liquid, extra blending doesn't help - but I worry that this reduces the security margin. Maybe those extra 3 rounds aren't useful against current attacks, but they may be useful against unknown future attacks.

    The other change they make is to break the input into 1KiB chunks, hash each chunk independently, and then combine the individual chunk hashes into a single root hash using a binary tree. The benefit is that if you have 4KiB of data, you can use 4-way SIMD instructions to process all four chunks simultaneously; the more data you have, the more parallelism you can unlock, unlike traditional hash functions that process everything sequentially. Relatedly, modern SIMD instructions can handle 2 x 32-bit operations just as fast as 1 x 64-bit operation, so building the algorithm out of 32-bit arithmetic costs nothing on 64-bit hardware, but gives a big boost to low-end 32-bit CPUs that struggle with 64-bit arithmetic. The tree structure is a big win overall.
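    The chunk-and-tree idea can be sketched in a few lines of Python. This is a toy illustration only, not real BLAKE3: `hashlib.blake2s` stands in for the compression function, and the actual BLAKE3 tree rules differ (left subtrees cover the largest power-of-two number of chunks, and chunk/parent/root nodes carry domain-separation flags).

```python
import hashlib

CHUNK_LEN = 1024  # BLAKE3 splits input into 1 KiB chunks

def chunk_hashes(data: bytes) -> list[bytes]:
    """Hash each 1 KiB chunk independently. These are exactly the hashes a
    real implementation can compute in parallel with SIMD or threads."""
    chunks = [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)]
    return [hashlib.blake2s(c).digest() for c in (chunks or [b""])]

def tree_hash(data: bytes) -> bytes:
    """Combine chunk hashes pairwise up a binary tree to a single root."""
    level = chunk_hashes(data)
    while len(level) > 1:
        paired = [hashlib.blake2s(level[i] + level[i + 1]).digest()
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an odd leftover node up unchanged
            paired.append(level[-1])
        level = paired
    return level[0]
```

    Note that every call in the inner list comprehension is independent of the others, which is where the data parallelism comes from; a sequential hash can't offer that because each block's state feeds the next.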

  • Another benchmark:

    time openssl sha256 /tmp/bigfile

      real  0m28.160s
      user  0m27.750s
      sys   0m0.272s
    
    time shasum -a 256 /tmp/bigfile

      real  0m6.146s
      user  0m5.407s
      sys   0m0.560s
    
    time b2sum /tmp/bigfile

      real  0m1.732s
      user  0m1.450s
      sys   0m0.244s
    
    time b3sum /tmp/bigfile

      real  0m0.212s
      user  0m0.996s
      sys   0m0.379s
    
    TIL that the openssl sha256 invocation is really slow compared to the shasum program. Also, BLAKE3 is really fast.

    Edit: bigfile is 1GB of /dev/random

  • So this is bao + blake2?

    I remember watching the talk "Bao, a general purpose cryptographic tree hash, and perhaps the fastest hash function in the world" a while ago: https://www.youtube.com/watch?v=Dya9c2DXMqQ

    Nice job!

  • Looks like they've taken Too Much Crypto to heart[1] and dropped the number of rounds from Blake2B's 12 down to 7 for Blake3:

    https://github.com/BLAKE3-team/BLAKE3/blob/master/reference_...

    https://github.com/BLAKE2/BLAKE2/blob/master/ref/blake2b-ref...

    Which, yeah, that alone will get you a significant improvement over Blake2B, but it definitely doesn't account for the huge improvement they're showing. Most of that is the ability to take advantage of AVX-512 parallelism, I think; the difference should be more incremental on AVX2-only amd64 or other platforms.

    [1]: Well, TMC recommended 8 rounds for Blake2B and 7 for Blake2S.

  • Just one variant, that's refreshing. And performance is impressive. What's the short input performance like? Say for 64 bytes of input.

  • That's impressive speedup. I just installed it and holy moly it really is fast. All those extra cores can finally get busy. ;)

    b3sum -l 256 big-2.6Gfile

      real  0m0.384s
      user  0m2.302s
      sys   0m0.175s
    
    
    b2sum -l 256 big-2.6Gfile

      real  0m3.616s
      user  0m3.360s
      sys   0m0.256s
    
    (Intel® Core™ i7-8550U CPU @ 1.80GHz × 8 )

    EDIT: ah, the catch. blake3 targets 128 bit security. It competes with SipHash for speed and security

    EDIT2 scratch the previous edit.

  • I would be interested in how this compares on the smhasher against some of the other fastest hash competitors like meow hash or xxhash.

    I can't seem to find any non-Rust implementations in the works yet, so I may sit down and adapt the reference implementation to C# this weekend. Does anyone know how the single-threaded (or few-thread) performance holds up, excluding AVX-512?

  • smhasher results without the Rust version yet (which should be ~2x faster):

    http://rurban.github.io/smhasher/doc/table.html

    It's of course much faster than most of the other crypto hashes, but not faster than the hardware variants SHA1-NI and SHA256-NI. About 4x faster than blake2.

    Faster than SipHash, not faster than SipHash13.

    The tests fail dramatically on MomentChi2, which describes how well a user-provided random seed is mixed in. I tried mixing a seed into IV[0], as with all the other hardened crypto hashes, and into all 8 IVs, which didn't help. So I'm not convinced that a seeded IV is properly mixed in. That's outside the usage pattern of a crypto or digest hash (b3sum), but inside normal usage.

    The Rust staticlib is still in progress; it would parallelize the hashing in chunks for big keys. For small keys it should even be a bit slower. b3sum is so much faster because it uses many more tricks, such as mmap.
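    The mmap trick mentioned above is easy to demonstrate: map the file and feed the mapping straight to the hasher, so the kernel pages data in on demand instead of the program looping over read() into a buffer. A hedged sketch in Python, with hashlib.blake2b standing in for BLAKE3 (which isn't in the stdlib):

```python
import hashlib
import mmap
import os

def hash_file_mmap(path: str) -> str:
    """Hash a file through a read-only memory mapping (blake2b as a
    stand-in for BLAKE3). Avoids explicit read() copies."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            # mmap rejects zero-length files, so handle them directly
            return hashlib.blake2b(b"").hexdigest()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # mmap objects support the buffer protocol, so the hasher
            # can consume the mapping directly
            return hashlib.blake2b(m).hexdigest()
```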

  • Does anybody know of benchmarks for ARM? Or any research trying to break it?

    The numbers look astonishing.

  • Why are there no AES-like hashing algorithms out there? AES design is very suitable to be used as a building block in a hash if you remove "add round key" operation.

    Is it possible to benchmark against blake2 etc. with the same number of rounds, testing both a reduced-round blake2 and an increased-round blake3? Also, in that vein, could offering a version with more rounds win over the "paranoid" crowd, since it would mostly be a faster Blake2 thanks to SIMD, with extra features thanks to the Merkle tree?

  •   Benchmark #1: cat b1
        Time (mean ± σ):      1.076 s ±  0.007 s    [User: 5.3 ms, System: 1069.4 ms]
        Range (min … max):    1.069 s …  1.093 s    10 runs
      Benchmark #2: sha256sum b1
        Time (mean ± σ):      6.583 s ±  0.064 s    [User: 5.440 s, System: 1.137 s]
        Range (min … max):    6.506 s …  6.695 s    10 runs
      Benchmark #3: sha1sum b1
        Time (mean ± σ):      6.322 s ±  0.086 s    [User: 5.212 s, System: 1.103 s]
        Range (min … max):    6.214 s …  6.484 s    10 runs
      Benchmark #4: b2sum b1
        Time (mean ± σ):     13.184 s ±  0.108 s    [User: 12.090 s, System: 1.080 s]
        Range (min … max):   13.087 s … 13.382 s    10 runs
      Benchmark #5: b3sum b1
        Time (mean ± σ):     577.0 ms ±   5.4 ms    [User: 12.276 s, System: 0.669 s]
        Range (min … max):   572.4 ms … 587.0 ms    10 runs
      Benchmark #6: md5sum b1
        Time (mean ± σ):     14.851 s ±  0.175 s    [User: 13.717 s, System: 1.117 s]
        Range (min … max):   14.495 s … 15.128 s    10 runs
        
      Summary
        'b3sum b1' ran
          1.86 ± 0.02 times faster than 'cat b1'
         10.96 ± 0.18 times faster than 'sha1sum b1'
         11.41 ± 0.15 times faster than 'sha256sum b1'
         22.85 ± 0.28 times faster than 'b2sum b1'
         25.74 ± 0.39 times faster than 'md5sum b1'
    
    
    gotdang that's some solid performance. (Here running against 10GiB of random bytes; the machine has the SHA ASM extensions, which is why sha256/sha1 perform so well.)

    edit: actually not a straight algorithm comparison, as b3sum here benefits heavily from multi-threading; without that it looks more like this:

      Benchmark #1: cat b1
        Time (mean ± σ):      1.090 s ±  0.007 s    [User: 2.9 ms, System: 1084.8 ms]
        Range (min … max):    1.071 s …  1.096 s    10 runs
       
      Benchmark #2: sha256sum b1
        Time (mean ± σ):      6.480 s ±  0.097 s    [User: 5.359 s, System: 1.115 s]
        Range (min … max):    6.346 s …  6.587 s    10 runs
       
      Benchmark #3: sha1sum b1
        Time (mean ± σ):      6.120 s ±  0.090 s    [User: 5.027 s, System: 1.082 s]
        Range (min … max):    5.979 s …  6.233 s    10 runs
       
      Benchmark #4: b2sum b1
        Time (mean ± σ):     12.866 s ±  0.208 s    [User: 11.722 s, System: 1.133 s]
        Range (min … max):   12.549 s … 13.124 s    10 runs
       
      Benchmark #5: b3sum b1
        Time (mean ± σ):      5.813 s ±  0.079 s    [User: 4.606 s, System: 1.202 s]
        Range (min … max):    5.699 s …  5.933 s    10 runs
       
      Benchmark #6: md5sum b1
        Time (mean ± σ):     14.355 s ±  0.184 s    [User: 13.305 s, System: 1.039 s]
        Range (min … max):   14.119 s … 14.605 s    10 runs
       
      Summary
        'cat b1' ran
          5.33 ± 0.08 times faster than 'b3sum b1'
          5.62 ± 0.09 times faster than 'sha1sum b1'
          5.95 ± 0.10 times faster than 'sha256sum b1'
         11.81 ± 0.21 times faster than 'b2sum b1'
         13.17 ± 0.19 times faster than 'md5sum b1'
    
    still beating the dedicated SHA extensions, but not nearly as dramatically.
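    A rough single-threaded version of this comparison can be reproduced with just Python's stdlib hashes (BLAKE3 itself isn't in hashlib; the third-party blake3 package would plug into the same loop). Numbers will of course vary with the machine, and this measures the underlying libraries, not the coreutils binaries above:

```python
import hashlib
import time

# 64 MiB in-memory buffer: big enough to amortize setup cost, small
# enough to allocate quickly
data = b"\x00" * (64 * 1024 * 1024)

def throughput(name: str) -> float:
    """Single-threaded hash throughput in MB/s for one stdlib algorithm."""
    h = hashlib.new(name)
    t0 = time.perf_counter()
    h.update(data)
    h.digest()
    return len(data) / (time.perf_counter() - t0) / 1e6

for name in ("md5", "sha1", "sha256", "blake2b", "blake2s"):
    print(f"{name:8s} {throughput(name):8.1f} MB/s")
```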

  • Where is this useful?

    I'm guessing not for password hashes, simply because a fast hash is bad for passwords (it makes brute forcing/rainbow tables easier).

    So is this mostly just for file signing?
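    The distinction being drawn here can be made concrete with Python's stdlib: a fast hash is the right tool for checksums, dedup, and content addressing, while password storage wants a deliberately slow, salted KDF. The iteration count below is illustrative, not a recommendation:

```python
import hashlib
import os

# Fast hash: fine for file integrity, dedup, Merkle trees, content
# addressing (blake2b standing in for BLAKE3 here).
checksum = hashlib.blake2b(b"file contents").hexdigest()

# Password storage: a salted, deliberately expensive KDF instead.
# pbkdf2_hmac is always available in the stdlib; the cost parameter
# (iterations) is what makes brute-forcing expensive.
salt = os.urandom(16)
key = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 600_000)
```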