Colliding with the SHA prefix of Linux's initial Git commit

  • Great example of Hyrum's Law, https://www.hyrumslaw.com/.

    Comments about SHA256 are irrelevant: you can misuse the prefix of a SHA256 hash just as easily. The issue is that people got used to treating human-readable hash prefixes of 10-12 characters as "unique" for all intents and purposes, even though prefixes never came with any uniqueness guarantee and Git has always treated a short object ID that matches more than one object as ambiguous. Collisions are just so rare in the real world that lots of script writers treated "mostly unique" as a guarantee.

    IMO support for short object IDs is a mistake, as is any behavior that "works this way 99.999% of the time, but hey developer don't forget you need to also code for that .001% edge case". I'm always just copying and pasting things around anyway, so it really doesn't make much difference to me if I'm copying 12 chars or 64.
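
    A minimal sketch of what "handled as ambiguous" means for a tool that accepts short IDs. This is not Git's implementation: `resolve_prefix`, `AmbiguousPrefix`, and the in-memory list of IDs are hypothetical names used only for illustration. The point is that a script has three outcomes to handle (no match, one match, several matches), not two.

```python
class AmbiguousPrefix(Exception):
    """Raised when a short ID matches more than one object."""


def resolve_prefix(prefix, object_ids):
    """Return the single full ID that starts with `prefix`, or raise.

    `object_ids` is a hypothetical in-memory list of full hex IDs;
    a real repository would consult its object database instead.
    """
    prefix = prefix.lower()
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        # The "0.001% case": refuse to guess instead of picking one.
        raise AmbiguousPrefix(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


if __name__ == "__main__":
    # Made-up 40-character IDs; the first two share a 6-character prefix.
    ids = ["abc123" + "0" * 34, "abc123" + "f" * 34, "def456" + "0" * 34]
    print(resolve_prefix("def456", ids))
    try:
        resolve_prefix("abc123", ids)
    except AmbiguousPrefix as err:
        print("ambiguous:", err)
```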

  • First twelve and last twelve characters are the same:

        $ echo -n retr0id_662d970782071aa7a038dce6 | sha256sum
        307e0e71a409d2bf67e76c676d81bd0ff87ee228cd8f991714589d0564e6ea9a  -
        
        $ echo -n retr0id_430d19a6c51814d895666635 | sha256sum
        307e0e71a4098e7fb7d72c86cd041a006181c6d8e29882b581d69d0564e6ea9a  -
    
    * Via: https://news.ycombinator.com/item?id=38668893
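
    The same comparison can be reproduced without a shell. The sketch below hashes the two strings from the transcript above (encoded without a trailing newline, like `echo -n`) and counts how many leading and trailing hex digits the digests share. The helper names are made up for this example.

```python
import hashlib


def common_prefix_len(a, b):
    """Number of leading characters that `a` and `b` have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a, b):
    """Number of trailing characters that `a` and `b` have in common."""
    return common_prefix_len(a[::-1], b[::-1])


def sha256_hex(text):
    """Hex SHA-256 digest of `text`, with no newline appended."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    first = sha256_hex("retr0id_662d970782071aa7a038dce6")
    second = sha256_hex("retr0id_430d19a6c51814d895666635")
    print(first)
    print(second)
    print("shared leading hex digits: ", common_prefix_len(first, second))
    print("shared trailing hex digits:", common_suffix_len(first, second))
```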

  • There were some plans to migrate to SHA256, but somehow it still hasn't happened.

    The practical upshot is that a Git commit hash is not enough to know you are distributing/executing the legitimate code, as opposed to a malicious doppelganger. This is particularly important for tools that rely on it for dependency management, local caches, etc.

  • Relatedly: Kees's keynote on Linux security from a month ago was great: https://www.youtube.com/watch?v=orO8czP5Bxw

  • Presumably the problem is that these tools only take the abbreviated hash into account, and not also the subject:

        <abbrev. hash> ("<subject>")
    
    You also have another data point: you only need to search the history reachable from the commit that you are reading, assuming that the "Fixes" commit is an ancestor of the commit whose footer you are reading.

    I always just assumed that tools would take all the data into account, which means that you would need to collide with both the abbreviated hash and the subject. I don't check that myself, since I just copy-paste the hash, but I would quickly notice if the subject were different (and the commit message and the diff would likely just look irrelevant).

    I don't understand why the Linux kernel has this hard-coded abbreviation-length rule[2] -- again, you were going to get collisions eventually, so the tools should have just taken all the data into account (at least the subject) from the start. The recommendation in the Git project is to use `git show -s --pretty=reference`, without any fiddling with the abbreviation:

        <abbrev. hash> (subject, ISO date)
    
    Although the Git maintainer uses `--abbrev=8` since git-show will just use a longer abbreviation in case the output would be ambiguous[1].

    They could have used this instead if they wanted simpler, future-proof tooling:

        Fixes: <full hash>
    
    Just like tools like git-revert and git-cherry-pick do.

    [1]: https://lore.kernel.org/git/xmqq34j5h7v9.fsf@gitster.g/

    [2]: Edit: hard-coded as opposed to Git just figuring out how long the abbreviation should be based on how many objects there are.
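
    A minimal sketch of a checker that uses both data points from the `<abbrev. hash> ("<subject>")` form, restricted to ancestors of the commit carrying the trailer. Everything here is an assumption made for illustration: the function names, the regular expression (including the 7-character minimum), and the idea that the caller has already collected `(full_hash, subject)` pairs for the ancestors. It is not how any existing kernel tool is known to work.

```python
import re

# Assumed trailer shape: Fixes: <abbrev. hash> ("<subject>")
FIXES_RE = re.compile(r'^Fixes:\s+([0-9a-f]{7,64})\s+\("(.*)"\)\s*$')


def parse_fixes(line):
    """Return (abbreviated_hash, subject), or None if the line does not match."""
    m = FIXES_RE.match(line.strip())
    if m is None:
        return None
    return m.group(1), m.group(2)


def resolve_fixes(line, ancestors):
    """Resolve a Fixes: trailer to one full hash, checking hash AND subject.

    `ancestors` is a hypothetical iterable of (full_hash, subject) pairs
    for the commits reachable from the commit that carries the trailer.
    """
    parsed = parse_fixes(line)
    if parsed is None:
        raise ValueError("not a Fixes: trailer")
    prefix, subject = parsed
    candidates = list(ancestors)
    by_hash = [c for c in candidates if c[0].startswith(prefix)]
    by_both = [c for c in by_hash if c[1] == subject]
    if len(by_both) == 1:
        return by_both[0][0]
    if not by_both:
        raise LookupError(f"{len(by_hash)} hash match(es), none with that subject")
    raise LookupError("hash and subject together are still ambiguous")


if __name__ == "__main__":
    # Made-up commits whose IDs share the same 12-character prefix.
    history = [
        ("aaaaaaaaaaaa" + "1" * 28, "net: fix the thing"),
        ("aaaaaaaaaaaa" + "2" * 28, "unrelated doppelganger"),
    ]
    print(resolve_fixes('Fixes: aaaaaaaaaaaa ("net: fix the thing")', history))
```

    With a full hash in the trailer, as suggested above, the subject check becomes a sanity check rather than a tie-breaker.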

  • Cute. I've seen CPU commit prefix brute-forcers but not GPU ones. On the CPU, you can generate chosen 7-8 digit prefixes pretty easily (within a few minutes, I forget exactly). 12 digits is, of course, a massively bigger space (a factor of 2^16 over 8 digits).
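
    A toy sketch of the brute-force idea, for a sense of scale. It hashes a plain string plus a counter with SHA-1 rather than building real Git commit objects, and `find_nonce` and `expected_tries` are names invented for this example. Each extra hex digit multiplies the expected work by 16, which is where the 2^16 gap between 8 and 12 digits comes from.

```python
import hashlib
from itertools import count


def expected_tries(hex_digits):
    """Average number of attempts needed to hit one chosen hex prefix."""
    return 16 ** hex_digits


def find_nonce(message, prefix, max_tries=None):
    """Search for a counter that makes SHA-1(message + counter) start with `prefix`.

    Returns (nonce, digest), or None if `max_tries` attempts all fail.
    """
    prefix = prefix.lower()
    for nonce in count():
        if max_tries is not None and nonce >= max_tries:
            return None
        digest = hashlib.sha1(f"{message}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


if __name__ == "__main__":
    # Three hex digits: about 4096 attempts on average, effectively instant.
    print(find_nonce("hello ", "abc"))
    print("8 digits: ", expected_tries(8), "expected tries")
    print("12 digits:", expected_tries(12), "expected tries")
```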

  • Would tooling break if the radix was changed from 16 to something higher? You could gain an extra bit of resistance per character just by changing to a base-32 representation at the same length.

  • We could just encode hashes in base64 instead of hex and get 50% more bits of entropy in the same amount of text.
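
    To put numbers on the two radix suggestions above: each character carries log2(radix) bits, so a 12-character prefix holds 48 bits in hex, 60 bits in base 32, and 72 bits in base 64. A short sketch of that arithmetic, where `prefix_bits` is a helper named for this example only:

```python
import math


def prefix_bits(radix, length):
    """Bits of hash output carried by `length` characters in the given radix."""
    return length * math.log2(radix)


if __name__ == "__main__":
    hex_bits = prefix_bits(16, 12)
    for name, radix in [("hex", 16), ("base32", 32), ("base64", 64)]:
        bits = prefix_bits(radix, 12)
        # Each extra bit doubles the number of prefixes an attacker must search.
        factor = 2 ** (bits - hex_bits)
        print(f"{name:>6}: {bits:.0f} bits, {factor:.0f}x the search space of hex")
```

    So the gain in bits is modest (25% or 50%), but the search space grows by a factor of 2^12 or 2^24 respectively.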

  • In other words, 6 out of 20 bytes match. Good luck colliding the other 112 bits.

  • https://people.kernel.org/kees/colliding-with-the-sha-prefix... is the actual link; the LWN article is just a link to it with a one-sentence summary and a short quote.
