Hacker News

Extremely Linear Git History

by zegl on 11/22/2022, 10:43:30 AM with 58 comments

by infogulch on 11/22/2022, 8:20:34 PM
GitHub-style rebase-only PRs have revealed the best compromise between 'preserve history' and 'linear history' strategies:
All PRs are rebased and merged in a linear history of merge commits that reference the PR#. If you intentionally crafted a logical series of commits, merge them as a series (ideally you've tested each commit independently); otherwise, squash.
If you want more detail about the development of the PR than the merge commit, aka the 'real history', then open up the PR and browse through Updates, which include commits that were force-pushed to the branch and also fast-forward commits that were appended to the branch. You also get discussion context and intermediate build statuses etc. To represent this convention within native git, maybe tag each Update with pr/123/update-N.
The funny thing about this design is that it's actually more similar to the kernel development workflow (emailing crafted patches around until they are accepted) than BOTH of the typical hard-line stances taken by most people with a strong opinion about how to maintain git history (only merge/only rebase).
by larschdk on 11/22/2022, 12:29:40 PM
I want the 'merge' function completely deprecated. I simply don't trust it anymore.
If there are no conflicts, you might as well rebase or cherry-pick. If there is any kind of conflict, you are making code changes in the merge commit itself to resolve it. Developers end up fixing additional issues in the merge commit instead of in actual commits.
If you use merge to sync two branches continuously, you completely lose track of which changes were made on the branch and which were made on the mainline.
by zegl on 11/22/2022, 2:47:07 PM
I don't know how stupid this is on a scale from 1 to 10. I've created a wrapper [1] for git (called "shit", for "short git") that converts non-padded revisions to their padded counterpart.
Examples:
"shit show 14" gets converted to "git show 00000140"
"shit log 10..14" translates to "git log 00000100..00000140"
[1]: https://github.com/zegl/extremely-linear/blob/main/shit
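For illustration, the translation in those examples can be sketched in Python. This is a hypothetical re-implementation, not the linked script. It assumes the padding scheme implied by the two examples: the number is zero-padded to seven digits, and a 0 is appended. The names pad_revision, translate_arg and translate are made up here.

```python
import re
import sys

def pad_revision(n: int) -> str:
    """14 -> '00000140': zero-pad to seven digits, then append a 0."""
    return f"{n:07d}0"

# A range of short revisions such as 10..14 (or 10...14).
_RANGE = re.compile(r"([0-9]+)(\.\.\.?)([0-9]+)")

def translate_arg(arg: str) -> str:
    """Rewrite one argument if it looks like a short revision or a range."""
    if re.fullmatch(r"[0-9]+", arg):
        return pad_revision(int(arg))
    m = _RANGE.fullmatch(arg)
    if m:
        start, dots, end = m.groups()
        return pad_revision(int(start)) + dots + pad_revision(int(end))
    return arg  # subcommands, branch names, options: unchanged

def translate(args: list[str]) -> list[str]:
    """Build the git command line that the wrapper would run."""
    return ["git"] + [translate_arg(a) for a in args]

if __name__ == "__main__":
    # e.g. arguments: log 10..14  ->  prints: git log 00000100..00000140
    print(" ".join(translate(sys.argv[1:])))
```

A real wrapper would also need to leave numeric option values alone (for example the 5 in git log -n 5). This sketch does not attempt that.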
by jordigh on 11/22/2022, 2:47:33 PM
Mercurial has always had sequential revision numbers in addition to hashes for every commit.
They aren't perfect, of course. All they indicate is the order in which the current clone of the repo saw the commits. So two clones could pull the commits in a different order, and each clone could have different revision numbers for the same commits.
But they're still so fantastically useful. Even with their imperfections, you know that commit 500 cannot be a parent of commit 499. When looking at blame logs (annotate logs), you can be pretty sure that commit 200 happened some years before commit 40520. Plus, if your repo isn't big (and most repos on GitHub are not that big by number of commits), your revision numbers are shorter than even short git hashes, so they're easier to type in the CLI.
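Here is a toy model of why local revision numbers can differ between clones while still respecting ancestry. This is not Mercurial's code. It assumes each clone simply numbers changesets in the order it receives them, with parents always arriving before children. The changeset names are made up.

```python
def local_numbers(arrival_order: list[str]) -> dict[str, int]:
    """Number changesets 0, 1, 2, ... in the order this clone received them."""
    return {node: i for i, node in enumerate(arrival_order)}

# a is the root, b and c both build on a, and d merges b and c.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

clone1 = local_numbers(["a", "b", "c", "d"])
clone2 = local_numbers(["a", "c", "b", "d"])  # this clone pulled c before b

print(clone1["b"], clone2["b"])  # 1 2: same changeset, different numbers

# What every clone agrees on: a parent always has a lower number than its child.
for numbers in (clone1, clone2):
    for child, ps in parents.items():
        assert all(numbers[p] < numbers[child] for p in ps)
```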
by kinduff on 11/22/2022, 11:37:28 AM
See also Lucky Commit [0], which uses various types of whitespace characters instead of a hash inside the commit, making it look more magical.
I wonder about performance, though. Why is the author's method slower than the package I linked?
[0]: https://github.com/not-an-aardvark/lucky-commit
by blux on 11/22/2022, 2:54:23 PM
I fail to see the point of this; in fact, I think this is a fundamentally flawed approach to dealing with your revision history. The problem is that rebasing commits has the potential to screw up the integrity of your commit history.
How are you going to deal with non-trivial feature branches that need to be integrated into master? Squash them and commit? Good luck when you need to git bisect an issue. Or rebase, and potentially screw up the integrity of the unit test results in the rebased branch? Both sound unappealing to me.
The problem is not a history with a lot of branches in it; it is not knowing how to use your tools to present the view of that history you are interested in, in a form that is easy for you to understand.
by bloppe on 11/22/2022, 10:23:47 PM
This is a fun idea, but it will mess with your GC heuristics.
https://git-scm.com/docs/git-gc#_configuration
Git does something called "packing" when it detects "approximately more than <X (configurable)> loose objects" in your .git/objects/ folder. The key word here is "approximately". It guesses how many total objects you have by counting the objects in a single sample folder (.git/objects/17/) and assuming that objects are uniformly distributed among all 256 such folders (these folders are named after the first 2 hex characters of the SHA-1 digest). If you have a bunch of commits in the .git/objects/00/ folder, as would happen here, those commits are invisible to the sample. Git will therefore under-approximate the total number of loose objects and pack later than it otherwise would.
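Here is a simplified model of that estimate. It assumes git's auto-gc check works as I understand it: count the loose objects in one fan-out folder (objects/17/), extrapolate by 256, and compare the count against gc.auto (documented default 6700). The function names are hypothetical. The model only counts commits and ignores trees and blobs, which would still spread evenly in a real repository.

```python
from collections import Counter

GC_AUTO = 6700  # documented default for gc.auto

def estimated_loose(counts: Counter, sample_dir: str = "17") -> int:
    """Extrapolate the total number of loose objects from one fan-out
    folder, assuming the 256 folders (00/ ... ff/) are evenly filled."""
    return counts[sample_dir] * 256

def would_auto_gc(counts: Counter, sample_dir: str = "17") -> bool:
    """True if the sampled folder alone exceeds gc.auto / 256 (rounded up)."""
    threshold = -(-GC_AUTO // 256)  # ceiling division: 27 for 6700
    return counts[sample_dir] > threshold

# 10,000 loose commits, all brute-forced into the 00/ folder:
skewed = Counter({"00": 10_000})
print(estimated_loose(skewed), would_auto_gc(skewed))  # 0 False

# Roughly the same number of objects spread evenly (39 per folder):
uniform = Counter({f"{i:02x}": 10_000 // 256 for i in range(256)})
print(estimated_loose(uniform), would_auto_gc(uniform))  # 9984 True
```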
This isn't the end of the world, but something to consider.
by wirrbel on 11/22/2022, 1:08:40 PM
I think the sweet spot in developer productivity was when we had SVN repos and used git-svn on the client. Commits were all rebased at the git level prior to pushing. If you committed something that broke unit tests, your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.
We performed code review with a projector in our office, jointly looking at diffs, or in Emacs.
Of course it’s neat to have GitHub Actions now and pull requests for asynchronous code review. But I learned so much directly from my colleagues in that now-obscure working mode, and I am still grateful for it.
by chrismorgan on 11/22/2022, 12:34:32 PM
It has been my habit for a while to make the root commit 0000000 because it’s fun, but for some reason it had not occurred to me to generalise this to subsequent commits. Tempting, very tempting. I have a couple of solo-developed-and-publicly-shared projects in mind that I will probably do this for.
by oneeyedpigeon on 11/22/2022, 12:15:39 PM
I bet I wasn't the first person who thought this would have to be done by modifying actual file content — e.g. a dummy comment or something. That would clearly have been horrible, but the fact that git bases the checksum off the commit message is... surprising and fortunate, in this case!
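The message matters because a commit's id is the SHA-1 of the whole commit object. That object is a '<type> <size>' header, a NUL byte, and then the text of the commit (tree id, parents, author, committer, message). File contents only enter indirectly, through the tree id. Below is a minimal sketch of that hashing. The commit fields and the appended nonce line are made-up values for illustration, not what the article's tool writes.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object id: a '<type> <size>' header, a NUL byte, then the content."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# Sanity checks against well-known ids.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# A commit object is plain text; these field values are invented.
commit_body = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1669113810 +0100\n"
    "committer A U Thor <author@example.com> 1669113810 +0100\n"
    "\n"
    "Initial commit\n"
)
a = git_object_id("commit", commit_body.encode())
# Same tree (same files), different message text -> a different commit id.
b = git_object_id("commit", (commit_body + "\nnonce: 1\n").encode())
print(a != b)  # True
```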
by Ayesh on 11/22/2022, 11:24:27 AM
I wonder if Git provides a pluggable hashing mechanism as part of the SHA-2 migration.
I imagine stuff like this, and SVN-to-Git mirroring, would work nicely with identical hashes.
by Semaphor on 11/22/2022, 11:43:35 AM
> Full collision (entire hash is zeros, then 000...1, etc.) — `git linearize --format "%040d"` (takes ~10³³ years to run per commit)
Hah :D
by otikik on 11/22/2022, 11:38:11 AM
This is horrible and I like it.
by ChrisMarshallNY on 11/22/2022, 11:45:09 AM
I find tags to be a fairly useful way of providing a linear progression, but I guess that's no fun.
> but it can also mean to only allow merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.
That sounds like the Mainline Model, championed by Perforce[0]. It's actually fairly sensible.
[0] https://www.perforce.com/video-tutorials/vcs/mainline-model-...
by davide_v on 11/22/2022, 11:16:27 AM
I thought I was a very tidy person, then I saw this.
by gyulai on 11/22/2022, 11:48:30 AM
Sane revision numbers are among the many reasons I prefer SVN to Git.
by maxbond on 11/22/2022, 11:19:38 AM
Has anyone tried using git alternatives like fossil in production? Did it work out? Did you build CI/CD around it?
by sagebird on 11/22/2022, 3:21:24 PM
“So we only have one option: testing many combinations of junk data until we can find one that passes our criteria.”
I have a somewhat related interest: trying to find sentences that have low SHA-256 sums.
I made a Go client that searches for low-hash sentences and uploads winners to a scoreboard I put up at https://lowhash.com
I am not knowledgeable about GPU methods or crypto mining in general; I just tried to optimize a CPU-based method. Someone who knows what they are doing could quickly beat all the sentences there.
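The brute-force idea shared by this and the article can be sketched in a few lines: keep appending a counter to the text until the digest starts with enough zeros. This is a hypothetical example, not the lowhash.com client. It treats "low" as "many leading zero hex digits", which is a close proxy for a numerically small hash. The function name and the space-plus-counter format are assumptions.

```python
import hashlib
from itertools import count

def find_low_hash(sentence: str, zero_hex_digits: int) -> tuple[str, str]:
    """Append an increasing counter to the sentence until its SHA-256 hex
    digest starts with the requested number of zeros."""
    target = "0" * zero_hex_digits
    for nonce in count():
        candidate = f"{sentence} {nonce}"
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        if digest.startswith(target):
            return candidate, digest
    raise AssertionError("unreachable: count() never ends")

candidate, digest = find_low_hash("Hello, Hacker News", 4)
print(candidate, digest)
```

Each extra zero digit multiplies the expected work by 16. Four digits take about 65,536 attempts on average.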
by HextenAndy on 11/22/2022, 12:42:41 PM
Wait until you see subversion :)
by chrismorgan on 11/22/2022, 12:48:15 PM
The article talks about eight-character prefixes later on, but Git's short hashes actually use seven-character prefixes when there is no collision at that length (and that’s what’s shown earlier in the article). So you can divide the time by 16.
For me on a Ryzen 5800HS laptop, lucky_commit generally takes 11–12 seconds. I’m fine with spending that much per commit when publishing. The three minutes eight-character prefixes would require, not quite so much.
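The divide-by-16 point is just the expected-attempt arithmetic. Each attempt matches a fixed k-character hex prefix with probability 16^-k, so on average it takes 16^k attempts. The quick check below uses the 11-12 s figure above as an assumed baseline.

```python
def expected_attempts(prefix_len: int) -> int:
    """Average tries to hit one fixed prefix of prefix_len hex characters."""
    return 16 ** prefix_len

seven, eight = expected_attempts(7), expected_attempts(8)
print(seven, eight, eight // seven)  # 268435456 4294967296 16

# Assumed baseline: ~11.5 s for a seven-character prefix, as measured above.
seconds_for_seven = 11.5
print(f"{seconds_for_seven * 16 / 60:.1f} minutes")  # 3.1 minutes
```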
by jzer0cool on 11/22/2022, 11:25:17 PM
In the past I had some teammates merge master into a very old branch. This gets pushed back into master with all of the past history already committed. Could someone suggest a series of commands so that only their latest updates are moved separately onto the latest version of master, on the current branch or a new one?
by kzrdude on 11/22/2022, 2:06:07 PM
Clever and tempting. I would maybe like to use a smaller prefix but ensure a 0 suffix to the number too, to make it easy to read anyway. Like 00010bad, 00020fed, 00030be1, etc.
Wraparound doesn't really matter, as long as it's spaced long apart.
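The target prefix for that scheme can be sketched under a few assumptions: a four-digit decimal counter followed by a fixed 0 (so 00010, 00020, ...), wrapping around after 10,000 commits. The helper names are hypothetical.

```python
COUNTER_DIGITS = 4  # assumption: four decimal digits, then a fixed "0"

def target_prefix(n: int) -> str:
    """Fixed hash prefix for commit n, wrapping after 10**COUNTER_DIGITS."""
    return f"{n % 10**COUNTER_DIGITS:0{COUNTER_DIGITS}d}0"

def matches(commit_id: str, n: int) -> bool:
    """Does this commit id carry the prefix reserved for commit n?"""
    return commit_id.startswith(target_prefix(n))

print([target_prefix(n) for n in (1, 2, 3)])  # ['00010', '00020', '00030']
print(matches("00010bad", 1), matches("00010bad", 2))  # True False
print(target_prefix(10_001))  # 00010: wrapped around to commit 1's prefix
```

Five fixed hex characters need about 16^5 = 1,048,576 attempts per commit on average. A full eight-character prefix needs about 4.3 billion.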
by malkia on 11/22/2022, 4:39:37 PM
Hail p4, g4, svn and blessed be their monotonically increasing revision number!
by nsajko on 11/22/2022, 3:30:20 PM
GitLab supports an option called "Fast-forward merge":
> No merge commits are created.
> Fast-forward merges only.
> When there is a merge conflict, the user is given the option to rebase.
The maintainer can enable this for a project.
by enriquto on 11/22/2022, 11:13:16 AM
I love linear git! Branches are very confusing for a nonempty set of people. For us, it is always clearer to work with explicit files in the main branch. You are implementing a new feature? Nice: just create a new file on the main branch and keep updating it until you add it to the tests, and later you call it from the main program. This system may break down on large teams, but when you are just a handful of grug-brained developers, it's perfectly appropriate.
by JoachimS on 11/22/2022, 3:15:53 PM
A neat trick with this tool is to generate a commit message that corresponds to a given issue number. It could almost be useful.
Kudos to @zegl for this cool project.
by titzer on 11/22/2022, 1:59:20 PM
I prefer a linear version number on the main branch and I have a really tiny version file that gets incremented on every change to the src/ directory. That's not entirely automated, but a commit queue could do that.
Brute-forcing hash collisions seems like an April Fool's joke. You can't really be serious that people are going to do this regularly?
by shadytrees on 11/22/2022, 12:41:36 PM
There's a memorable Stripe CTF from 2014 that had something similar (Gitcoins). This brought back fond memories of that.
by bcoughlan on 11/22/2022, 12:00:08 PM
I wish Git had more support for "linear" revisions in the main branches. It's great for continuous delivery where you can get a unique identifier that's also human-friendly.
I emulate this by counting the number of merges on main:
```
git rev-list --count --first-parent HEAD
```
But it's not that traceable (hard to go from a rev back to a commit).
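One way to make the count traceable assumes main's first-parent history is never rewritten. List that same first-parent chain oldest-first with git rev-list --first-parent --reverse; the commit whose count is n is then entry n (1-based). The helper names below are hypothetical.

```python
import subprocess

def first_parent_chain(repo: str = ".", rev: str = "HEAD") -> list[str]:
    """Commit ids on rev's first-parent chain, oldest first."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--first-parent", "--reverse", rev],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()

def commit_for_count(chain: list[str], n: int) -> str:
    """Inverse of `git rev-list --count --first-parent <commit>`:
    the root commit has count 1, its first-parent child count 2, etc."""
    if not 1 <= n <= len(chain):
        raise ValueError(f"no commit with count {n} on this chain")
    return chain[n - 1]
```

Inside a repository, commit_for_count(first_parent_chain(), 42) should return the main-line commit for which the counting command prints 42 (when that commit is given in place of HEAD). For long histories, you would cache the chain rather than re-list it on every lookup.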
by conaclos on 11/22/2022, 12:38:06 PM
Another approach could be to use prefixes. A 0 could separate the prefix (fixed hash part) from the suffix (random part).
```
  0<suffix>
  10<suffix>
  20<suffix>
  ...
```
Combined with auto-completion, you preserve the main advantage (ordering) and you are able to quickly compute the hash.
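This sketch takes one reading of that layout to show the cost side. The fixed part is the counter's decimal digits plus a 0 separator (and just 0 for the first commit), so the brute-force work grows 16x per counter digit. The helper names and this interpretation are assumptions.

```python
def fixed_prefix(n: int) -> str:
    """Counter digits plus a '0' separator; the first commit gets just '0'."""
    return ("" if n == 0 else str(n)) + "0"

def expected_attempts(n: int) -> int:
    """Each fixed hex character multiplies the brute-force work by 16."""
    return 16 ** len(fixed_prefix(n))

for n in (0, 1, 42, 1234):
    print(n, fixed_prefix(n), expected_attempts(n))
```

There is one catch with this reading. The random suffix of commit 1 may itself start with 0, giving a hash beginning with 100. That also matches the prefix typed for commit 10, so auto-completion can be ambiguous.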
by alvis on 11/22/2022, 2:50:48 PM
It looks appealing from a perfectionist's point of view.
But! How can I collaborate with my team when PR merges are inevitable? O:
by cranium on 11/22/2022, 11:37:53 AM
Now, merge only the next-in-line hashes and the contributions to your repo can reach Cloud Scale™. Harness the ultimate power of distributed intelligent agents to create the future, backed by strong mathematical foundations and an ecosystem of innovative technologies. Just at your fingertips.
by forgotmypw17 on 11/22/2022, 1:58:31 PM
I am writing a solo project. I only use main (aka master) and never use branching. Otherwise, I inevitably screw something up. It is good enough to keep me from losing stuff most of the time, and I almost never have to struggle with understanding what the heck Git is doing.
by tambourine_man on 11/22/2022, 2:02:45 PM
The fact that we use a hash as the main way to interact with commits shows how bad git's interface is. Sure, you should be able to easily check the SHA anytime, but exposing the plumbing to end users in almost every interaction is mad. We just got used to it.
by hiergiltdiestfu on 11/22/2022, 1:15:41 PM
wtf, back to SVN :D
I honestly expected this to be from another "really cool date" - April 1st :D
by joosters on 11/22/2022, 2:08:40 PM
Just like perforce and its "p4 changes" command. I like the simplicity.
by shawabawa3 on 11/22/2022, 11:38:15 AM
Sadly if you use commit signing it's unfeasibly slow to do this :(
by jbergstroem on 11/22/2022, 12:14:49 PM
The WebKit project would love this. I can't help but feel that half the reason they spent all the extra effort with Subversion was user-friendly commit revisions.
by low_tech_punk on 11/22/2022, 4:25:59 PM
Extremely effective way to waste electricity and emit CO2.
by chrsig on 11/22/2022, 11:49:53 AM
Cool hacker project, learned stuff about git reading the article. I don't want to put this into practice, and don't see the utility of it.
by jasmer on 11/22/2022, 2:41:20 PM
Wouldn't it have been better if we could use something other than SHA1 as the actual name of something?
Where in the worst dystopian parts of software do we do this?
The SHA1 is kind of a security feature if anything, a side-show thing that should be nestled 1-layer deep into the UI and probably most people are unaware of.
Whereas commits and branches should be designed specifically for the user - not 'externalized artifacts' of some acyclic graph implementation.
Git triggers a product designer's OCD so hard that it's hard for some of us not to disdain it out of spite.
by lloydatkinson on 11/22/2022, 12:42:20 PM
I am confused. Why not use git tags for versioning?
by hoseja on 11/22/2022, 11:16:47 AM
This is very silly.
by zomglings on 11/22/2022, 3:14:53 PM
Is it possible to change the checksum implementation that git uses, through configuration or a plugin?
I find all this hash inverting quite inelegant.
by Pirate-of-SV on 11/22/2022, 3:15:45 PM
Very good! I use this hack every day in winter to heat my apartment (charging laptop at work, run git brute force at home).
by rock_artist on 11/22/2022, 1:55:40 PM
To make sure I've got it right:
In order to get these 'beautiful' hashes, they're crunching numbers, leveraging CPU power?
by breck on 11/22/2022, 1:18:09 PM
This is absolutely genius. Would be nice to upstream it and make it fast. I would start using it.
by mihaaly on 11/22/2022, 3:14:57 PM
Sounds great for a single-person project, but perhaps a simpler VCS would be better, then?
by u801e on 11/22/2022, 1:57:56 PM
Why does the version skip from 19 to 20? What about 1A, 1B, 1C, 1D, 1E, and 1F?
by pcthrowaway on 11/22/2022, 11:35:12 AM
I mean... this kind of breaks down if you have more than one person on the team.
by codeulike on 11/22/2022, 12:49:02 PM
So this is for if you want to use Git as if it's Subversion?
by _alex_ on 11/23/2022, 1:11:04 AM
Proof of work git hashes. This is nuts and I love it.
by ccbccccbbcccbb on 11/22/2022, 5:41:20 PM
<sarcasm> But what's the carbon footprint and contributed sea level rise of this frivolity? </sarcasm>
by pelasaco on 11/22/2022, 2:45:14 PM
If it were SHA-256, we could find a use for all the bitcoin miners we have...
by johmue on 11/22/2022, 3:18:32 PM
This is a joke, right?
by unnouinceput on 11/22/2022, 11:23:37 AM
Extremely Linear Git History...also known as SVN. Guess reinventing the wheel does get you to top HN.