Since moving to jj[1] as a git-compatible alternative, I’ve found it so easy to make clean commits that I do it by default for everything, usually 1/ refactor, 2/ impl, 3/ docs. Because you can always just “jj new” on top of an existing change, then squash it down and get an automatic rebase of everything past that point, it’s quick to keep things organised, and it makes review life suck less.
https://graphite.dev/ provides a way to stack PRs; it's been discussed on HN in the past (e.g. https://news.ycombinator.com/item?id=30681308).
Probably not a popular take on HN: valuable information should not be hidden in commits but written in comments. The WHY especially is crucial to write down in comments.
I’m fine with people squashing as long as the reasoning is recorded in comments and reflected in automated tests (unit AND system/API).
This is also crucial information for AI coding tools.
This is a cool idea, though part of what keeps my work organized, and keeps my understanding of my own changes intact, is manually preparing a series of logical commits.
I make use of interactive partial commits in PyCharm when a single file has changes related to different ideas, and I rewrite history for clarity.
It does matter to me if someone else has gone to this trouble. And it is sometimes a tip-off if a person's commit history is seemingly sloppy.
It makes sense that this project exists but I’m also glad it didn’t when I was learning to work in professional environments.
I haven't tried this yet (though I plan to).
One thing I would love is if I could give it a hint and have it extract certain types of changes into their own branch that could be split off into a new PR.
I often find myself adding a new, re-usable component or doing a small refactor in the middle of a project. When you're a few commits into a project and start doing side-quests, it's super annoying to untangle that work.
The options are one of:
1. A mega PR (which everybody hates)
2. Methodically untangling the side quest post hoc
3. Not doing it
In principle, the "right" thing to do would be to go check out main, do the side quest, get it merged and then continue.
But that's annoying and I'd rather just jam through, have AI untangle it, and then stack the commits (à la Graphite).
It's easy to verbally explain what stuff is side-quest vs. main quest but it's super annoying to actually do the untangling.
Maybe this tool magically can do that... but I do wonder if some context hints from the dev would help / make it more effective.
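To make the "context hints" idea concrete, here is a minimal sketch, assuming the hint is just a list of glob patterns naming the side-quest files. The function name `split_by_hints` and the example paths are hypothetical, and this is not how the tool works as far as I know. It also only splits at file granularity; real untangling would need hunk-level splitting.

```python
from fnmatch import fnmatchcase


def split_by_hints(changed_paths, side_quest_globs):
    """Partition changed file paths into (side_quest, main_quest) lists.

    A path counts as side-quest work if it matches any hint pattern.
    Note that fnmatch-style `*` also matches `/` characters.
    """
    side, main = [], []
    for path in changed_paths:
        if any(fnmatchcase(path, pattern) for pattern in side_quest_globs):
            side.append(path)
        else:
            main.append(path)
    return side, main


# Hypothetical changed files from a branch that mixes both kinds of work.
changed = [
    "src/components/Button.tsx",
    "src/components/Button.test.tsx",
    "src/pages/Checkout.tsx",
]
side, main = split_by_hints(changed, ["src/components/Button*"])
print("side quest:", side)
print("main quest:", main)
```

The side-quest list could then go onto its own branch and PR, with the main-quest work stacked on top.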
Reminds me of git-absorb, which is a non-AI version of part of this.
Very cool, I’ve been looking for something like this, but I’d love to see this flipped around and become an MCP tool that can be consumed by an LLM instead of requiring an API key.
I want something that takes a big commit and splits it up!
I didn't look at things too closely, but it would be nice if each commit included a ticket number from the branch (such as a Linear ID) and/or a PR ID, for people who do not squash.
One huge advantage of squashing branches is if you see a commit in a `git blame` you might have an idea of where it came from within GitHub/Linear/other systems.
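A rough sketch of the ticket-number idea, assuming branch names embed an ID shaped like `eng-123` (letters, a hyphen, digits). The regex, the `[ENG-123]` prefix format, and both function names are assumptions for illustration, not anything the tool does today. The pattern is deliberately naive and would also match something like `login-2`.

```python
import re

# Assumed ticket shape: letters, hyphen, digits (e.g. "eng-123").
TICKET_RE = re.compile(r"[A-Za-z]+-\d+")


def ticket_from_branch(branch):
    """Return the first ticket-like ID in the branch name, uppercased, or None."""
    match = TICKET_RE.search(branch)
    return match.group(0).upper() if match else None


def tag_message(message, branch):
    """Prefix the commit message with the branch's ticket ID, if there is one."""
    ticket = ticket_from_branch(branch)
    if ticket is None or ticket in message:
        return message  # nothing to add, or already tagged
    return f"[{ticket}] {message}"


print(tag_message("Fix login redirect", "alice/eng-123-fix-login"))
print(tag_message("Update README", "main"))
```

With something like this applied per commit, `git blame` on an unsquashed history would still point back at the ticket.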
I had a look through the tests, but it doesn’t seem you do any testing of whether this does a good job or not?
How did you collate hundreds or thousands of examples of commits being split up and how did you score the results LLMs gave you? Or did it take more than that?
Can I suggest, don't do this? The sustainable unit of code modification is the ticket, not the commit. When you're ready to merge to main, squash all commits into one that takes the ticket title as its commit message and appends the ticket description as its description. This aligns code changes with planned, scoped, and documented units of work. Anything more granular than that quickly becomes noise. By following the above pattern your main commit history becomes a clean, consistent log of tickets being completed, each linking directly back to your ticket management system.
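For what it's worth, the message format described above is simple enough to sketch. This assumes the ticket title and description are already available as strings; the function name `squash_message` and the example ticket are made up for illustration.

```python
def squash_message(ticket_title, ticket_description):
    """Build a squash commit message: title as the subject line,
    then a blank line, then the ticket description as the body."""
    subject = ticket_title.strip()
    body = ticket_description.strip()
    return f"{subject}\n\n{body}" if body else subject


# Hypothetical ticket, purely for illustration.
message = squash_message(
    "ENG-123 Add login rate limiting",
    "Limit failed logins to 5 attempts per minute.\n",
)
print(message)
```

The blank line between subject and body follows the usual git convention, so `git log --oneline` shows only the ticket title.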
I would love GitHub to integrate this, as that is typically where I am writing my squash commit messages (when merging Pull Requests).
In our org we squash all commits into one anyway; the main commit title is based on the title of the merge request. We also have an AI code review tool set up (which I usually ignore because there's a lot of extraneous information) that suggests a new title. That helps because the people making the MR often don't consider that the title ends up in the changelog, where it becomes the one line that people using the library read to decide whether they need to do something.
Why not just do your work in an organized way in the first place?
I was just talking about writing a tool like this. Bookmarked. Thanks.
Interesting tool. Automating commit cleanup could save time, especially before reviews. Curious to see how well it handles larger or more complex diffs.
In my org I have enforced linear history, squashing all commits into one in PRs and roughly following the rule from [1]:
> If the request is accepted, all commits will be squashed, and the final commit description will be composed by concatenating the pull request's title and description.
One less thing to think about.
Less is more, not vice versa.
I'll never understand people caring about commits in PRs; just push whatever and squash at the end. If the commits matter, that means you should have done multiple PRs.
The idea in itself seems good, but I have a lot of hesitation due to the prompt used to rewrite the commit messages. Even looking at the example in the repo there’s this sycophantic, pompous way of describing mundane things that adds nothing, but only makes it harder to understand what has changed. The commits mentioned don’t "implement a complete auth system" and did not add "comprehensive test coverage". They added parts of an authentication system and some tests.
I’m all for proper commit messages, but only if they add clarity, not take it away.