Hacker News

Python Tooling at Scale: LlamaIndex’s Monorepo Overhaul

by cheesyFish on 5/21/2025, 5:20:30 PM with 5 comments

by lyjackal on 5/21/2025, 6:22:49 PM
I recently did something similar. With uv workspaces, I used the uv CLI's dependency graph to analyze the dependency tree, then conditionally triggered CI workflows for the affected projects. I wish there were a better way to access the uv dependency worktree than parsing the `tree`-like output.
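The "affected projects" step described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual setup: it assumes the dependency graph has already been extracted into a plain dict by some means, and the function name `affected_packages`, the package names, and the directory layout are all hypothetical.

```python
from collections import defaultdict, deque
from pathlib import PurePosixPath


def affected_packages(deps, package_dirs, changed_files):
    """Return the packages whose CI should run for a set of changed files.

    deps:          package -> set of workspace packages it depends on
    package_dirs:  package -> repo-relative directory (hypothetical layout)
    changed_files: repo-relative paths, e.g. from a git diff
    """
    # Invert the graph so we can walk from a package to its dependents.
    dependents = defaultdict(set)
    for pkg, requires in deps.items():
        for dep in requires:
            dependents[dep].add(pkg)

    # Packages that directly contain a changed file.
    touched = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for pkg, directory in package_dirs.items():
            if PurePosixPath(directory) in path.parents:
                touched.add(pkg)

    # Breadth-first walk: everything that transitively depends on a
    # touched package is affected as well.
    affected = set(touched)
    queue = deque(touched)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical workspace: "app" depends on "llms", which depends on "core".
deps = {"core": set(), "llms": {"core"}, "readers": {"core"}, "app": {"llms"}}
package_dirs = {name: f"packages/{name}" for name in deps}
print(sorted(affected_packages(deps, package_dirs, ["packages/llms/src/x.py"])))
```

A CI job could then map each returned package name to the workflow it should trigger; how the `deps` dict gets populated is left open here on purpose.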
by tuanacelik on 5/21/2025, 7:11:07 PM
So, just to get this straight: does this new setup aim to make it easier to contribute to llamaindex submodules specifically?
by codethief on 5/25/2025, 1:53:15 AM
I find it quite astonishing there is no go-to build system / task runner yet for handling small to medium-sized monorepos across ecosystems.
I want a tool that
- allows me to define tasks with inputs (+ secrets) and outputs. Inputs can be files & folders from the repo, Docker images, build parameters / env vars, outputs from other tasks, … Typical tasks I have in mind are setup/build/test/deploy, which of course will typically depend on one another, thereby forming a pipeline or dependency graph.
- sandboxes/containerizes tasks by default (in particular: no access to repo file system / working copy, env vars, … beyond what's specified as inputs) but does provide easy escape hatches (for deployment pipelines, sharing venv/node_modules between task and working copy / IDE, …),
- by default automatically caches a task's output & logs for a given input, unless I explicitly tell it not to (again, deployment tasks!). Then, when running a task upon the user's request, it automatically figures out the dependency graph and runs only those tasks that have not been cached before. This includes the case of the task definition itself having changed. (Many tools allowing you to define tasks in a full-blown programming language struggle with detecting this reliably.)
- comes with monorepo support, so supports collecting definitions of e.g. the "test" task across subfolders/projects and running them all in parallel (as far as the dependency graph allows),
- is language-/ecosystem-agnostic, so that I can invoke whatever tool or shell script inside a given task.
- provides a sane configuration language (not YAML) – ideally a lightweight functional language that makes side effects very explicit,
- can be run both in CI and locally, without much setup effort. In fact, since the tool should be used as task runner for everything else in the repo, it should be easily bootstrappable after cloning the repo.
- can be integrated somewhat nicely with GitHub/GitLab/Azure DevOps/… (actually not that easy).
Dagger comes pretty close in terms of the general idea, but I'm not sure I like it so far.
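The caching behaviour in the wish list above (re-run a task only when its inputs, its upstream outputs, or its own definition changed) can be sketched as a content-addressed cache key. This is only an illustration of the idea under simple assumptions, not how Dagger or any other tool implements it: `cache_key`, `run_cached`, the dict-shaped task definition, and the in-memory cache are all hypothetical.

```python
import hashlib
import json


def cache_key(definition, input_files, upstream_keys):
    """Hash everything that can influence a task's output.

    definition:    JSON-serializable task definition (command, env, ...)
    input_files:   file name -> file contents as bytes
    upstream_keys: cache keys of the tasks this task depends on
    """
    digest = hashlib.sha256()
    # Including the definition means editing the task invalidates the cache.
    digest.update(json.dumps(definition, sort_keys=True).encode())
    for name in sorted(input_files):
        digest.update(b"\0" + name.encode() + b"\0")
        digest.update(hashlib.sha256(input_files[name]).digest())
    for key in sorted(upstream_keys):
        digest.update(b"\0" + key.encode())
    return digest.hexdigest()


def run_cached(definition, input_files, upstream_keys, action, cache):
    """Run `action` only if no result is cached for this exact input set."""
    key = cache_key(definition, input_files, upstream_keys)
    if key not in cache:
        cache[key] = action(input_files)
    return key, cache[key]


cache = {}
task = {"cmd": "pytest"}
files = {"test_a.py": b"assert True"}
key1, _ = run_cached(task, files, [], lambda f: "ran tests", cache)
key2, _ = run_cached(task, files, [], lambda f: "ran tests", cache)
print(key1 == key2)  # same definition and inputs, so the second call is a cache hit
```

Because the definition here is plain data, hashing it is trivial; that is the part that becomes unreliable once tasks are defined in a full-blown programming language, as the comment points out.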
by SlimIon729 on 5/21/2025, 9:32:25 PM
Interesting to see LlamaIndex's journey from Poetry+Pants to uv+LlamaDev for managing their extensive monorepo. The speed improvements and better developer experience with `uv` are compelling. It's a good reminder of how tooling choices evolve with scale.