Show HN: PgDog – Shard Postgres without extensions

  • Hey Lev!

    I've been looking into PgDog for sharding a 40TB Postgres database atm vs building something ourselves. This could be a good opportunity to collaborate because what we need is something more like Vitess for PostgreSQL. The scatter gather stuff is great but what we really need is config management via something like etcd, shard splitting, best-effort transactions for doing schema changes across all shards etc.

    Almost totally unrelated but have you had good success using pg_query.rs to rewrite queries? Maybe I misunderstood how pg_query.rs works but rewriting an AST seems like a nightmare with how the AST types don't really support mutability or deep cloning. I ended up using the sqlparser crate which supports mutability via Visitors. I have a side project I'm chipping away at to build online schema change for PG using shadow tables and logical replication à la gh-ost.

    Jake

  • I know this is just a small feature and probably a less meaningful one compared to the rest of the project - but for me being able to use pgdog as a way to redirect reads to read replicas and writes to the primary (w/o doing that in code) is a huge plus. Many applications out there do not support R/W splits, and having something that does that for you (at the proxy level) has always brought speed improvements for me in the past.

    Such a cool project, good job Lev!
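
    To illustrate the read/write split described above, here is a minimal sketch of how a proxy could pick a backend per statement. It is not PgDog's implementation: the `Router` class, the keyword list, and the backend names are hypothetical, and a real proxy would parse the query instead of inspecting its first word.

```python
import itertools
import re

# Hypothetical list of statement keywords treated as writes.
WRITE_KEYWORDS = {
    "insert", "update", "delete", "merge",
    "create", "alter", "drop", "truncate",
}


class Router:
    """Sends plain SELECTs to replicas and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas or [primary])
        self.in_transaction = False

    def route(self, sql):
        lowered = sql.lower()
        words = re.findall(r"[a-z]+", lowered)
        first = words[0] if words else ""

        if first in ("begin", "start"):
            self.in_transaction = True
            return self.primary
        if first in ("commit", "rollback", "end"):
            self.in_transaction = False
            return self.primary
        # Statements inside an explicit transaction stay on the primary.
        if self.in_transaction or first in WRITE_KEYWORDS:
            return self.primary
        # Row-locking SELECTs cannot run on a read-only replica.
        if first == "select" and re.search(r"\bfor\s+(update|share)\b", lowered):
            return self.primary
        if first == "select":
            return next(self._replicas)
        # Anything unrecognized (SET, CALL, WITH ...) goes to the primary.
        return self.primary
```

    The conservative default matters here: sending a read to the primary only costs some capacity, while sending a write to a replica fails.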

  • Really impressive stuff! Very interesting, well done!

    I don’t know that I’d want my sharding to be so transparently handled / abstracted away. First, because usually sharding is on the tenancy boundary and I’d want friction on breaking this boundary. Second, because the implications of joining across shards are not the same as in-shard (performance, memory, CPU) and I’d want to make that explicit too.

    That takes nothing out of this project, it’s really impressive stuff and there’s tons of use cases for it!

  • We've been keeping an eye on PgDog for a while, and it seems like very impressive stuff.

    Congrats on the launch Lev, and keep it up!

  • Very interesting.

    For me the key point in such projects is always handling of distributed queries. It's exciting that pgDog tries to stay transparent/compatible while operating on the network layer.

    Of course the limitations that are mentioned in the docs are expected and will require trade-offs. I'm very curious to see how you will handle this. If there is any ongoing discussion on the topic, I'd be happy to follow and maybe even share ideas.

    Good luck!

  • Looks neat, the first thing I search for in the docs is:

        Unique indexes  Not currently supported. Requires query rewriting and separate execution engine to validate uniqueness across all shards.
    
    But still looks promising.
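
    As a rough illustration of why this needs extra machinery, the sketch below checks every shard for an existing value before inserting into the shard that owns the row. It is an in-memory toy, not how PgDog or Postgres works: `ShardedTable`, the CRC32 placement, and the column names are assumptions, and the check is not safe under concurrent inserts.

```python
import zlib


class ShardedTable:
    """Toy sharded table: each shard is a plain list of row dicts."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]

    def _owner(self, key):
        # Hypothetical placement: CRC32 of the sharding key modulo shard count.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.shards)

    def insert_unique(self, row, sharding_key, unique_column):
        value = row[unique_column]
        # Scatter: the value may live on any shard, so every shard is checked.
        for shard_id, rows in enumerate(self.shards):
            if any(existing.get(unique_column) == value for existing in rows):
                raise ValueError(
                    f"duplicate {unique_column}={value!r} found on shard {shard_id}"
                )
        # Only after the global check does the row go to its owning shard.
        owner = self._owner(row[sharding_key])
        self.shards[owner].append(row)
        return owner
```

    A per-shard unique index alone would miss the duplicate whenever the two rows hash to different shards, which is the case the cross-shard check covers.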

  • This looks awesome!

    What can be a challenge with such solutions is getting the last 1% right when it comes to sharding tricky queries properly (or at least detecting queries which are not handled properly), and also isolation and consistency.

  • One of the most interesting Postgres projects I have seen in many years.

    The benchmarks presented only seem to address standard pooling; I'd like to see what it looks like once query parsing and cross-shard joins come into play.

  • Much-needed innovation in scaling Postgres. Congratulations on the launch!

  • This looks pretty amazing. Congrats on the launch.

  • this is awesome but I'm wondering does pgdog plan to handle high availability scenarios (multiple frontend proxies)? I know this can lead to much more difficult problems with consensus and handling split brain scenarios.

    if not, what is the approach to enable restarts without downtime (let's say one node crashes)?

  • Nice! Reminds me of MySQLProxy back in 2007-2014 and later ProxySQL.

    What’s the long term (business) plan to keep it updated?

  • Is your plan to stick with a hashing algorithm for tenant sharding, or allow for more fine-grained control to shift large tenants between shards?

    Hot shard management is a job in and of itself and adds a lot of operational complexity.
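
    One way to picture the two options in the question: hash placement by default, plus an explicit override table for tenants an operator wants to pin elsewhere. This is a hypothetical sketch, not a PgDog feature; `TenantDirectory` and its methods are invented for illustration, and moving the tenant's data is not shown.

```python
import hashlib


class TenantDirectory:
    """Hash-based tenant placement with operator-managed overrides."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.overrides = {}  # tenant_id -> shard chosen by an operator

    def hash_shard(self, tenant_id):
        # Stable hash so the same tenant always maps to the same shard.
        digest = hashlib.sha256(str(tenant_id).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def shard_for(self, tenant_id):
        # An explicit override wins over the hash.
        if tenant_id in self.overrides:
            return self.overrides[tenant_id]
        return self.hash_shard(tenant_id)

    def pin(self, tenant_id, shard):
        # Records the new placement only; the data copy must happen separately.
        if not 0 <= shard < self.num_shards:
            raise ValueError(f"shard {shard} is out of range")
        self.overrides[tenant_id] = shard
```

    The override table is where the operational complexity shows up: it has to be stored somewhere durable and kept consistent across every proxy that routes queries.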

  • This looks amazing.

  • I wish there was something similar for SQL Server.

  • This is very cool, congrats on the launch. Do you think making this CLI/API compatible with Vitess or Citus is worth it?

  • Can someone give tangible real world metrics on when you should consider sharding a Postgres database? Thanks in advance.

  • Awesome work Lev! You had me at distributed COPY being able to ingest gigs per second.

  • very cool to see people go deep in the weeds to make it easier for lazy devs like me.

  • I think this is a very cool project, but when I saw this statement, "Running Postgres at scale is hard. Eventually, one primary isn’t enough at which point you need to split it up.", I was reminded about this recent article about OpenAI, now one of the most visited apps/sites in the world, running on a single pg primary:

    https://news.ycombinator.com/item?id=44071418

    Quote from the article:

    > At OpenAI, we utilize an unsharded architecture with one writer and multiple readers, demonstrating that PostgreSQL can scale gracefully under massive read loads.

    Of course, if you have a lot of write volume this would be an unsuitable architecture, but just a reminder that pg can scale a lot more than many people think with just a single writer.

  • How does this compare to Supabase/Supavisor?

  • is there a network agpl exception?
