I mean... if you multiplex disparate statements into the same connection and session then, well... yes, that is fraught with an incredible amount of complexity. That stuff's for OLAP, read-only replica servers and so on. High-throughput "hey I just need this one thing."
Your large-scale app probably won't need that by default. You pool connections in pgBouncer (or your application) because they're slow and expensive to set up. If you run a standard N-tier architecture and you have fifty 'backend apps' (say web apps) that talk to a communal PG server, you want pgBouncer to avoid connection churn. If your scale's a lot smaller than that then you don't need pgBouncer and you can get by with your app pooling a handful of conns --- subject to the usual vagaries that come with hand-wavey explanations. Measure, monitor, etc.
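For the small end of that spectrum, here's a minimal sketch of app-level pooling with asyncpg (the DSN and pool sizes are made-up illustrations, not recommendations):

    import asyncio
    import asyncpg

    async def main():
        # a handful of conns, owned by the app itself -- no pgBouncer
        pool = await asyncpg.create_pool(
            dsn="postgresql://app@db-host/appdb",  # made-up DSN
            min_size=2,
            max_size=10,
        )
        # acquire per unit of work; the connection goes back to the
        # pool as soon as the block exits, instead of being sat on
        async with pool.acquire() as conn:
            row = await conn.fetchrow("SELECT 1 AS ok")
            print(row["ok"])
        await pool.close()

    asyncio.run(main())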
It's an interesting and illuminating article, but the take-away is don't do weird statement-based transaction multiplexing unless you know exactly why you want that!
If you're running into "connection limits" on Heroku or whatever... maybe now is the time to stop letting other people manage your DB and just run it yourself? You're clearly big enough to run up against that.
Or, maybe, release your session back to the pool (app/bouncer/whatever) instead of sitting on it. For OLTP stuff like 99% of all web apps, that's key. With 100 connections and a p99 request-response cycle that holds the DB for <50ms, each connection can turn over 20+ requests a second; back-of-envelope, that's 2,000+ requests/sec. You can handle a lot of traffic.
On-topic tangent: reminder or heads-up (depending on whether you've already seen this) that Postgres is experimenting with thread-based instead of process-based connections, which would pretty much obviate the need for pgBouncer if it works out and becomes the connection model going forward.
HN discussion from a few months ago, with lots of commentary relevant to any pgBouncer scenarios: https://news.ycombinator.com/item?id=36393030
> Postgres doesn’t have a concept of nested transactions
It has savepoints and those nest fine.
(Perl's DBIx::Class can be configured to automatically convert transactions into savepoints if it's already inside another transaction; presumably any other ORM-like thing could also do that for you in theory, but whether the one you're currently using -does- is left as an exercise for the reader)
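For a concrete non-Perl example of the same trick, asyncpg does it automatically: starting a transaction inside another one issues a SAVEPOINT instead of a BEGIN. A minimal sketch (connection string and table are made up):

    import asyncio
    import asyncpg

    async def main():
        conn = await asyncpg.connect("postgresql://app@db-host/appdb")
        async with conn.transaction():            # BEGIN
            await conn.execute("INSERT INTO t VALUES (1)")
            try:
                async with conn.transaction():    # nested -> SAVEPOINT
                    await conn.execute("INSERT INTO t VALUES (2)")
                    raise RuntimeError("oops")
            except RuntimeError:
                pass  # rolled back to the savepoint: row 2 gone, outer txn alive
        # COMMIT here keeps row 1 only
        await conn.close()

    asyncio.run(main())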
This might be the best explanation I have seen of the benefits and pitfalls of deploying pgbouncer (or any connection pooler; they are all similar). The only thing I would add, from a deployment perspective: you should run two layers of pgbouncer. One runs on the same host as your application (a sidecar container in k8s), and one global layer that all connections to your DB must pass through; there's a rough sketch below. Every large-scale application generally ends up at this kind of architecture for various reasons, and it helps to just start with it. Some people will argue that connection pooling at the application layer is good enough to replace the sidecar proxy, but in my experience management is simplified by having the pooling done via the sidecar.
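A rough sketch of the two layers in pgbouncer.ini terms (hostnames, ports and pool sizes are all illustrative):

    ; sidecar pgbouncer.ini -- runs next to each app instance
    [databases]
    appdb = host=pgbouncer-global.internal port=6432 dbname=appdb

    [pgbouncer]
    listen_port = 6432
    pool_mode = transaction
    default_pool_size = 5

    ; global pgbouncer.ini -- the one place that talks to Postgres itself
    [databases]
    appdb = host=pg-primary.internal port=5432 dbname=appdb

    [pgbouncer]
    listen_port = 6432
    pool_mode = transaction
    default_pool_size = 100

The sidecar keeps per-host connection handling local and cheap; the global layer is the single choke point that decides how many server connections Postgres actually sees.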
> I also think community and industry enthusiasm around Postgres is at an all time high. There are more managed hosting options than ever (Crunchy Data, Render, Fly.io, and on and on), deep extensions like PostgresML, Citus and Timescale, serverless options like Neon, and real-time services like Supabase with Postgres at their center.
Please sell me on why you'd use one vendor to host your DB and another one to host your app. If they are in different racks (forget about different data centers), that's going to kill you on latency. The only way around that we've seen is to vectorize all your queries, i.e. batch them so each round trip does more work.
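To make "vectorize" concrete, the idea is to collapse N round trips into one so the cross-vendor latency is paid once per batch, not once per row. A sketch with asyncpg (table and column are made up):

    import asyncpg

    async def fetch_users(conn: asyncpg.Connection, ids: list[int]):
        # N round trips would pay the cross-rack/cross-DC latency N times:
        # for user_id in ids:
        #     await conn.fetchrow("SELECT * FROM users WHERE id = $1", user_id)

        # one round trip pays it once for the whole batch:
        return await conn.fetch(
            "SELECT * FROM users WHERE id = ANY($1::int[])", ids)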
What a great post. We have had a ton of issues with users using pgbouncer, and it's not because things are "broken" per se; it's just that the situation is very complicated, and pgbouncer's docs are also IMO in need of updating to be more detailed and, in a few critical cases, less misleading, specifically the prepared statements docs.
This blog post calls out that misleading documentation at https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html#pre... .
> PgBouncer says it doesn’t support prepared statements in either PREPARE or protocol-level format. What it actually doesn’t support are named prepared statements in any form.
IMO that's still not accurate. You can use a named prepared statement just fine in transaction mode: start a transaction (so you aren't in autocommit), use a named statement, and it works fine. You just can't use it again in another transaction, because it will be "gone" (more accurately, "unmoored": it might be in your session, or it might be in someone else's session). Making things worse, once the prepared statement is "unmoored", its name can conflict with another client attempting to use the same name.
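A sketch of that failure mode through a transaction-pooled pgbouncer (statement and table names are made up):

    import asyncio
    import asyncpg

    async def main():
        # connect to pgbouncer in transaction mode, not to postgres directly
        conn = await asyncpg.connect("postgresql://app@pgbouncer-host:6432/appdb")
        async with conn.transaction():
            # fine: the whole transaction is pinned to one server session
            await conn.execute(
                "PREPARE get_user (int) AS SELECT * FROM users WHERE id = $1")
            await conn.fetch("EXECUTE get_user(1)")
        # a new transaction may be handed a different server session:
        await conn.fetch("EXECUTE get_user(2)")
        # ^ may fail with 'prepared statement "get_user" does not exist', or the
        # name may collide with a leftover "get_user" in whatever session you got
        await conn.close()

    asyncio.run(main())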
So to use named prepared statements, you can (less ideally) name them with random strings to avoid conflicts, or you can DEALLOCATE the prepared statement(s) you used at the end of your transaction. For our users on asyncpg, we have them use a uuid for prepared statement names to avoid these conflicts (asyncpg added this feature for us here: https://github.com/MagicStack/asyncpg/issues/837). However, they can just as well use DEALLOCATE ALL: set it as their `server_reset_query`, and then, so that it also happens in transaction mode, set `server_reset_query_always`, so it's called at the end of every transaction.

PgBouncer, IMO entirely misleadingly, documents this as "This setting is for working around broken setups that run applications that use session features over a transaction-pooled PgBouncer", which is why nobody uses it: pgbouncer claims the setup is "broken". It's not any more broken than switching out the PostgreSQL session underneath a connection that spans multiple transactions. Pgbouncer can do better here and make this clearer and more accommodating of real-world database drivers.
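Concretely, the DEALLOCATE ALL variant is just this in pgbouncer.ini (a sketch):

    [pgbouncer]
    pool_mode = transaction
    ; drop named prepared statements whenever a server connection is released
    server_reset_query = DEALLOCATE ALL
    ; reset queries normally run only in session mode; this runs them at the
    ; end of every transaction in transaction mode as well
    server_reset_query_always = 1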
> There are more managed hosting options than ever (Crunchy Data, Render, Fly.io, and on and on)
From fly.io's docs [1]:
> This Is Not Managed Postgres
[1] https://fly.io/docs/postgres/getting-started/what-you-should...
My understanding now is that using pgBouncer falls under the category of YAGNI / “the best part is no part”: don’t use it unless you have a demonstrated need for it, because it adds complexity.
About 5 years ago we migrated to postgres and installed pgBouncer, because of course you do. Sometime last year I started seeing some weird issues like the article points out, and the more I looked into the gory details, the more shocked I became. (The fact that statement-level pooling even exists is… wow.)
So I did an experiment and deleted it. Things kept working just fine. We’re not living in a microservices world so application-level pooling is enough — I don’t think I’ve ever seen us use half of available connections.
PgBouncer has always left me confused in the world of application-level connection pooling. I have never quite understood the value of it if we are already using connection pools in our applications. I don't want to pool connections to another pool.
Great article. Should be part of the official PgBouncer documentation.
If a database needs to support more than 500 connections at a given time (due to scale), what's the way around that?
Let's hope the PG team will skip thread-based connections in favor of superior lightweight task-based connections. Then PgBouncer would not be needed.
Pgbouncer maintainer here. Overall I think this is a great description of the tradeoffs that PgBouncer brings and how to work around/manage them. I'm actively working on fixing quite a few of the issues raised in this blog, though:
1. Named protocol-level prepared statements in transaction mode have a PR that's pretty close to being merged: https://github.com/pgbouncer/pgbouncer/pull/845
2. SET/RESET tracking in transaction mode. For this we need some changes in the postgres protocol so that postgres can tell pgbouncer about setting updates. This is being worked on here: https://www.postgresql.org/message-id/flat/CAGECzQQOOhH1Rztu...
3. The single-threading issue can be worked around by running multiple processes that listen on the same port using so_reuseport=1 (see the config sketch after this list): https://www.pgbouncer.org/config.html#so_reuseport
4. The pg_dump issue can actually be solved already by installing the Citus extension and using track_extra_parameters (otherwise you have to wait for the same postgres protocol addition that's needed for the other SET/RESET commands): https://www.pgbouncer.org/config.html#track_extra_parameters
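For reference, the config knobs from (3) and (4) look roughly like this (the parameter names after track_extra_parameters are illustrative, and most parameters additionally need the server to actually report them, which is what the Citus extension enables):

    [pgbouncer]
    ; (3) several pgbouncer processes can bind the same port; you still start
    ; each process yourself, and the kernel spreads new connections across them
    so_reuseport = 1
    ; (4) extra startup parameters to track per client, beyond the handful
    ; pgbouncer always tracks (client_encoding, datestyle, timezone, ...)
    track_extra_parameters = IntervalStyle, search_path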