Implementing webhook deliveries is one of those things that's way harder than you would initially imagine. The two things I always look for in systems like this are:
1. Can it handle deliveries to slow or unreliable endpoints? In particular, what happens if the external server is deliberately slow to respond - someone could be trying to crash your system by forcing it to deliver to slow-loading endpoints.
2. How are retries handled? If the server returns a 500, a good webhooks system will queue things up for re-delivery with an exponential backoff and try a few more times before giving up completely.
Point 1 only matters if you are delivering webhooks to untrusted endpoints - systems like GitHub where anyone can sign up for hook deliveries.
Point 2 is more important.
The code at https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... shows that the HTTP client can be configured with a timeout (which defaults to 10s: https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... ).
Judging from https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... it doesn't look like this system handles retries.
Retrying would definitely be a useful addition. PostgreSQL is a great persistence store for recording failures and retry attempts, so that feature would be a good fit for this system.
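For illustration, a retry loop with exponential backoff might look something like this in Go. All the names here are hypothetical - this is not pgstream's API - and a real implementation would persist attempts (e.g. in a Postgres table, as suggested above) rather than sleeping in-process:

```go
package main

import (
	"bytes"
	"fmt"
	"math"
	"net/http"
	"time"
)

// deliverWithRetry posts a payload to a webhook endpoint, retrying on
// network errors or 5xx responses with exponential backoff (1s, 2s, 4s, ...).
// Illustrative sketch only; names and structure are not from pgstream.
func deliverWithRetry(url string, payload []byte, maxAttempts int) error {
	// A timeout bounds how long a deliberately slow endpoint can hold us up.
	client := &http.Client{Timeout: 10 * time.Second}
	for attempt := 0; attempt < maxAttempts; attempt++ {
		resp, err := client.Post(url, "application/json", bytes.NewReader(payload))
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 500 {
				return nil // success, or a 4xx that retrying won't fix
			}
		}
		// Exponential backoff before the next attempt.
		time.Sleep(time.Duration(math.Pow(2, float64(attempt))) * time.Second)
	}
	return fmt.Errorf("giving up on %s after %d attempts", url, maxAttempts)
}
```

In a durable version, each failed attempt would be recorded as a row with a next-retry timestamp, and a background worker would pick up due rows - that's where Postgres as the persistence store fits in.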
Xata is putting out some cool pg things. Is pgroll still looking good these days? It seemed pretty sexy the last time I looked at it.
A quick google didn’t reveal much about pgstream’s delivery semantics. Do these webhooks get called at most once, or at least once? I hope the latter, but in absence of information I’d guess it is the former.
This is very nice! I came across this through an internal ref. There appears to be a lot of movement in the CDC space lately - pgstream, sequinstream, bemi, pg_flo. (I saw some guy on X posting videos of CDC tools too.)
Are there real-world use cases being solved with this, or is it just a hobby tool?
Curious if there's a reason this extension is written in Go and not Zig? They posted on their blog [1] a few months ago about their cool Zig library for making extensions, and I've been playing around with it. Is Zig just not mature enough?
Postgres webhooks could be very useful for me. Thanks for sharing.
The GitHub repo README gives a better sense of the capabilities of the pgstream system: https://github.com/xataio/pgstream?tab=readme-ov-file#archit...
For deployments at a more serious scale, it seems they support buffering WAL events into Kafka, similar to Debezium (the current leader for change data capture), to decouple the throughput of the replication slot reader on the Postgres side from the event handlers you deliver the events to.
Pgstream seems more batteries-included than using Debezium to consume the PG log: Kafka is optional with pgstream, and things like webhook delivery and OpenSearch indexing are packaged in rather than being a “choose your own adventure” game in the Kafka Streams middleware ecosystem jungle. If their offering's constraints work for your use case, why not prefer it over Debezium, since it seems easier? I'd rather write Go than Java/a JVM language if I need to plug into the pipeline.
However, at any serious scale you'll need Kafka in there, and then I'm less sure you'd make use of the other plug-and-play stuff like the OpenSearch indexer or webhooks at all. Certainly as volume grows, webhooks start to feel like a bad fit for CDC events at the level of a single row change; at least in my brief Debezium CDC experience, my consumer pulls batches of 1000+ changes at once from Kafka.
The other thing I don't see is transaction metadata. Maybe I shouldn't worry much about it (not many people seem to be concerned), but I'd like my downstream consumer to have delayed consistency with my Postgres upstream, which means I need to consume record changes with the same transactional grouping as in Postgres - otherwise I'll probably never be consistent in practice: https://www.scattered-thoughts.net/writing/internal-consiste...
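That grouping can be sketched roughly like this - purely illustrative types, not pgstream's or Debezium's API. It assumes the change stream arrives ordered by commit, as a single Postgres logical-replication slot delivers it, with some transaction ID attached to each change:

```go
package main

// RowChange is a simplified CDC event; TxID would come from the source's
// transaction metadata (e.g. the Postgres xid). Names are made up here.
type RowChange struct {
	TxID  uint64
	Table string
	Data  string
}

// applyByTransaction buffers row changes until the transaction ID changes,
// then hands the whole group to commit at once, so the downstream store
// only ever observes complete transactions - never a partial one.
func applyByTransaction(changes []RowChange, commit func(batch []RowChange)) {
	var batch []RowChange
	var current uint64
	for _, c := range changes {
		if len(batch) > 0 && c.TxID != current {
			commit(batch) // transaction boundary: flush the previous group
			batch = nil
		}
		current = c.TxID
		batch = append(batch, c)
	}
	if len(batch) > 0 {
		commit(batch) // flush the final transaction
	}
}
```

Without that transaction ID (or explicit begin/commit markers) in the event stream, the consumer has no way to reconstruct these boundaries, which is the gap being described.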