How many folks here struggle to adopt tooling like this because it isn’t possible to add Postgres extensions to places like RDS?
The name seems to be an allusion to the author P.G. Wodehouse, creator of the character Jeeves.
https://en.wikipedia.org/wiki/P._G._Wodehouse
Very clever naming!
The (internal) use of DataFusion to create new, powerful extensions for Postgres is a very clever idea. Very good work by the ParadeDB team.
I like this one very much. It's a very simple way to avoid having to use a different set of tools and query languages (or more limited query languages) to query lakes.
Neat that you plan to support both Delta Lake and Apache Iceberg.
I'm curious about HN's position on these two formats. I'm having a hard time deciphering which might be the industry winner (or perhaps they both have a place, no "winner" necessary).
ParadeDB is doing a lot of good work with Postgres. pg_analytics, and now pg_lakehouse...
This looks functionally similar to using http://github.com/spiceai/spiceai with a PostgreSQL data accelerator.
As somebody who writes a lot of Postgres extensions, I can say this is quite interesting!
I think I can see some parallels to Supabase's wrappers project.
Keep up the good work!
Looks like pg as a replacement for Databricks SQL, which is already a query engine for data lakes. It's not a lakehouse, but it calls itself one. Seems like a cool and useful project, but the name is problematic.
I have another question. So far on the ClickBench leaderboard it's 15x slower than the baseline. The number 1 entry is 1.67x slower than the baseline.
I assume that's DataFusion speed. What's the plan to improve upon it?
This is great work! Could you please comment on your choice of license? Most Postgres extensions that achieve wide adoption use the PostgreSQL, MIT, or Apache license.
Very nice addition! Do you plan to support Snowflake as an object store in the near future? It's not currently in pg_lakehouse's README.
I am not up to date on the various lakes. Is this read-only? Are you able to init a lake from scratch?
What's the model to feed such a lake from some queue?
How does this compare to Hydra? https://www.hydra.so/
Very cool!
Could you share the key difference between this and the previous pg_analytics, and motivation of making it a separate plugin?
It seems very promising!
2 questions:
- do you distribute query processing over multiple pg nodes?
- do you store the metadata in PG, instead of a traditional metastore?
Nice. I wish TimescaleDB open-sourced their S3 storage thing.
Yet another amazing Postgres plugin made possible by pgrx (https://github.com/pgcentralfoundation/pgrx)
It's really crazy how some projects just instantly enable a whole generation of new possibilities.
If you are impressed by this and want to build something like it -- check out pgrx, it's a pretty great experience.
looks interesting!
Readers may also enjoy Steampipe [1], an open source tool to live query 140+ services with SQL (e.g. AWS, GitHub, CSV, Kubernetes, etc). It uses Postgres Foreign Data Wrappers under the hood and supports joins etc with other tables. (Disclaimer - I'm a lead on the project.)
1 - https://github.com/turbot/steampipe