This reads like a marketing piece, not an honest technical blogpost.
I agree that USearch is fast, but it feels pretty dishonest to take credit for someone else's work. Maybe at least honestly profile what's going on with USearch vs pgvector (and with which settings for pgvector?), and write something interesting about it?
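For what it's worth, a fair comparison would at least need to pin down and report the pgvector build and query knobs, since they dominate both index build time and recall. A minimal sketch of what I mean (psycopg2 assumed; table and column names are made up):

  # Hypothetical benchmark setup showing the pgvector knobs that matter
  # in a USearch-vs-pgvector comparison. Table/column names are made up.
  import psycopg2

  conn = psycopg2.connect("dbname=bench")
  conn.autocommit = True
  cur = conn.cursor()

  cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

  # Build-time parameters: graph degree (m) and construction beam width
  # (ef_construction). These trade build time against recall.
  cur.execute("""
      CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
      WITH (m = 16, ef_construction = 64)
  """)

  # Query-time parameter: search beam width. Recall/latency numbers are
  # meaningless without reporting this value too.
  cur.execute("SET hnsw.ef_search = 100")

  # Placeholder 768-dim query vector in pgvector's text format.
  query_vec = "[" + ",".join(["0.1"] * 768) + "]"
  cur.execute(
      "SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT 10",
      (query_vec,),
  )
  print(cur.fetchall())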
The last time I tried Lantern, it would segfault when I tried to do anything non-trivial, and it was incredibly unsafe in how it handled memory. Hopefully that's at least fixed by now, but Lantern has so many red flags.
As someone who just indexed 6M documents with pgvector, I can say it's a massive time sink - on the order of days, even with a 32-core, 64 GB RDS instance.
How does performance scale (vs pgvector) when you have an index and start loading data in parallel? Or how does this scale vs the to-be-released pgvector 0.5.2?
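In case it helps anyone doing a build that size: as I understand it, the two things that matter most with pgvector 0.5.x are loading the data before creating the index (rather than inserting into an already-indexed table) and giving the build enough maintenance_work_mem to keep the graph in memory. A rough sketch, with psycopg2 assumed and the table name made up (0.5.x HNSW builds are still single-threaded, so this doesn't parallelize the build itself):

  # Rough sketch: bulk-load vectors first, then build the HNSW index with a
  # large maintenance_work_mem so the graph stays in memory during the build.
  # Table/column names are made up; pgvector 0.5.x assumed.
  import time
  import psycopg2

  conn = psycopg2.connect("dbname=docs")
  conn.autocommit = True
  cur = conn.cursor()

  # 1. Load rows into an un-indexed table first. COPY into a table that
  #    already carries an HNSW index is much slower. The CSV must contain
  #    vectors in pgvector's text format, e.g. "[0.1,0.2,...]".
  with open("embeddings.csv") as f:
      cur.copy_expert("COPY documents (id, embedding) FROM STDIN WITH CSV", f)

  # 2. Give the build as much memory as the instance can spare; if the graph
  #    spills out of maintenance_work_mem, the build slows down dramatically.
  cur.execute("SET maintenance_work_mem = '32GB'")

  # 3. Build the index once, after the load, and time it.
  start = time.time()
  cur.execute("""
      CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
      WITH (m = 16, ef_construction = 64)
  """)
  print(f"index build took {time.time() - start:.0f}s")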
Curious about the "outside of the database" index generation part. Is this index WAL-protected eventually?
Seems related: https://news.ycombinator.com/item?id=38840850
You piqued my interest enough to sign up and try...but now it needs an Access Code to try the DB, any HN special here?
So approximately 0% chance I could use this on AWS RDS or Aurora, correct?
Still, very impressive
Nice to see people care about index construction time.
I'm the lead author of JVector, which scales linearly to at least 32 cores and may be the only graph-based vector index designed around nonblocking data structures (as opposed to using locks for thread safety): https://github.com/jbellis/jvector/
JVector looks to be about 2x as fast at indexing as Lantern, ingesting the Sift1M dataset in under 25s on a 32-core AWS box (m6i.16xlarge), compared to 50s for Lantern in the article.
(JVector is based on DiskANN, not HNSW, but the configuration parameters are similar -- both are configured with graph degree and search width.)
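To make that parameter correspondence concrete for the Python-oriented readers in this thread: graph degree and search width are the same two knobs exposed by USearch's Python binding (connectivity and expansion_add/expansion_search) and by pgvector (m and ef_construction/ef_search). A small illustrative sketch of the USearch side only - this is not JVector code, and the values are arbitrary:

  # Illustrative only: how graph degree and search width appear in USearch's
  # Python API (JVector itself is Java). Parameter values are arbitrary.
  import numpy as np
  from usearch.index import Index

  index = Index(
      ndim=128,             # vector dimensionality
      metric="cos",         # cosine similarity
      connectivity=16,      # graph degree (HNSW "M")
      expansion_add=128,    # search width during construction (ef_construction)
      expansion_search=64,  # search width at query time (ef_search)
  )

  vectors = np.random.rand(10_000, 128).astype(np.float32)
  index.add(np.arange(10_000), vectors)   # bulk add; multi-threaded internally
  matches = index.search(vectors[0], 10)  # top-10 neighbours for one query
  print(matches.keys)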