Hacker News

S3 Express Is All You Need

by ryanworl on 11/28/2023, 7:04:44 PM with 13 comments

by Sirupsen on 11/28/2023, 8:24:20 PM
Most production storage systems/databases built on top of S3 spend a significant amount of effort building an SSD/memory caching tier to make them performant enough for production (e.g. on top of RocksDB). But it's not easy to keep it in sync with blob...
Even with the cache, the cold query latency lower-bound to S3 is subject to ~50ms roundtrips [0]. To build a performant system, you have to tightly control roundtrips. S3 Express changes that equation dramatically, as S3 Express approaches HDD random read speeds (single-digit ms), so we can build production systems that don't need an SSD cache—just the zero-copy, deserialized in-memory cache.
Many systems will probably continue to have an SSD cache (~100 us random reads), but now MVPs can be built without it, and cold query latency goes down dramatically. That's a big deal.
We're currently building a vector database on top of object storage, so this is extremely timely for us... I hope GCS ships this ASAP. [1]
[0]: https://github.com/sirupsen/napkin-math
[1]: https://turbopuffer.com/
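To make the roundtrip argument concrete, here is a rough napkin-math sketch. It uses only the per-roundtrip figures quoted in the comment (~50 ms for S3, ~100 us for an SSD cache); the 5 ms value for S3 Express is an assumed midpoint for "single-digit ms", and the tier names and function names are made up for illustration. It treats latency as a simple product of dependent roundtrips, which is a lower bound rather than a measurement.

```python
# Per-roundtrip latencies in milliseconds, taken from the comment above.
# The S3 Express value is an ASSUMED midpoint for "single-digit ms".
LATENCY_MS = {
    "s3_standard": 50.0,
    "s3_express": 5.0,  # assumption, not a measured number
    "ssd_cache": 0.1,   # ~100 us random read
}


def cold_query_latency_ms(tier: str, sequential_roundtrips: int) -> float:
    """Lower bound on cold latency when each roundtrip depends on the previous one."""
    if sequential_roundtrips < 0:
        raise ValueError("sequential_roundtrips must be non-negative")
    return LATENCY_MS[tier] * sequential_roundtrips


def report(sequential_roundtrips: int) -> dict:
    """Latency lower bound for every tier at the same roundtrip count."""
    return {
        tier: cold_query_latency_ms(tier, sequential_roundtrips)
        for tier in LATENCY_MS
    }


if __name__ == "__main__":
    # Example: a query that needs 3 dependent reads (hypothetical workload).
    for tier, ms in report(3).items():
        print(f"{tier}: {ms:.1f} ms")
```

The point of the sketch is only that the roundtrip count multiplies whatever the per-request latency is, which is why controlling roundtrips matters so much on standard S3.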
by promocha on 11/28/2023, 11:19:39 PM
> “Of course the AWS S3 Express storage costs are still 8x higher than S3 standard, but that’s a non issue for any modern data storage system. Data can be trivially landed into low latency S3 Express buckets, and then compacted out to S3 Standard buckets asynchronously. Most modern data systems already have a form of compaction anyways, so this “storage tiering” is effectively free.”
This is the key insight. The data storage cost essentially becomes negligible and latency goes down by an order of magnitude when S3 Express is used as buffer storage and data is then moved to standard S3. I see a future where most data-intensive apps use S3 as the main storage layer.
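A quick sketch of why the tiering makes the storage premium small. The only number taken from the thread is the 8x storage multiplier; the function name and the idea of expressing cost relative to "everything in S3 Standard = 1.0" are assumptions for illustration. It covers storage cost only and ignores request and transfer pricing.

```python
# From the quoted article: S3 Express storage costs 8x S3 Standard.
EXPRESS_COST_MULTIPLIER = 8.0


def blended_storage_cost(express_fraction: float) -> float:
    """Storage cost relative to keeping all data in S3 Standard (which is 1.0).

    express_fraction is the share of bytes sitting in S3 Express at any time,
    i.e. data that has landed but has not yet been compacted out to Standard.
    """
    if not 0.0 <= express_fraction <= 1.0:
        raise ValueError("express_fraction must be between 0 and 1")
    standard_fraction = 1.0 - express_fraction
    return express_fraction * EXPRESS_COST_MULTIPLIER + standard_fraction


if __name__ == "__main__":
    # Hypothetical fractions: how much of the data is still in the buffer tier.
    for fraction in (0.0, 0.01, 0.10, 1.0):
        print(f"{fraction:.0%} in Express -> {blended_storage_cost(fraction):.2f}x")
```

Under these assumptions the premium scales with how long data waits for compaction, so a small buffer fraction keeps the blended cost close to plain S3 Standard.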
by francoismassot on 11/28/2023, 10:10:11 PM
We tested S3 Express for our search engine quickwit [0] a couple of weeks ago.
While this was really satisfying on the performance side, we were a bit disappointed by the price, and I mostly agree with the article on this matter.
I can see some very specific use cases where the pricing should be OK but currently, I would say most of our users will just stay on the classic S3 and add some local SSD caching if they have a lot of requests.
[0] https://github.com/quickwit-oss/quickwit/
by emgeee on 11/28/2023, 9:00:02 PM
Some additional context here is that WarpStream is building a Kafka-compatible streaming system that uses S3 as the object store. This allows them to leverage cheap zone transfer costs for redundancy + automatic storage tiering to cut down on the costs of running and maintaining these systems. This has previously come at the cost of latency due to S3's read/write speeds, but S3 Express makes them more competitive with Confluent Kafka's managed offerings for these latency-sensitive applications.
IMO warpstream is a really cool product and this new S3 offering makes them even better
by fswd on 11/28/2023, 9:34:11 PM
I solved this problem locally. When uploading a file to the server before going to S3 it is cached in redis. Whenever the codebase needs to use the file, it checks redis, and if it is not there it fetches it and caches it again.
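The pattern described above is roughly a write-through plus cache-aside cache. Here is a minimal sketch, assuming an in-memory dict as a stand-in for Redis and plain callables as stand-ins for the S3 upload and download; `FileCache` and its method names are hypothetical, not the commenter's actual code.

```python
from typing import Callable, Dict


class FileCache:
    """Cache files on upload, and re-cache them on a read miss.

    The internal dict stands in for Redis; fetch_from_store and
    upload_to_store stand in for S3 GET and PUT calls (all hypothetical).
    """

    def __init__(
        self,
        fetch_from_store: Callable[[str], bytes],
        upload_to_store: Callable[[str, bytes], None],
    ) -> None:
        self._cache: Dict[str, bytes] = {}
        self._fetch = fetch_from_store
        self._upload = upload_to_store

    def upload(self, key: str, data: bytes) -> None:
        # Cache the file before it goes to the object store.
        self._cache[key] = data
        self._upload(key, data)

    def get(self, key: str) -> bytes:
        data = self._cache.get(key)
        if data is None:
            # Miss: fetch from the object store and cache it again.
            data = self._fetch(key)
            self._cache[key] = data
        return data

    def evict(self, key: str) -> None:
        # Simulates Redis expiring or evicting the entry.
        self._cache.pop(key, None)
```

A real deployment would also need an eviction policy and some way to handle objects that change in S3 behind the cache, which this sketch leaves out.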
by throwitaway222 on 11/28/2023, 8:28:20 PM
I don't understand why EFS never gets major shout outs - it's way better than S3: systems can mount it as a drive, it's shared across systems, and it already has super low latency... Not sure what S3 Express is really useful for if EFS already exists.
by osti on 11/28/2023, 10:00:00 PM
If I'm not wrong, this is the low latency S3 that is written in Rust. Finally launched after years in the making.
by kristianp on 11/28/2023, 11:51:28 PM
I first saw "X Is All You Need" with the "Attention Is All You Need" paper [1], which launched the Transformer upon the world. Is it the first instance of that phrase?
[1] https://arxiv.org/abs/1706.03762
by BonoboIO on 11/28/2023, 7:54:24 PM
Does anyone here have a use case that would perform better with this new S3 Express tier?
And a second question: would it be worth the 8x surcharge?
by mgaunard on 11/28/2023, 10:50:29 PM
Many S3 implementations appear to simply be transparent downloads to disk rather than a true "use the network as a disk".
by tjoff on 11/28/2023, 8:00:39 PM
> However, the new storage class does open up an exciting new opportunity for all modern data infrastructure: the ability to tune an individual workload for low latency and higher cost or higher latency and lower cost with the exact same architecture and code.
I get it, but at the same time that is also what you lost when you locked yourself in with a particular vendor.
by api on 11/29/2023, 12:35:29 AM
“All You Need Considered Harmful” - most cliche title?
by collinc777 on 11/29/2023, 12:08:10 AM
Will this improve running sqlite on s3?