used bitcask during undergrad for a systems course project. task was to build a minimal key-value store with durability and fast writes. no frameworks allowed. tried leveldb first but spent too much time tuning compaction. switched to bitcask after reading the original Bitcask paper from the Riak folks and it just worked.
append-only writes meant less complexity. loaded keys into memory on startup, mapped each key to its file offset, done. didn't need range queries or indexes, just fast put/get. wrote a simple merge script to compact old segments. performance was solid and startup time didn't degrade as data grew.
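for anyone who hasn't seen the design, the append-plus-in-memory-index idea looks roughly like this in python. it's a minimal sketch under assumptions, not the real Bitcask on-disk format: the record layout, the `TinyCask` class and the `sync` flag are made up for illustration, there is a single data file instead of multiple segments, and there are no checksums, tombstones or hint files.

```python
import os
import struct
import tempfile

# Hypothetical record layout for this sketch:
# 4-byte key length, 4-byte value length, key bytes, value bytes.
HEADER = struct.Struct(">II")


class TinyCask:
    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (offset of value in file, value size)
        open(path, "ab").close()  # create the data file if it is missing
        self._load()
        self.f = open(path, "r+b")

    def _load(self):
        # Startup: scan the log once and remember where each value lives.
        # Later records overwrite earlier ones, so the newest value wins.
        size = os.path.getsize(self.path)
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                klen, vlen = HEADER.unpack(header)
                key = f.read(klen)
                offset = f.tell()
                if len(key) < klen or offset + vlen > size:
                    break  # truncated tail record, ignore it
                f.seek(vlen, os.SEEK_CUR)
                self.keydir[key] = (offset, vlen)

    def put(self, key, value, sync=False):
        # Writes only ever append; the old value becomes garbage on disk.
        self.f.seek(0, os.SEEK_END)
        start = self.f.tell()
        self.f.write(HEADER.pack(len(key), len(value)) + key + value)
        self.f.flush()
        if sync:
            os.fsync(self.f.fileno())  # ask the OS to push the data to disk
        self.keydir[key] = (start + HEADER.size + len(key), len(value))

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.f.seek(offset)  # one seek, one read
        return self.f.read(size)

    def merge(self):
        # Compaction: rewrite only the live values, then swap files.
        tmp = self.path + ".merge"
        new_keydir = {}
        with open(tmp, "wb") as out:
            for key in self.keydir:
                value = self.get(key)
                start = out.tell()
                out.write(HEADER.pack(len(key), len(value)) + key + value)
                new_keydir[key] = (start + HEADER.size + len(key), len(value))
            out.flush()
            os.fsync(out.fileno())
        self.f.close()
        os.replace(tmp, self.path)
        self.keydir = new_keydir
        self.f = open(self.path, "r+b")

    def close(self):
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = TinyCask(os.path.join(d, "data.log"))
        db.put(b"a", b"1")
        db.put(b"a", b"2", sync=True)  # overwrite, forced to disk
        db.put(b"b", b"3")
        print(db.get(b"a"))
        db.merge()  # drops the stale b"1" record
        db.close()
```

the tradeoff is visible even at this size: every key has to fit in memory, and you only get point lookups, but a read is one seek and a write is one append.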
biggest learning was how bitcask avoided cleverness. no tricks, no layered abstractions. it was just clean storage logic with a clear mental model. still think about it when touching newer engines that try to do too much.
This looks interesting. Maybe I'm not in the know, but why would you offload something as important as `sync` to the client instead of building in some protocol to ensure that file integrity is maintained? With this kind of design choice, it seems quite easy to lose data, unless I'm missing something.
This is something I play with sometimes:
https://github.com/lsferreira42/nadb
It is a disk-based KV store with tags for search.
Sorry, maybe I am just not in the mood to delve too deep into the project (but I starred it! Amazing job, I suppose), and I don't want to ask AI but rather some of the experts who are surely lurking on HN.
Can you guys please explain this to me like I am 5 (or maybe 10)? Is this something revolutionary to keep in the back of my mind? How does it compare to Redis? When should I use it, if ever? I always prefer SQLite, then PostgreSQL if I need scalability, and after that I am not sure, but maybe things like ClickHouse. I am also looking more into DuckDB, though not as a primary database, just for fun. There are also things like Turso and Cloudflare D1 (if I remember correctly); I kind of prefer Cloudflare D1 but also like Turso, or SQLite in general. Still, the database space really piques my interest.
Thanks in advance for helping this young fellow out!
Nice little implementation :) you even added a server too. Good work, keep it up!
Bitcask, now there's a blast from the Basho past. It always bugged me that no good secondary indexing strategy was built to make Bitcask viable for more use cases. Everyone always wanted to use the LevelDB backend just to get at the secondary indexing features (whose performance also scaled inversely with cluster size, which was its own problem). But getting consistent high performance out of Riak was waaaaaaaay easier on Bitcask.