Plan 9 implemented this concept in the cached WORM file server, one of the on-disk file systems used in Plan 9. The idea is that you have a disk-based cache and a WORM (write once, read many) dump consisting of optical jukeboxes. Writes to the file system are stored in the cache until it is dumped to the WORM, either manually or on a schedule (hard-coded to run at 2am every night). http://man.9front.org/4/cwfs
The idea was to reduce the cost of storage by moving long-term data off costly hard disks and onto cheap magneto-optical disks which, like CDs, could be stored in an automated jukebox. Write all the data you want to the cache, then commit to the WORM. As the WORM fills, you just buy another disk and put it in the jukebox. The history(1) command then gives you a file's history as a set of paths you can bind over another path to use an old version of a file instead of copying it. It's really a file system for programmers. http://doc.cat-v.org/plan_9/4th_edition/papers/fs/
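As a rough illustration of the cache-then-dump flow described above, here is a toy Python model. It is only a sketch under stated assumptions: the `CachedWorm` class, its method names, and the dump labels are all hypothetical, and it copies whole snapshots where a real file server would share unchanged blocks. It is not how the Plan 9 file server is actually implemented.

```python
class CachedWorm:
    """Toy model of a writable cache in front of a write-once store."""

    def __init__(self):
        self.cache = {}   # path -> contents written since the last dump
        self.dumps = []   # append-only list of (label, snapshot); entries are never modified

    def write(self, path, data):
        # Writes only ever touch the cache.
        self.cache[path] = data

    def dump(self, label):
        # A dump freezes "previous snapshot + cached writes" as a new snapshot.
        previous = self.dumps[-1][1] if self.dumps else {}
        self.dumps.append((label, {**previous, **self.cache}))
        self.cache.clear()

    def read(self, path):
        # Current view: cached writes shadow the most recent dump.
        if path in self.cache:
            return self.cache[path]
        return self.dumps[-1][1].get(path) if self.dumps else None

    def history(self, path):
        """Return (label, contents) for each dump in which the file changed."""
        versions, last = [], None
        for label, snapshot in self.dumps:
            if path in snapshot and snapshot[path] != last:
                versions.append((label, snapshot[path]))
                last = snapshot[path]
        return versions


fs = CachedWorm()
fs.write("/src/main.c", "v1")
fs.dump("day1")                 # hypothetical dump labels
fs.write("/src/main.c", "v2")
fs.dump("day2")
fs.dump("day3")                 # nothing changed, so no new history entry
print(fs.history("/src/main.c"))
```

Old versions stay readable under their dump label, which is the property that lets you pick up a previous version of a file without copying it.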
This idea was expanded on with Venti/Fossil, which lets you build file systems from arbitrary Venti data sets. http://doc.cat-v.org/plan_9/4th_edition/papers/venti/
> Optical discs promise to come one to two orders of magnitude closer to the limiting case of free mass storage than ever before. Other features of optical discs include improved reliability and a single technology for both on-line and archival storage with a long shelf life. Because of these features and because of (not in spite of) their non-deletion limitation, it is argued that optical discs fit the requirements of database systems better than magnetic discs and tapes.
Wild view from where we sit today, but CDs were ~700MB in 1982. Seagate launched a 5MB hard drive in 1980, so it was not entirely absurd to think that `just don't delete things` could be the way of the future. We sorta adopted `just don't delete things` anyway, though not with respect to RDBMSs.
Thanks for sharing!
We now (2023) live in a time where storing years of text and even audio is essentially free. Storing years of video is still actually costly.
Btw: You need about 12 TB for a 1 year video stream at 3 Mbit/s, so it's certainly doable, but it's not cheap.
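For anyone who wants to check the figure, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes a constant bitrate, a 365-day year, and decimal terabytes (10^12 bytes); the function name is just for illustration.

```python
BITRATE_MBIT_PER_S = 3                 # stream bitrate from the comment above
SECONDS_PER_YEAR = 365 * 24 * 3600     # 31,536,000 seconds

def storage_terabytes(bitrate_mbit_s, seconds):
    """Decimal terabytes needed for a constant-bitrate stream."""
    total_bits = bitrate_mbit_s * 1_000_000 * seconds
    return total_bits / 8 / 1e12       # bits -> bytes -> TB

year_tb = storage_terabytes(BITRATE_MBIT_PER_S, SECONDS_PER_YEAR)
print(f"{year_tb:.1f} TB per year")    # prints "11.8 TB per year"
```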
It's interesting that we have almost started to live in this world. I have a half-written blog post on this phenomenon but I guess I'm 45 years too late.
Interestingly, Google and Facebook seem to have basically done it right with their exascale filesystems. The same with object stores.
Reminds me of the various "what if all memory was non-volatile" discussions that made the rounds when Intel Optane entered the stage. It's a bit like the inverse of this, but the caveats might turn out similar: in one case you'd still want a well-defined resettable area; in the other you'd still want to avoid dealing with arbitrarily long addresses, which at some point would become as bad as seek times, even if seek times in the strict sense did not exist.
If mass storage were free, then everything would be append-only by default. There would be no excuse to not do this.
A major benefit of append-only is that your writes are always sequential, which is ideal for almost any storage medium, especially magnetic disk or tape. Combine append-only with batching of transactions (e.g. across 1-10 milliseconds at a time), and you can write multiple txns per disk I/O operation (assuming txn size < storage block size).
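A minimal Python sketch of that batching idea (often called group commit) might look like the following. The `BatchedLog` class and its method names are hypothetical, and `flush()` is called explicitly here where a real system would trigger it from a timer every few milliseconds.

```python
import os
import tempfile

class BatchedLog:
    """Toy append-only log that commits many transactions per I/O operation."""

    def __init__(self, path):
        self.pending = []             # transactions submitted since the last flush
        self.io_ops = 0               # number of write+fsync rounds performed
        self.f = open(path, "ab")     # append-only: existing bytes are never rewritten

    def submit(self, txn):
        # Length-prefix each record so the log can be parsed back later.
        self.pending.append(len(txn).to_bytes(4, "big") + txn)

    def flush(self):
        """Write every pending txn with one write and one fsync; return the count."""
        if not self.pending:
            return 0
        count = len(self.pending)
        self.f.write(b"".join(self.pending))
        self.pending.clear()
        self.f.flush()
        os.fsync(self.f.fileno())     # one durable I/O for the whole batch
        self.io_ops += 1
        return count

    def close(self):
        self.flush()
        self.f.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = BatchedLog(os.path.join(d, "txn.log"))
        for i in range(100):
            log.submit(f"txn-{i}".encode())
        committed = log.flush()
        ops = log.io_ops
        log.close()
        return committed, ops

print(demo())   # 100 transactions committed with a single I/O round
```

The trade-off is latency: a transaction is not durable until the batch it belongs to has been flushed, so the batching window bounds how long a commit can wait.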
The resolution of Maxwell's Demon that grew out of Leo Szilard's analysis (and was later made precise by Landauer and Bennett) is that the act of erasing data, which the demon must perform, is the thermodynamically limiting factor. Deleting selected data efficiently is in fact one of the greatest challenges in large production databases, and in an era of increasing privacy requirements like GDPR's right to erasure, a growing challenge for database operators.
I just found this quite old paper and it came as a surprise to me to discover that the idea of append-only storage is not 20 years old but more than 40!
The oldest work I was aware of was "The design and implementation of a log-structured file system" (1)
So it is with pleasure that I learned these ideas were already around in the '80s:
- Deletion considered harmful
- A non-deletion strategy using timestamps
- The importance of accessing past data
- A non-deletion strategy can improve both integrity and reliability
(1) https://dl.acm.org/doi/10.1145/146941.146943
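To make the "non-deletion strategy using timestamps" point concrete, here is a small Python sketch of a store where updates and deletes only ever append a new timestamped version, so past states remain queryable. It is my own illustration, not the paper's design: `VersionedStore`, its methods, and the use of a logical counter as the timestamp are all assumptions.

```python
import bisect
import itertools

class VersionedStore:
    """Toy key-value store that never overwrites or deletes anything."""

    _TOMBSTONE = object()   # marker meaning "deleted as of this timestamp"

    def __init__(self):
        self._clock = itertools.count(1)   # logical timestamps; a real system might use wall-clock time
        self._history = {}                 # key -> ([timestamps], [values]), both append-only

    def _append(self, key, value):
        ts = next(self._clock)
        stamps, values = self._history.setdefault(key, ([], []))
        stamps.append(ts)
        values.append(value)
        return ts

    def put(self, key, value):
        return self._append(key, value)

    def delete(self, key):
        # "Deleting" just records a newer version; older ones stay readable.
        return self._append(key, self._TOMBSTONE)

    def get(self, key, as_of=None):
        """Return the value at timestamp as_of (latest if None), or None if absent."""
        stamps, values = self._history.get(key, ([], []))
        idx = len(stamps) if as_of is None else bisect.bisect_right(stamps, as_of)
        if idx == 0:
            return None
        value = values[idx - 1]
        return None if value is self._TOMBSTONE else value


store = VersionedStore()
t1 = store.put("balance", 100)
t2 = store.put("balance", 250)
t3 = store.delete("balance")

print(store.get("balance"))             # None: deleted in the current view
print(store.get("balance", as_of=t1))   # 100
print(store.get("balance", as_of=t2))   # 250
```

Because nothing is ever overwritten, a bad update or an accidental delete can always be undone by reading an earlier timestamp, which is the integrity and reliability argument in the list above.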