Disaster Recovery with ZFS and Zrepl

  • It sounds like the author is using ZFS encryption.

    If that's the case, then combining it with replicated ZFS snapshots is probably not a good idea.

    That specific scenario (ZFS encryption -> replication of encrypted snapshots) is a known cause of ZFS corruption. :(

    https://www.phoronix.com/news/OpenZFS-Encrypt-Corrupt

    Unfortunately this doesn't seem to be widely known, though there is a suggestion to make the warning official:

    https://github.com/openzfs/openzfs-docs/issues/494

  • > I don’t back up my drives, I replicate them.

    Don't take this as general advice. For important data, you need multiple backups, and you need to validate routinely that they can actually be restored.

  • Well... The issue with encrypted ZFS + raw send is that a pool encrypted with a common key for all volumes turns into an individual key per volume, while a non-raw send means your target can read your files. If you use a keyfile this is a non-issue. If you type your key, well, you import all the old volumes, create a new pool, and send them over, re-encrypting them with a common key. Very crude, but doable for home-scale setups.

  • You don't actually need a dedicated ZFS backup program. A simple cron script will handle incremental backups just fine. If anyone is interested, the script we use to back up our multi-TB PostgreSQL database can be found here: https://lackofimagination.org/2022/04/our-experience-with-po...
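
    To give an idea of how little is involved, here is a minimal sketch of one backup run. It is not the linked script, just an illustration. The dataset names, the `backup-` snapshot prefix, and the helper names `snapshot_name`, `backup_commands`, and `run` are all made up for the example, and it assumes the target pool is reachable locally rather than over ssh.

```python
import subprocess
from datetime import datetime
from typing import List, Optional


def snapshot_name(dataset: str, now: datetime) -> str:
    """Build a timestamped snapshot name (the 'backup-' prefix is an assumption)."""
    return f"{dataset}@backup-{now.strftime('%Y%m%dT%H%M%SZ')}"


def backup_commands(dataset: str, target: str,
                    previous: Optional[str], now: datetime) -> List[List[str]]:
    """Return the zfs commands for one backup run without executing them.

    previous is the newest snapshot that already exists on the target,
    or None when this is the first (full) send.
    """
    new_snap = snapshot_name(dataset, now)
    send = ["zfs", "send"]
    if previous is not None:
        # Incremental stream: only the changes between previous and new_snap.
        send += ["-i", previous]
    send.append(new_snap)
    return [
        ["zfs", "snapshot", new_snap],
        send,
        ["zfs", "receive", target],
    ]


def run(commands: List[List[str]]) -> None:
    """Take the snapshot, then pipe 'zfs send' into 'zfs receive'."""
    snapshot, send, receive = commands
    subprocess.run(snapshot, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    subprocess.run(receive, stdin=sender.stdout, check=True)
    sender.stdout.close()
    if sender.wait() != 0:
        raise RuntimeError("zfs send failed")
```

    A cron entry would call `run(backup_commands(...))` with the previous snapshot name remembered from the last run. Keeping the command construction separate from execution makes it easy to print and check the commands before letting them touch a real pool.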

  • For ZFS use it's probably a good idea to avoid WD NVMe drives too.

    A large number of people have reported problems with ZFS (and btrfs) on the WD SN770 and a few other WD models:

    • https://github.com/openzfs/zfs/discussions/14793

  • Funny story -- when I was working on the xkcd Machine comic, I actually used the ZFS snapshots to rescue data. I accidentally blew away some early physics prototype code and fished it out of /.zfs/snapshot.
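
    For anyone who hasn't used it: every snapshot shows up as a read-only directory under `.zfs/snapshot` at the dataset's mountpoint, so finding old copies of a file is just a directory walk. A rough sketch, where the helper name `snapshot_copies` and the example file path are hypothetical:

```python
from pathlib import Path
from typing import List


def snapshot_copies(mountpoint: Path, relative_path: str) -> List[Path]:
    """List every snapshot's copy of relative_path under mountpoint/.zfs/snapshot.

    Results are ordered by snapshot name, which is not necessarily
    the order in which the snapshots were created.
    """
    snapshot_root = mountpoint / ".zfs" / "snapshot"
    if not snapshot_root.is_dir():
        return []
    copies = []
    for snap in sorted(snapshot_root.iterdir()):
        candidate = snap / relative_path
        if candidate.exists():
            copies.append(candidate)
    return copies
```

    Something like `snapshot_copies(Path("/"), "home/me/project/physics.js")` would then return one path per snapshot that still contains the file, and any of them can be copied back with a plain `cp`.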

  • My facepalm moment of this year was accidentally restoring a ZFS snapshot of my root pool from a week ago, except it was actually from a year and a week ago. Didn't lose any of my data, but suddenly I had some programs offering to format their version-mismatched databases.
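
    A cheap guard against this is to check the snapshot's creation time before rolling back. `zfs get -H -p -o value creation <snapshot>` prints it as seconds since the epoch. A small sketch of the check, with made-up helper names (`creation_command`, `snapshot_age`, `safe_to_rollback`) and an age limit you would pick yourself:

```python
from datetime import datetime, timedelta, timezone
from typing import List


def creation_command(snapshot: str) -> List[str]:
    """Command that prints the snapshot's creation time as epoch seconds."""
    return ["zfs", "get", "-H", "-p", "-o", "value", "creation", snapshot]


def snapshot_age(creation_output: str, now: datetime) -> timedelta:
    """Turn the command's output into the snapshot's age relative to now."""
    created = datetime.fromtimestamp(int(creation_output.strip()), tz=timezone.utc)
    return now - created


def safe_to_rollback(creation_output: str, now: datetime, max_age: timedelta) -> bool:
    """True only if the snapshot is not from the future and not older than max_age."""
    return timedelta(0) <= snapshot_age(creation_output, now) <= max_age
```

    The output string would come from something like `subprocess.run(creation_command(snap), capture_output=True, text=True, check=True).stdout`, with `now` set to `datetime.now(timezone.utc)`. With a limit of eight days, a week-old snapshot passes and a year-and-a-week-old one does not.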