Postgres UUIDv7 and per-backend monotonicity

  • I would strongly implore people not to follow the example this post suggests by writing code that relies on this monotonicity.

    The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property. If you decide to depend on it, you're setting yourself up for a bad time when PostgreSQL decides to change their implementation strategy, or you move to a different database.

    Further, the stated motivations for this, to slightly simplify testing code, are massively under-motivating. Saving a single line of code can hardly be said to be worth it, but even if it were, this is a problem far better solved by simply writing a function that will both generate the objects and sort them.

    As a profession, I strongly feel we need to do a better job orienting ourselves to the reality that our code has a tendency to live for a long time, and we need to optimize not for "how quickly can I type it", but "what will this code cost over its lifetime".
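
    A minimal sketch of the "generate and sort" helper suggested earlier in this comment, in Python. `Record`, `make_records`, and the use of `uuid.uuid4()` as the ID source are hypothetical stand-ins; the only point is that the sort lives in one place, so tests never depend on generation order.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical test object; `id` stands in for a database-generated UUID."""
    name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


def make_records(count: int) -> list[Record]:
    """Create `count` records and return them sorted by ID.

    The explicit sort gives callers a deterministic order regardless of how
    the IDs were generated.
    """
    records = [Record(name=f"record-{i}") for i in range(count)]
    return sorted(records, key=lambda r: r.id)
```

    A test can then compare against `make_records(n)` directly instead of assuming the IDs came out in insertion order.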

  • Remember that even if IDs or timestamps are generated from a monotonically increasing value, that does not mean they were committed to the database in the same order. It is an entirely separate problem if you are trying to actually determine which rows are "new" versus "previously seen" for things like cursor-based APIs and background job processing. This problem exists even with things like a serial/autoincrement primary key.
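
    A toy simulation of that gap, in Python. Nothing here talks to a real database: `Txn` and `poll_new_rows` are hypothetical names, and "commit" is just a flag, but the sequence of events is the one described above.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    """Hypothetical in-flight transaction that inserted one row."""
    row_id: int
    committed: bool = False


def poll_new_rows(txns: list[Txn], last_seen: int) -> list[int]:
    """Naive cursor: return committed row IDs greater than `last_seen`."""
    return sorted(t.row_id for t in txns if t.committed and t.row_id > last_seen)


# Two transactions receive their IDs in order: 1, then 2.
first, second = Txn(row_id=1), Txn(row_id=2)
txns = [first, second]

# The second transaction commits first, and a poller runs.
second.committed = True
batch_1 = poll_new_rows(txns, last_seen=0)
cursor = max(batch_1)

# The first transaction commits afterwards, below the cursor.
first.committed = True
batch_2 = poll_new_rows(txns, last_seen=cursor)
```

    Here `batch_1` contains only row 2 and `batch_2` is empty, so row 1 is never delivered even though the IDs themselves were strictly increasing.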

  • >The Postgres patch solves the problem by repurposing 12 bits of the UUID’s random component to increase the precision of the timestamp down to nanosecond granularity [...]

    >It makes a repeated UUID between processes more likely, but there’s still 62 bits of randomness left to make use of, so collisions remain vastly unlikely.

    Does it? Even though the number of random bits has decreased, the time interval to create such a duplicate has also decreased, namely to an interval of one nanosecond.
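
    A back-of-the-envelope check of that intuition, in Python. This is a simplified model, not a statement about the actual patch: it assumes IDs arrive as a Poisson process spread evenly over time, ignores any per-process monotonicity guard, and uses a made-up generation rate. Under those assumptions a tick with mean rate `lam` holds `lam**2 / 2` pairs on average, and each pair collides with probability `2**-random_bits`.

```python
from fractions import Fraction


def expected_colliding_pairs(ids_per_ms: int, ticks_per_ms: int, random_bits: int) -> Fraction:
    """Expected number of colliding ID pairs per millisecond (simplified model)."""
    lam = Fraction(ids_per_ms, ticks_per_ms)   # mean IDs per timestamp tick
    pairs_per_tick = lam * lam / 2             # expected pairs sharing a tick
    return ticks_per_ms * pairs_per_tick / 2**random_bits


rate = 1_000_000  # hypothetical: IDs per millisecond across all processes

# Plain UUIDv7 layout: one tick per millisecond, 74 random bits.
millisecond_layout = expected_colliding_pairs(rate, ticks_per_ms=1, random_bits=74)

# Layout from the quote: 4096 ticks per millisecond, 62 random bits.
sub_ms_layout = expected_colliding_pairs(rate, ticks_per_ms=4096, random_bits=62)
```

    In this model the two values are equal: the 12 bits moved out of the random part are exactly offset by the 4096 times shorter window in which two IDs can share a timestamp.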

  • I maintain that people are too eager to use UUIDv7 to begin with. It's a dessert topping and a floor wax.

    Let's say you need an opaque unique handle, and a timestamp, and a monotonically increasing row ID. Common enough. Do they have to be the same thing? Should they be the same thing? Because to me that sounds like three things: an autoincrementing primary key, a UUIDv4, and a nanosecond timestamp.

    Is it always ok that the 'opaque' unique ID isn't opaque at all, that it's carrying around a timestamp? Will that allow correlating things which maybe you didn't want hostiles to correlate? Are you 100% sure that you'll never want, or need, to re-timestamp data without changing its global ID?

    Maybe you do need these things unnormalized and conflated. Do you though? At least ask the question.
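
    To make the "not opaque" point concrete, here is a short Python sketch that reads the creation time back out of a UUIDv7. It relies only on the standard layout (48-bit Unix millisecond timestamp in the top bits); the function name `uuid7_timestamp` is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid7_timestamp(value: uuid.UUID) -> datetime:
    """Return the millisecond timestamp embedded in a version 7 UUID."""
    if value.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = value.int >> 80  # the top 48 of 128 bits
    return EPOCH + timedelta(milliseconds=unix_ms)
```

    Anyone who sees the ID can run the same few lines, which is the correlation risk raised above.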

  • > The Postgres patch solves the problem by repurposing 12 bits of the UUID’s random component to increase the precision of the timestamp down to nanosecond granularity (filling rand_a above), which in practice is too precise to contain two UUIDv7s generated in the same process.

    A millisecond divided by 4096 is not a nanosecond. It's about 250 nanoseconds.
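
    The arithmetic, spelled out in Python. The scaling function is only a sketch of the general idea of mapping a sub-millisecond offset onto 12 bits; `sub_ms_fraction` is a hypothetical name, not taken from the patch.

```python
NS_PER_MS = 1_000_000
STEPS = 1 << 12  # 12 bits give 4096 distinct sub-millisecond steps

# Width of one step in nanoseconds.
tick_ns = NS_PER_MS / STEPS


def sub_ms_fraction(ns_within_ms: int) -> int:
    """Scale a nanosecond offset within a millisecond to a 12-bit value."""
    return ns_within_ms * STEPS // NS_PER_MS
```

    `tick_ns` comes out at 244.140625, so two calls less than roughly 244 ns apart can land on the same 12-bit value.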

  • UUID7 is excellent.

    I want to share a Django library I wrote a little while back which allows for prefixed identity fields, in the same style as Stripe's ID fields (obj_XXXXXXXXX):

    https://github.com/jleclanche/django-prefixed-identity-field...

    This gives a PrefixedIdentityField(prefix="obj_"), which is backed by uuid7 and base58. In the database, the IDs are stored as UUIDs, which makes them an efficient field -- they are transformed into prefixed IDs when coming out of the database, which makes them perfect for APIs.

    (I know, no documentation .. if someone wants to use this, feel free to file issues to ask questions, I'd love to help)
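
    For readers wondering what "UUID in the database, prefixed base58 in the API" can look like, here is a rough Python sketch. It is not the library's code: the function names and the exact encoding (Bitcoin-style alphabet, no padding) are assumptions made for illustration.

```python
import uuid

# Bitcoin-style base58 alphabet (omits 0, O, I and l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_prefixed(value: uuid.UUID, prefix: str) -> str:
    """Render a UUID as `prefix` followed by its base58 digits."""
    n = value.int
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # An all-zero UUID has no digits, so fall back to the zero symbol.
    return prefix + ("".join(reversed(digits)) or ALPHABET[0])


def decode_prefixed(text: str, prefix: str) -> uuid.UUID:
    """Parse a prefixed base58 ID back into the UUID stored in the database."""
    if not text.startswith(prefix):
        raise ValueError(f"expected prefix {prefix!r}")
    n = 0
    for ch in text[len(prefix):]:
        n = n * 58 + ALPHABET.index(ch)  # raises ValueError on a bad character
    return uuid.UUID(int=n)
```

    The database column stays a plain 16-byte UUID; only the API boundary calls `encode_prefixed` and `decode_prefixed`.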

  • My org has been using ULID[0] extensively for a few years, and generally we've been quite happy with it. After initially dabbling with a few implementations, I reimplemented the spec in Kotlin, and this has been working out quite well for us. We will open-source our implementation in the following weeks.

    ULID does specifically require generated IDs to be monotonically increasing as opposed to what the RFC for UUIDv7 states, which is a big deal IMHO.

    [0]: https://github.com/ulid/spec

  • What benefit does this have over something like Twitter's Snowflake, which can be used to generate distributed monotonically increasing IDs without synchronization?

    We've been using an implementation of it in Go for many years in production without issues.

  • Ordering for UUIDv7s in the same millisecond is super useful when some rows represent actions and others reactions.

    I have used this guarantee for events generated on clients. It really simplifies a lot of reasoning.

  • This post makes me wonder (I keep thinking about it) whether we could use a solution I used on another project in another context: using a cryptographic Feistel network to compute UUIDs so that they are reversible if you need to know the original order. Each entity uses a different key for generation, but a party that knows another's key can recover that party's ordering. Basically, you use an existing cryptographic function if its block size matches, and otherwise adapt it to the required block size via a Feistel network.
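
    A toy Python sketch of that idea: a balanced Feistel network over 128-bit values, so a sequential counter can be turned into an opaque ID and back. The round count and the HMAC-SHA256 round function are assumptions chosen for illustration; this is not a vetted cipher, and the output does not keep the UUID version or variant bits.

```python
import hashlib
import hmac

HALF_BITS = 64
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 4  # illustrative choice, not a security recommendation


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 truncated to 64 bits."""
    msg = bytes([round_no]) + half.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def feistel_encrypt(key: bytes, value: int) -> int:
    """Map a 128-bit integer to another 128-bit integer, reversibly."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << HALF_BITS) | right


def feistel_decrypt(key: bytes, value: int) -> int:
    """Undo feistel_encrypt by running the rounds in reverse order."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << HALF_BITS) | right
```

    A holder of the key can sort IDs by `feistel_decrypt(key, id)` to recover the original order; without the key the IDs carry no visible ordering.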

  • On one hand I too am looking forward to more widespread use of UUIDv7, but on the other I don't really get the problem this is solving for their spec. If you care about timestamp ordering I don't think doing it in a way that forces you to fake a PK if you insert an earlier dated record at a future point makes sense. But I guess I'm implicitly assuming that human meaningful dates differ from insertion times in many domains.

  • After reading this I went ahead and added the extra 12 bits to my Go UUID library. Thanks for the write up on the PostgreSQL patch.

    If anyone is interested, here is the package: https://github.com/cmackenzie1/go-uuid. It also includes a CLI similar to that of `uuidgen`, but supports version 7.

  • I have a function that computes N v7 UUIDs, sorts them, and returns them. This makes testing possible.

        Collection<UUID> generate(final int count);
    
    I also have an interface that I can back with an RNG that generates auto-incrementing values. For testing, that gives me sortable IDs that behave like ints; in production, my non-timestamp component is random.
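
    A Python sketch of the same shape, for readers who want to try it. All names here (`make_uuid7`, `random_tail`, `counting_tail`, the `tail_source` and `clock` parameters) are invented for this example; only the bit layout follows the UUIDv7 format.

```python
import secrets
import time
import uuid
from itertools import count
from typing import Callable


def make_uuid7(unix_ms: int, tail: int) -> uuid.UUID:
    """Pack a 48-bit millisecond timestamp and a 74-bit tail into a UUIDv7."""
    rand_a = (tail >> 62) & 0xFFF        # upper 12 bits of the tail
    rand_b = tail & ((1 << 62) - 1)      # lower 62 bits of the tail
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76                   # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                  # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


def random_tail() -> int:
    """Production tail source: 74 random bits."""
    return secrets.randbits(74)


def counting_tail() -> Callable[[], int]:
    """Test tail source: 0, 1, 2, ... so IDs behave like incrementing ints."""
    counter = count()
    return lambda: next(counter)


def generate(
    n: int,
    tail_source: Callable[[], int] = random_tail,
    clock: Callable[[], int] = lambda: time.time_ns() // 1_000_000,
) -> list[uuid.UUID]:
    """Generate `n` UUIDv7 values and return them sorted."""
    return sorted(make_uuid7(clock(), tail_source()) for _ in range(n))
```

    In a test, `generate(10, tail_source=counting_tail(), clock=lambda: 1_700_000_000_000)` yields fully predictable IDs; production code calls `generate(n)` and gets random tails.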

  • The naming of "rand_a" and "rand_b" in the spec is a bit misleading. They don't have to be generated randomly. I'm sure there's a historical reason for it.

    "extra_" or "distinct_" would be a more accurate prefix for UUIDv7.

    UUIDv7 is actually quite a flexible standard due to these two underspecified fields. I'm glad Postgres took advantage of that!

  • I implemented this in pure Python a few days ago in case anyone finds it helpful, here it is: https://gist.github.com/pirate/7e44387c12f434a77072d50c52a3d...

    My implementation supports graceful degradation between nanosecond, microsecond, and millisecond resolution, by using 12 bits for each and filling up the leftmost bits of rand_a and rand_b. Not all environments provide high-resolution system clocks with no drift, so it's important to maintain monotonicity when generating IDs with a low-res timestamp as input. You still want the bits that would've held the nanosecond value to be monotonic.

    Neither of the existing Python libs that can generate UUIDv7s, uuid_utils and uuid7, supports this monotonicity property.

    Am planning on using this for ArchiveBox append-only "snapshot" records, which are intrinsically linked to time, so it's a good use-case imo.

    There's another great resource here that I think is one of the best explainers of UUIDv7: https://antonz.org/uuidv7/

    Whatever you do, don't implement the cursed 36-bit whole-second based time UUIDv7 variant that you occasionally see on StackOverflow / blog posts, stick to 48!

  • Been waiting for UUIDv7 for years -- maybe it's time to archive pg_idkit[0], or maybe instead just switch the UUIDv7 version to do the native thing rather than the Rust code.

    [0]: https://github.com/VADOSWARE/pg_idkit

  • What is a v7 UUID? Why do we need more than (1) a UUID from a random seed and (2) one derived from that and a timestamp (orderable)?