Static Integer Types (2021)

  • Note that, here in 2022, those ARM chips with CHERI do now exist in the form of Morello: https://www.arm.com/architecture/cpu/morello

    Also, although this article says Rust has to choose usize ~= u128 on Morello, which is unpalatable, Aria proposes instead that Rust tweak the definition of usize to say it's about addresses, not pointers, and thus usize ~= u64 on Morello.

    https://gankra.github.io/blah/fix-rust-pointers/ leading into https://gankra.github.io/blah/tower-of-weakenings/

    If you have nightly Rust, you can play with Aria's new semantics because she implemented them. I think they're a good idea, but I don't have much "skin in the game", unlike, apparently, the author of this article.

  • > I hope it’s uncontroversial that, like Rust, languages should not allow implicit casts between integer types at all.

    I find this controversial. The unstated option #4 for addressing C's permissive implicit narrowing conversions is to simply disallow implicit narrowing conversions, but continue providing implicit integer conversions to types of greater-than-or-equal rank.
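
    To make option #4 concrete, here is a minimal sketch using C's fixed-width types. C as it stands accepts every line below; the comments mark what the proposed rule would change, and the names are purely illustrative:

      #include <stdint.h>

      void example(uint16_t small, uint64_t big) {
          uint64_t widened = small;        /* allowed: widening always preserves the value  */
          uint64_t sum     = small + big;  /* allowed: small converts up to the greater rank */
          uint16_t clipped = big;          /* rejected under option #4: narrowing would      */
                                           /* require an explicit (uint16_t)big cast         */
          (void)widened; (void)sum; (void)clipped;
      }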

    I suspect the reason you left out option #4 is an entirely different, self-imposed constraint in Rust:

    > [Rust] defines From casts for integer types that will succeed on every platform. Since casting a 32-bit integer to a usize would fail on a 16-bit platform, I’m not ever allowed to compile — even on a 64-bit platform where such a cast would always succeed.

    But therein lies the original questionable turn that Rust made wrt usize--there's no accounting for future platforms.

    What makes C so portable is precisely its integer ranking system and implicit conversions. By guaranteeing relative rank and permitting implicit widening conversions, most issues on most future architectures have been accommodated. It's not perfect, but the value for your money is immense. The vast majority of issues like this go away.

    And what do you lose by permitting implicit widening conversions? There are some potential correctness issues. For example, subexpressions computing bitmasks might not behave as expected when converted to a wider type in an outer expression. But this same problem exists with your proposed solutions and with explicit conversions generally. I would even consider implicit conversions safer, because we can always add additional rules (optional or mandatory) that capture these cases (e.g. -Wwidth-dependent-shift-followed-by-implicit-widening), whereas explicit conversions usually have the effect of short-circuiting stronger type checking. (Unless you go the C++ route and add a panoply of conversion operators. But better hope you chose the right one!)
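
    To make the bitmask hazard concrete, a minimal sketch, assuming a typical platform where int is 32 bits; note the "fix" is just as easy to get wrong with explicit casts:

      #include <stdint.h>
      #include <stdio.h>

      int main(void) {
          uint32_t low = 0x0000FFFFu;

          /* Intended: a 64-bit mask of everything except the low 16 bits.
             The complement is computed at 32-bit width and only then widened,
             so the upper 32 bits come out zero. */
          uint64_t wrong = ~low;              /* 0x00000000FFFF0000 */

          /* Widening before complementing gives the intended mask. */
          uint64_t right = ~(uint64_t)low;    /* 0xFFFFFFFFFFFF0000 */

          printf("%016llx\n%016llx\n",
                 (unsigned long long)wrong, (unsigned long long)right);
          return 0;
      }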

    You lose the ability for a build to fail on target X because the conversion wouldn't work on target Y. But that requirement is fundamentally in conflict with accommodating future architectures, and it incentivizes the kind of explicit conversions that could hide or complicate future porting issues.

    Regarding this footnote:

    > [8] For reasons that are unclear to me, uintptr_t is an optional type in C99. However, I don’t know of any platforms which support C99 but don’t define it.

    AFAIU, the reason is exactly that the committee foresaw that not all architectures could accommodate conversions between object pointers and integers. Relatedly, the C standard DOES NOT permit conversions between object pointers and function pointers, which also means the C standard DOES NOT permit conversions between function pointers and uintptr_t, even if uintptr_t is defined. Capability systems and memory architectures that couldn't accommodate the latter conversions already existed. The value of function pointer/integer conversions was much less than that of object pointer/integer conversions, so they defined uintptr_t only for object pointers, and made uintptr_t optional.

    Function pointer/object pointer conversions are widely supported by C compilers, but this is an extension that makes code non-conforming. See C11 J.5.7. Note that non-conforming does not mean undefined; it just means that such code is not C code as defined by the standard and is beyond its purview. It creates a headache for POSIX, which defines a single interface, dlsym, for acquiring both object and function references.
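
    A POSIX-flavored sketch of both points; the library name and symbol are placeholders, and this is only what I understand the long-standing dlsym workaround to be:

      #include <stdint.h>
      #include <dlfcn.h>

      typedef double (*cosine_fn)(double);

      int main(void) {
          /* Object pointer round trips are what uintptr_t (when it exists)
             is specified to support. */
          int x = 42;
          uintptr_t bits = (uintptr_t)&x;
          int *p = (int *)bits;            /* compares equal to &x */
          (void)p;

          /* dlsym returns a void *, yet here it names a function, so a
             strictly conforming program has no blessed way to use the result.
             The workaround long suggested by POSIX is to copy through the
             function-pointer object rather than cast the return value. */
          void *handle = dlopen("libm.so.6", RTLD_LAZY);
          if (!handle) return 1;

          cosine_fn cosine;
          *(void **)(&cosine) = dlsym(handle, "cos");
          if (!cosine) return 1;

          return cosine(0.0) == 1.0 ? 0 : 1;   /* cos(0.0) == 1.0 */
      }

    (On glibc you may need to link with -ldl; newer glibc versions fold it into libc.)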

  • > […] but what would a signed memory address mean and what happens if/when we want to use all 64-bits to represent memory addresses […]

    Half of this space is reserved for the kernel.

  • If you are interested in a C++ library that makes using integers a lot safer, take a look at Boost Safe Numerics

    https://www.boost.org/doc/libs/1_79_0/libs/safe_numerics/doc...

  • > Undefined behaviour on integer types is a terrible idea (though unspecified behaviour might have a place).

    Would unspecified behavior be sufficient to enable the compiler optimizations that are the reason for keeping signed integer overflow undefined in newer C and C++ versions?
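
    For reference, the kind of optimization usually cited, as a sketch rather than a claim about what any particular compiler does:

      /* Because signed overflow is undefined, a compiler may fold this whole
         function to "return 1": it is allowed to assume x + 1 never wraps.
         If overflow instead produced some unspecified int value, that value
         could still compare <= x (no int exceeds INT_MAX), so the fold would
         no longer be sound. */
      int always_true(int x) {
          return x + 1 > x;
      }

      /* Likewise the compiler may assume this loop runs exactly n + 1 times
         (i never wraps past INT_MAX), which lets it widen the induction
         variable to 64 bits or vectorize.  With wrapping results it would
         still have to account for the n == INT_MAX case. */
      void scale(float *a, int n, float k) {
          for (int i = 0; i <= n; ++i)
              a[i] *= k;
      }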

  • Is there a C++ library that implements static integer types with these ideas? In principle the operations don’t seem complicated, but there are probably enough edge cases that it’s tricky to get it all right.