The new Clang _ExtInt feature provides exact bitwidth integer types

  • I think it's funny. C was originally invented in an era when machines didn't have a standard integer size - 36-bit architectures were in their heyday - so to achieve portability, C's integers (char, short, int, and long) only guarantee a minimum size and nothing else. But after the world's computers converged on multiple-of-8-bit integers, the inability to specify a particular integer size became an issue. As a result, in modern C programming the standard practice is to use uint8_t, uint16_t, uint32_t, etc., defined in <stdint.h>; C's inherent support for differently sized integers has basically been abandoned - no one needs it anymore, and it only creates confusion in practice, especially in the bitmasking and bitshifting world of low-level programming. Now, if N-bit integers are introduced to C, it's kind of a negation of the negation, and we complete a full cycle - the ability to work on non-multiple-of-8-bit integers will come back (although the original integer-size independence and portability will not).

  • I'd love to know if there's any use to this beyond FPGAs. This just seems to be another case of porting the complexity of RTL design into C syntax so that they can claim they have an HLS product that compiles C to gates. It's not C to gates if you had to rewrite all your C to manually specify the bit widths of every single signal. I really wonder how far we can keep going before the naming police break into Intel Headquarters and rename all their marketing material with "Low Level Synthesis".

  • Note that the spec[1] requires that this tops out at an implementation-defined maximum size, so you're likely not getting out of writing bignum code yourself (and even if implemented, the bignum operations would likely be variable-time and thus unsuitable for any kind of cryptography). Making the maximum completely implementation-defined also sounds like it'll be unreliable in practice; I feel like making it at least match the bit size of int would be a worthwhile trade-off between predictability for the programmer aiming for portability and simplicity of implementation.

    [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf

  • I'm very much in support of this. One thing I like about Zig[1] is that integers are explicitly given sizes. I've been playing with it recently, but I'm waiting for a specific "TODO" in the compiler to be fixed.

    [1] https://ziglang.org/

  • > These tools take C or C++ code and produce a transistor layout to be used by the FPGA.

    Hmm, I haven’t been following that but it seems that...

    > The result is massively larger FPGA/HLS programs than the programmer needed

    And there it is.

    Really seems odd to me to try and force procedural C into the non-linear execution model of an FPGA. Like it seems super odd, and when talking about changes to C to help that... I really don't get it.

    This isn’t what C is for. What is the performance advantage over Verilog? How many people want n-bit ints in C when automatically handled structures work well for most people?

    Maybe I’m just not seeing the bigger picture here and that example was just poor?

  • First of all, I suppose that it will be possible to make them unsigned (just like for standard types). Is this correct?

    Also, what's the relationship between standard types and the new _ExtInts? Is _ExtInt(16) equivalent to short, or are they considered distinct types that require an explicit cast?

    > In order to be consistent with the C Language, expressions that include a standard type will still follow integral promotion and conversion rules. All types smaller than int will be promoted, and the operation will then happen at the largest type. This can be surprising in the case where you add a short and an _ExtInt(15), where the result will be int. However, this ends up being the most consistent with the C language specification.

    For instance, what if I choose to replace short by _ExtInt(16) in the above? What would be the promotion rule then?
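
    For the mixed standard/_ExtInt case the quote does spell out, here is a minimal sketch of how the promoted type could be checked (this assumes a Clang build with _ExtInt support; the expected result comes straight from the quoted text):

      int main(void) {
          short s = 1;
          _ExtInt(15) e = 2;
          /* Per the post, a standard type in the expression keeps the usual C
             integer promotions, so the addition happens at (and yields) int. */
          _Static_assert(_Generic(s + e, int: 1, default: 0),
                         "short + _ExtInt(15) is performed as int");
          return 0;
      }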

    Note that it was already possible to implement arbitrarily sized ints for sizes <= 64 by using bitfields (although it's possible that you could fall into UB territory in some situations; I've never used that to do modular arithmetic).

    Edit: Ah, there's this notion of underlying type: one may use the nearest upper type to implement a given size, but nothing prevents you from using a larger one, for instance:

    struct short3_s { short value:3; };

    struct longlong3_s { long long value:3; };

    I don't know what the C standard says about that, but clearly these two types are not identical (sizeof will probably give different results). What will it be for _ExtInt? How will these types be converted?
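
    To make the sizeof point concrete, a quick check (exact sizes are implementation-defined, but on typical ABIs the underlying type drives the struct size, e.g. 2 bytes vs. 8 bytes):

      #include <stdio.h>

      struct short3_s    { short value:3; };
      struct longlong3_s { long long value:3; };

      int main(void) {
          /* Both declarations already rely on implementation-defined behavior:
             the standard only guarantees bit-fields of int, unsigned int,
             signed int, and _Bool. */
          printf("%zu %zu\n", sizeof(struct short3_s), sizeof(struct longlong3_s));
          return 0;
      }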

    Another idea: what about

      struct extint13_3_s {
        _ExtInt(13) value:3;
      };

    Will the above be possible? In other words, will it be possible to combine bitfields with this new feature?

    I guess it's a much more complicated problem than it appears to be at first.

  • Much of my time is spent writing Mentor Catapult HLS for ASIC designs these days.

    Every HLS vendor or language has their own, incompatible arbitrary bitwidth integer type at present. SystemC sc_int is different from Xilinx Vivado ap_int is different from Mentor Catapult ac_int is different from whatever Intel had for their Altera FPGAs. It's a real mess.

    I'm hoping this is another small step toward slowly moving the industry to a more unified representation, or at least that LLVM support for this at the type level could enable faster simulation of designs on a CPU by improving the emitted code. What probably matters most for HLS though are the operations which are performed on the types (static or dynamic bit slicing, etc).

  • > While the spelling is undecided, we intend something like: 1234X would result in an integer literal with the value 1234 represented in an _ExtInt(11), which is the smallest type capable of storing this value.

    That “smallest type capable of storing this value” is a disappointing approach, IMHO. It’d be a lot more powerful to just be able to pass in bit patterns (base-2 literals) and have the resulting type match the lexical width of the literal. 0b0010X should have a bit-width of 4, not 2.

  • Speaking of C, if you missed last week's thread with C Committee members, it was rather amazing: https://news.ycombinator.com/item?id=22865357.

    Click 'More' at the bottom to page through it; it just keeps going.

  • At some point they need to branch off and not call it C anymore. C should stay relatively small -- small enough that a competent programmer could write a compiler and RTS for it.

  • I think the most commonly used languages, with and without standards - C, C++, JavaScript/Wasm, Python, Java, etc. - should standardize new primitive type representations together (with hardware people included).

    If you have different representations in different languages it just creates unnecessary impedance mismatch. It would be better for everyone if you could just pass these types from language to language.

  • That's what I wrote in reply to their reddit post:

    The feature is of course fantastic. But the syntax still looks a bit overblown.

    Type system-wise this seems to be more correct:

      _ExtInt(a) + _ExtInt(b) => _ExtInt(MAX(a, b) + 1)
    
    And int + _ExtInt(15) might need a pragma or warning flag to warn about that promotion. One little int, or an automatic promotion to int, pollutes everything.
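
    For what it's worth, under the rule described in the post the result of mixing two _ExtInt operands is just the wider operand's type, so keeping the carry bit means widening by hand. A rough sketch, assuming a Clang build with _ExtInt:

      int main(void) {
          /* Both operands fit in 15 bits, but their sum needs 16. With the
             documented rule, a + b would be evaluated at 15 bits, so the
             widening has to be written out explicitly. */
          _ExtInt(15) a = 16000;
          _ExtInt(15) b = 1000;
          _ExtInt(16) sum = (_ExtInt(16))a + (_ExtInt(16))b;
          return (int)sum == 17000 ? 0 : 1;
      }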

  • I love Erlang for the ability to deal with _bits_. To see this in a compiled language would be wonderful. Of course, you can get down to the bit level with bitwise logical operations, but to be able to express it more naturally would be a great boon to people writing low-level network stuff, and will probably reduce programming errors.
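
    For reference, a minimal sketch of the shift-and-mask style this is contrasting with, pulling the two 4-bit fields out of the first octet of an IPv4 header:

      #include <stdint.h>
      #include <stdio.h>

      /* Version lives in the high nibble, IHL in the low nibble. */
      static void parse_first_octet(uint8_t b, unsigned *version, unsigned *ihl) {
          *version = (b >> 4) & 0x0Fu;
          *ihl     =  b       & 0x0Fu;
      }

      int main(void) {
          unsigned v, ihl;
          parse_first_octet(0x45, &v, &ihl);      /* typical IPv4 first byte */
          printf("version=%u ihl=%u\n", v, ihl);  /* prints version=4 ihl=5 */
          return 0;
      }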

  • Congrats Erich! One thing I'd be curious about is the ergonomics (or lack of) of explicit integer promotions and conversions for these types, as I find the current rules for implicit integer promotions a little confusing and hard to remember.

    For a fun compiler bug in LLVM due to representation of arbitrary width integers, see: https://nickdesaulniers.github.io/blog/2020/04/06/off-by-two...

    “Likewise, if a Binary expression involves operands which are both _ExtInt, rather than promoting both operands to int the narrower operand will be promoted to match the size of the wider operand, and the result of the binary operation is the wider type.”

    I don’t understand that choice. The result should be of the wider type, yes, but, for example, multiplying an _ExtInt(1) by an _ExtInt(1000) should take less hardware than multiplying two _ExtInt(1000)s. So, why promote the narrower one to the wider type?

  • I wonder if this could help standardize some vectorized code as well.

  • A lot of people don't know this, but `BigInt`s are supported in modern JavaScript; integers of arbitrarily large precision.

    Try in your browser console:

        2n ** 4096n
    
        // output (might have to scroll right)
        1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190336n
    
    To use them, just add `n` after the number as literal notation, or cast any Number x with BigInt(x). BigInts may only do operations with other BigInts, so make sure to cast any Numbers where applicable.

    I know this is about C; I just thought I'd mention it, since many people seem to be unaware of this.

  • C++ has sped up the pace of its releases, but I don't have a sense of where C is. I didn't realize until I looked it up just now that there's a C18, although I gather it's an even smaller change than C95 was.

    Safe to say that a feature like this would be standardized by 2022 at the earliest?

  • So, if I have an array of _ExtInt(3), does it pack them nicely into 10-per-32-bit-word? Or 21-per-64-bit-word? Will a struct with six _ExtInt(5) fields fit into 4 bytes? What about just a few global variables of _ExtInt(1)? Will they get packed into a single byte? Did I miss where this is covered?
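
    One way to find out is to just ask the compiler; a quick sketch (assuming a Clang build with _ExtInt; the printed sizes are implementation details rather than guarantees, and the names here are just for illustration):

      #include <stdio.h>

      struct six_by_five { _ExtInt(5) a, b, c, d, e, f; };

      int main(void) {
          printf("sizeof(_ExtInt(3))         = %zu\n", sizeof(_ExtInt(3)));
          printf("sizeof(_ExtInt(3)[10])     = %zu\n", sizeof(_ExtInt(3)[10]));
          printf("sizeof(struct six_by_five) = %zu\n", sizeof(struct six_by_five));
          return 0;
      }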

  • Knuth's MIX computer, with its 6-bit bytes and 5-byte words (IIRC), came to mind [0]

    [0] https://en.wikipedia.org/wiki/MIX

  • Currently Clang is getting it; whether ISO C gets it is another matter.

  • The title is a little misleading, since _ExtInt is just a Clang extension, not part of a standard. GCC and Clang both have some hidden features that are not in the standard.

  • I feel this is better left as a compiler extension. Writing FPGA code has so much specialness anyway.

  • How does this not break sizeof?

  • What was wrong with the actual title?

    > The New Clang _ExtInt Feature Provides Exact Bitwidth Integer Types