Fun with -fsanitize=undefined and Picolibc

  • Wow, this: "random() was returning values in int range rather than long." is a very nice bug find. Randomness is VERY hard for humans to check. For example, Python's binomial distribution is very bad on some inputs [1], giving wildly wrong values, but nobody noticed. I bumped into it when I implemented an algorithm to compute the approximate volume of solutions to a DNF, and the results were clearly wrong [2]. The algorithm is explained here by Knuth, in case you are interested [3].

    [1] https://www.cs.toronto.edu/~meel/Slides/meel-distform.pdf [2] https://github.com/meelgroup/pepin [3] https://cs.stanford.edu/~knuth/papers/cvm-note.pdf

  • > String to float conversion had a table missing four values. This caused an array access overflow which resulted in imprecise values in some cases.

    I once wrote a function to parse a date format from log files that Go doesn't natively support, and forgot to add November. I quit that job in April, so I never saw any issues. However, when the 1st of November came, my ex-colleagues saw no logs for that day, and when they found out the reason they created the hashtag #nolognovember, which you can probably still find somewhere to this day :)

  • > the vast bulk of sanitizer complaints came from invoking undefined or implementation-defined behavior in harmless ways

    This is patently false. Any undefined behavior is harmful because it allows the optimizer to insert totally random code, and this is not purely theoretical: it has been repeatedly demonstrated in practice. So even if your UB code isn't called, the simple fact that it exists may make some seemingly unrelated code misbehave.
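
    A minimal sketch of the kind of thing meant here, using the classic signed-overflow check. The function names are made up for illustration. The first version depends on `x + 1` wrapping, which is undefined for `int`, so a compiler is allowed to assume it never happens and reduce the whole function to "return false". The second version asks the same question without overflowing.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on signed overflow, which is undefined behavior. An optimizer may
   assume x + 1 never wraps and fold this comparison to a constant false. */
bool next_overflows_ub(int x)
{
    return x + 1 < x;
}

/* Well-defined alternative: compare against the limit before doing any
   arithmetic, so no overflow can occur. */
bool next_overflows(int x)
{
    return x == INT_MAX;
}
```

    What `next_overflows_ub(INT_MAX)` returns depends on the compiler and optimization level, which is exactly the problem; `next_overflows(INT_MAX)` is always true.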

  • > Passing pointers to the middle of a data structure. For example, free takes a pointer to the start of an allocation. The management structure appears just before that in memory; computing the address of which appears to be undefined behavior to the compiler.

    To clarify, the undefined behavior here is that the sanitizer sees `free` trying to access memory outside the bounds of what was returned by `malloc`.

    It's perfectly valid to compute the address of a struct just before memory pointed to by a pointer you have, as long as the result points to valid memory:

        void not_free(void *p) {
          struct header *h = (struct header *) (((char *)p) - sizeof(struct header));
          // ...
        }
    
    In the case of `free`, that resulting pointer is technically "invalid" because it's outside what was returned by `malloc`, even though the implementation of `malloc` presumably returned a pointer to memory just past the header.
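
    For contrast, here is a rough sketch of the header-before-payload layout in a toy wrapper. The names `toy_alloc`, `toy_header`, `toy_free` and the `struct header` layout are hypothetical and are not how Picolibc or any real allocator is written. In this version the subtraction stays inside the block that the wrapper itself obtained from `malloc`, so stepping back to the header lands on memory the wrapper legitimately owns.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header layout; real allocators store different fields.
   The alignment keeps the payload suitably aligned for any object type. */
struct header {
    _Alignas(max_align_t) size_t size;   /* usable bytes after the header */
};

/* Allocate room for the header plus `size` bytes and hand back a pointer
   to the first byte just past the header. */
void *toy_alloc(size_t size)
{
    struct header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Step back from the user pointer to the header that precedes it. */
struct header *toy_header(void *p)
{
    return (struct header *)((char *)p - sizeof(struct header));
}

/* Release the whole block, starting at the header. */
void toy_free(void *p)
{
    if (p != NULL)
        free(toy_header(p));
}
```

    The caller only ever sees the pointer returned by `toy_alloc`; `toy_header` recovers the bookkeeping from it, the same way the `not_free` example above does.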

  • > [...] detect places where the program wanders into parts of the C language specification [...]

    Small nitpick: the UB sanitizer also has some checks specific to C++: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

  • And don't forget -fbounds-safety, which is in Apple's clang/llvm and perhaps other versions. https://clang.llvm.org/docs/BoundsSafety.html

  • That arithmetic shift right implementation is also what I came up with for a video game fantasy architecture that only has logical shift right. (16-bit registers)

        ; asr rd, rs1, rs2   ; rd = signed(rs1) >> rs2
    
        and rt, rs1, 0x8000  ; isolate sign bit
        lsr rt, rt, rs2      ; shift sign bit to final position
        neg rt, rt           ; sign-extended part of final result
        lsr rd, rs1, rs2     ; base part of final result
        or rd, rd, rt        ; combine both parts
    
    It might be easier to understand broken down this way for anyone who didn't understand the article's one-liner.
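
    The same five steps can be written in C with unsigned 16-bit values, where `>>` is always a logical shift. This is only a sketch of the sequence above, not the article's code; the name `asr16` is made up, and the shift count is assumed to be in the range 0..15.

```c
#include <stdint.h>

/* Arithmetic shift right of a 16-bit value using only logical shifts.
   Assumes 0 <= n <= 15. */
uint16_t asr16(uint16_t x, unsigned n)
{
    uint16_t sign = x & 0x8000u;            /* isolate sign bit */
    uint16_t ext  = (uint16_t)(sign >> n);  /* shift sign bit to final position */
    ext = (uint16_t)(0u - ext);             /* negate: sets that bit and every bit above it */
    uint16_t base = (uint16_t)(x >> n);     /* logical shift of the whole value */
    return (uint16_t)(base | ext);          /* combine both parts */
}
```

    For example, `asr16(0x8000, 1)` gives `0xC000` and `asr16(0x7FFF, 3)` gives `0x0FFF`. When the sign bit is clear, `ext` is zero and the result is just the logical shift.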