Cosine Implementation in C

  • I love these things! Approximation is art.

    I have a similar example in my book Geometry for Programmers (https://www.manning.com/books/geometry-for-programmers). There I make a polynomial approximation for sine that: 1) is odd (meaning antisymmetric); 2) is precise at 0 and Pi/2 (and -Pi/2 automatically); 3) has precise derivatives at 0 and Pi/2 (and -Pi/2); 4) has better overall precision than the equivalent Taylor series approximation; and it consists of only 4 terms! Adding more terms makes it impractical in the modern world, since we have hardware implementations of sine on CPUs and GPUs.

    Spoiler alert, the secret sauce is the integral equation ;-)
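
    Not the book's construction, but a minimal sketch of this kind of constrained fit: an odd 4-term polynomial with p'(0) = 1, p(pi/2) = 1, p'(pi/2) = 0, plus the integral constraint that p integrates to 1 over [0, pi/2], just like sine does. The coefficients below are ones I solved (to about 7 digits) from that linear system for illustration; they are not the book's:

      #include <stdio.h>
      #include <math.h>

      static double sin_fit(double x)   /* for x in [-pi/2, pi/2] */
      {
          /* illustrative coefficients solved from the four constraints above */
          const double a3 = -0.1666494, a5 = 0.0083034, a7 = -0.00018247;
          double z = x * x;
          return x * (1.0 + z * (a3 + z * (a5 + z * a7)));   /* odd polynomial, Horner form */
      }

      int main(void)
      {
          for (double x = 0.0; x <= 1.5707964; x += 0.19634954)   /* steps of pi/16 */
              printf("x=%.3f  p=%.8f  sin=%.8f  err=%+.1e\n",
                     x, sin_fit(x), sin(x), sin_fit(x) - sin(x));
          return 0;
      }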

  • Which is just a support function for cos; the support function doesn't do argument reduction itself. See [1] for the actual function, and [2] & [3] for the argument reduction code.

    [1] https://github.com/ifduyue/musl/blob/master/src/math/cos.c

    [2] https://github.com/ifduyue/musl/blob/master/src/math/__rem_p...

    [3] https://github.com/ifduyue/musl/blob/master/src/math/__rem_p...

  • This is a faster implementation of cosine that utilises six static const double variables (C1, C2, C3, C4, C5, C6). These represent the coefficients of a polynomial approximation of the cosine function.

    For the genuine article, check out: https://github.com/ifduyue/musl/blob/master/src/math/cos.c
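
    For a feel of how coefficients like C1..C6 get used, here is a structural sketch (not the musl code): after argument reduction to |x| <= pi/4, evaluate an even polynomial in z = x*x with Horner's scheme. The coefficients below are plain Taylor terms as stand-ins; the real thing uses minimax coefficients, which buy more accuracy at the same degree:

      #include <stdio.h>
      #include <math.h>

      /* cos(x) ~ 1 - z/2 + z^2*(C1 + z*(C2 + z*(C3 + z*C4))),  z = x*x */
      static const double
      C1 =  1.0 / 24,        /* x^4 term  */
      C2 = -1.0 / 720,       /* x^6 term  */
      C3 =  1.0 / 40320,     /* x^8 term  */
      C4 = -1.0 / 3628800;   /* x^10 term */

      static double cos_kernel(double x)   /* assumes |x| <= pi/4 */
      {
          double z = x * x;
          double r = C1 + z * (C2 + z * (C3 + z * C4));   /* Horner evaluation */
          return 1.0 - 0.5 * z + z * z * r;
      }

      int main(void)
      {
          printf("%.17f\n%.17f\n", cos_kernel(0.5), cos(0.5));
          return 0;
      }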

  • I remember an approximation function used in a vacuum-tube-based surface-to-air system of the 60s: cos(x) = 0.7. As explained to us rookies, that precision was enough to hit the target.

  • My first experience with "Use the source, Luke" was wondering as a young lad how computers calculated trig functions. This led me to the Unix V7 math lib source code[1], and thence to a book by Hart & Cheney which by luck my university library had on the shelf. Good times.

    [1] https://github.com/v7unix/v7unix/blob/master/v7/usr/src/libm...

  • Implementing low precision transcendental functions is much more interesting IMO, as it allows for a wider variety of algorithms and approaches. When you are limited to the IEEE definitions, there really isn't much room for creativity and you have to handle all the obscure edge cases and chase least significant bits, bleh. That's not something you want to do if you need to calculate a trillion atan's. If one is curious how these algorithms work, check out the book "Elementary Functions".

  • Does anyone know of a good reference on signals and systems for people with a pure math background rather than an engineering or physics background?

    So, the implementation linked above uses the polynomial of best approximation to cosine on an interval. That polynomial was presumably obtained using the Remez algorithm, which, given a function f and a set of n+2 points in your desired interval, yields the polynomial of degree n best approximating f on the interval (in the uniform norm sense)[1] by performing a sequence of linear solves. (See the sketch after this comment.)

    I have a math and specifically approximation theory background, and the wiki article on the Remez algorithm makes a lot of sense to me. Like, this is the language I speak: https://en.wikipedia.org/wiki/Remez_algorithm

    I'm less comfortable though when I try to understand the documentation for the remez implementation in SciPy because it is using the language of signals and systems, which is how engineers think about this stuff: https://docs.scipy.org/doc/scipy/reference/generated/scipy.s...

    Signals seems to use a lot of analogies to sound waves, and it does map, in some way or another, to the mathematical language I'm comfortable with. I've spent a little bit of time reading papers and introductory texts on signals, but the definitions feel a little loosey-goosey to me.

    Anyway, just opportunistically using this thread to ask if anyone knows of some book or papers or blog posts that can help me translate this stuff into math language :)
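
    Since the thread is about implementations anyway, here is a rough C sketch of the linear solve at the heart of the Remez description above: at n+2 reference points x_i, require p(x_i) + (-1)^i E = f(x_i) and solve for the coefficients and the error level E. A full Remez implementation would then move the points to the extrema of the error and repeat; this does only the first solve, starting from Chebyshev nodes:

      #include <stdio.h>
      #include <math.h>

      #define N 4          /* number of polynomial coefficients (degree N-1) */
      #define M (N + 1)    /* N coefficients plus the equioscillating error E */

      static void solve(double A[M][M], double b[M])   /* Gaussian elimination */
      {
          for (int k = 0; k < M; k++) {
              int p = k;                               /* partial pivoting */
              for (int i = k + 1; i < M; i++)
                  if (fabs(A[i][k]) > fabs(A[p][k])) p = i;
              for (int j = 0; j < M; j++) { double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t; }
              double t = b[k]; b[k] = b[p]; b[p] = t;
              for (int i = k + 1; i < M; i++) {
                  double m = A[i][k] / A[k][k];
                  for (int j = k; j < M; j++) A[i][j] -= m * A[k][j];
                  b[i] -= m * b[k];
              }
          }
          for (int i = M - 1; i >= 0; i--) {           /* back substitution */
              for (int j = i + 1; j < M; j++) b[i] -= A[i][j] * b[j];
              b[i] /= A[i][i];
          }
      }

      int main(void)
      {
          const double PI = 3.141592653589793;
          double lo = 0.0, hi = PI / 4.0;              /* approximate cos here */
          double A[M][M], b[M];

          for (int i = 0; i < M; i++) {
              /* Chebyshev nodes as a reasonable initial reference set */
              double x = 0.5 * (lo + hi) + 0.5 * (hi - lo) * cos(PI * i / (M - 1));
              double xp = 1.0;
              for (int j = 0; j < N; j++) { A[i][j] = xp; xp *= x; }
              A[i][N] = (i % 2 == 0) ? 1.0 : -1.0;     /* the +/-E column */
              b[i] = cos(x);
          }
          solve(A, b);
          for (int j = 0; j < N; j++) printf("a%d = % .12f\n", j, b[j]);
          printf("E  = % .3e\n", b[N]);
          return 0;
      }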

  • Some of the more recent implementations of cosine for single precision that are correctly rounded in all rounding modes according to the IEEE 754 standard:

    - The CORE-MATH project: https://gitlab.inria.fr/core-math/core-math/-/blob/master/sr...

    - The RLIBM project: https://github.com/rutgers-apl/The-RLIBM-Project/blob/main/l...

    - The LLVM libc project: https://github.com/llvm/llvm-project/blob/main/libc/src/math...

  • This takes me back. I did the transcendental functions for a CP/M C compiler in the early 80s using Chebyshev polynomials.

    My tutor failed my implementation because my code formatting was utterly crap. :) I am still laughing, as there was no sin or cos in the maths library at all at the time.

  • Personally I like Universal CORDIC; you can calculate much more than cosine with it, but it tends to be slower. See https://en.wikibooks.org/wiki/Digital_Circuits/CORDIC#The_Un... and a somewhat janky fixed point library that implements it: https://github.com/howerj/q
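
    For anyone who hasn't seen it, the whole trick fits in a few lines. A sketch of plain circular-rotation CORDIC in doubles (the q library linked above does this in fixed point, and Universal CORDIC generalizes the same loop to hyperbolic and linear modes):

      #include <stdio.h>
      #include <math.h>

      static void cordic(double theta, double *c, double *s)   /* |theta| <= ~1.74 */
      {
          /* start at (K, 0), where K is the inverse of the CORDIC gain:
           * the product of 1/sqrt(1 + 2^-2i) over all iterations */
          double x = 0.6072529350088813, y = 0.0, z = theta;
          double pow2 = 1.0;
          for (int i = 0; i < 32; i++) {
              double d = (z >= 0) ? 1.0 : -1.0;   /* rotate toward z == 0 */
              double nx = x - d * y * pow2;       /* shift-and-add rotation */
              double ny = y + d * x * pow2;
              z -= d * atan(pow2);                /* a small lookup table in real code */
              x = nx; y = ny;
              pow2 *= 0.5;
          }
          *c = x; *s = y;                         /* x -> cos(theta), y -> sin(theta) */
      }

      int main(void)
      {
          double c, s;
          cordic(1.0, &c, &s);
          printf("cos(1) ~ %.9f (libm: %.9f)\n", c, cos(1.0));
          printf("sin(1) ~ %.9f (libm: %.9f)\n", s, sin(1.0));
          return 0;
      }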

  • A very good explanation of the Remez algorithm: https://xn--2-umb.com/22/approximation/index.html

  • Personally I go with the recursive definition. Use the trig identity 2cos(x)^2 - 1 = cos(2x).

        2cos(x)^2 - 1 = cos(2x).
        2(2cos(x)^2 - 1)^2 - 1 = 2cos(2x)^2 - 1 = cos(4x)
        2(2(2cos(x)^2 - 1)^2 - 1)^2 - 1 = 2cos(4x)^2 - 1 = cos(8x)
    
    Keep squaring, doubling and subtracting one on the left. Keep raising by powers of 2 in the argument on the right. Eventually, if you're working with rational numbers, multiplication by 2^k moves all k fractional bits over to the integer side of the binary point, and then integer times pi is the base case of the recursion. If you rescale the units so that pi = 2^(n-1) for some n, you don't even have to think about the non-fractional part. The register value circling back to 0 is supposed to happen anyway.

    You could just as easily define cos recursively to "spiral downwards in x rather than outwards", each recursive layer reducing the argument x by a factor of 2 until it reduces down to the precomputed value cos(machine precision); see the sketch below.

    This was all just circular reasoning
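
    A sketch of that downward-spiraling version, under my reading of the idea: halve the argument until it is small, then climb back out with the double-angle identity. Each doubling step amplifies the accumulated rounding error, which is part of why production libms prefer polynomials, but it is a pleasingly tiny definition:

      #include <stdio.h>
      #include <math.h>

      static double cos_halving(double x)
      {
          if (fabs(x) < 1e-4)                /* base case: truncation error ~ x^4/24 */
              return 1.0 - 0.5 * x * x;
          double c = cos_halving(0.5 * x);   /* recurse on half the angle */
          return 2.0 * c * c - 1.0;          /* cos(2x) = 2cos(x)^2 - 1 */
      }

      int main(void)
      {
          printf("%.15f\n%.15f\n", cos_halving(1.2345), cos(1.2345));
          return 0;
      }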

  • Here is a naive question. A lot of effort goes into dealing with floating point imprecision and instabilities, right? So why aren't these algorithms implemented with integers? A 64 bit integer could represent the region from 0 to 1 with 2^64 evenly spaced slices, for example.

    Would that make things more accurate across operations? Is this something that is done?
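
    This is essentially fixed-point arithmetic. A sketch of the representation the question describes (purely illustrative, and sidestepping the accuracy question): treat a 64-bit unsigned integer as a fraction of a full turn, so that the inevitable wraparound on overflow performs argument reduction modulo 2*pi for free:

      #include <stdint.h>
      #include <stdio.h>

      /* "turns" format: the full uint64_t range [0, 2^64) maps to one
       * period [0, 2*pi), i.e. 2^64 evenly spaced slices */
      static uint64_t turns_from_radians(double r)
      {
          const double two_pi = 6.283185307179586;
          return (uint64_t)((r / two_pi) * 18446744073709551616.0);   /* * 2^64 */
      }

      int main(void)
      {
          uint64_t quarter = turns_from_radians(1.5707963267948966);  /* pi/2 */
          printf("pi/2 -> %llu\n2^62 -> %llu\n",
                 (unsigned long long)quarter, 1ULL << 62);
          return 0;
      }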

  • So why is that extra term needed here?

        w = 1.0-hz;
        return w + (((1.0-w)-hz) + (z*r-x*y));
        ------------------^ here

  • Why does he link to his lame fork of musl instead of the official FreeBSD sources for libm, taken from Sun and used by musl?

    https://github.com/freebsd/freebsd-src/blob/main/lib/msun/sr... or at least https://git.musl-libc.org/cgit/musl/tree/src/math/__cos.c

  • This is the kind of thing that triggers my imposter syndrome big time.

    I've been coding professionally for almost 30 years, and I see stuff like this and wonder if I'll ever be a real programmer.

  • Here's someone else's fast-sine code snippet I found and used about a decade ago when I was working on an additive synthesizer I never finished. It basically uses the same method, a Taylor series approximation.

    Driving it as an oscillator takes just a single integer addition per sample to increment the phase, relying on integer overflow to reset the phase. (A sketch of that pattern follows below.)

    I like the comment from someone asking if it could really be so fast since it has 7 whole multiplications in it. Ah, I love the realtime dsp world, so different from my python/JS day job.

    https://www.musicdsp.org/en/latest/Synthesis/261-fast-sin-ap...
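
    A sketch of that phase-accumulator pattern (my own illustration, not the linked snippet): the phase lives in a uint32_t, one add per sample, and wrapping past 2^32 is the free phase reset. The polynomial here is a deliberately crude parabola fit (~6% error) just to keep the sketch short; the linked code uses a higher-order polynomial for audio-grade accuracy:

      #include <stdint.h>
      #include <stdio.h>

      static float next_sample(uint32_t *phase, uint32_t inc)
      {
          uint32_t p = (*phase += inc);   /* wraps modulo 2^32 == one period */
          float x = (float)p * (float)(6.283185307179586 / 4294967296.0);
          if (x > 3.14159265f) x -= 6.28318531f;   /* recenter to [-pi, pi) */
          /* parabola approximation: sin(x) ~ (4/pi)x - (4/pi^2)x|x| */
          const float B = 1.27323954f, C = -0.40528473f;
          return B * x + C * x * (x < 0.0f ? -x : x);
      }

      int main(void)
      {
          /* 440 Hz at a 48 kHz sample rate: inc = 440/48000 * 2^32 */
          uint32_t phase = 0, inc = (uint32_t)(440.0 / 48000.0 * 4294967296.0);
          for (int i = 0; i < 4; i++)
              printf("%f\n", next_sample(&phase, inc));
          return 0;
      }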

  • If you care about this kind of stuff, the theory lies in numerical methods, for example,

    http://www.sci.brooklyn.cuny.edu/~mate/nml/numanal.pdf

  • I recently had to implement some Bessel functions numerically. It's fantastic to see that someone at Sun Microsystems implemented all these procedures in the early 90s (and I'm sure even long before that). You will find these implementations in R, Julia, musl, you name it!

    I especially like the use of "GET_HIGH_WORD" and friends:

    https://git.musl-libc.org/cgit/musl/tree/src/math/j0.c

    It's a lot of fun to figure out what they meant by things like:

    > if (ix >= 0x7ff00000) return 1/(x*x);
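
    For the curious, here is roughly what that boils down to (a sketch, not musl's exact macro): GET_HIGH_WORD grabs the upper 32 bits of the IEEE 754 double, which hold the sign bit, the 11 exponent bits, and the top of the mantissa. 0x7ff00000 is the smallest high word whose exponent bits are all ones (with the sign bit masked off first, as j0.c does), so the comparison is a cheap "is x Inf or NaN" test:

      #include <stdint.h>
      #include <string.h>
      #include <stdio.h>
      #include <math.h>

      static uint32_t get_high_word(double x)
      {
          uint64_t bits;
          memcpy(&bits, &x, sizeof bits);   /* type-pun without UB */
          return (uint32_t)(bits >> 32);
      }

      int main(void)
      {
          printf("%08x\n", (unsigned)get_high_word(1.0));       /* 3ff00000 */
          printf("%08x\n", (unsigned)get_high_word(INFINITY));  /* 7ff00000 */
          printf("%08x\n", (unsigned)get_high_word(nan("")));   /* 7ff80000, typically */
          return 0;
      }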

  • https://www.atwillys.de/content/cc/sine-lookup-for-embedded-...

    Have you checked this out?

    It's an alternative implementation which uses an interpolated table lookup and requires only 66 bytes of constant memory, with an error of less than 0.05% compared to the floating point sine.
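
    The core of the approach is only a few lines. A sketch (structure only; the linked implementation's table size and fixed-point details differ): sample a quarter wave into a small table, then linearly interpolate between neighboring entries:

      #include <stdio.h>
      #include <math.h>

      #define TBL_N 16
      static float tbl[TBL_N + 1];   /* sin on [0, pi/2], endpoints included */

      static void tbl_init(void)
      {
          for (int i = 0; i <= TBL_N; i++)
              tbl[i] = (float)sin(i * (1.5707963267948966 / TBL_N));
      }

      static float sin_lut(float x)   /* valid for x in [0, pi/2] */
      {
          float t = x * (float)(TBL_N / 1.5707963267948966);
          int i = (int)t;
          if (i > TBL_N - 1) i = TBL_N - 1;    /* keep i+1 in range at x == pi/2 */
          float frac = t - (float)i;           /* position between samples */
          return tbl[i] + frac * (tbl[i + 1] - tbl[i]);
      }

      int main(void)
      {
          tbl_init();
          printf("%f vs %f\n", sin_lut(0.5f), sin(0.5));
          return 0;
      }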

  • Can someone shed light on these two steps in the approximation?

      *         since cos(x+y) ~ cos(x) - sin(x)*y
      *                        ~ cos(x) - x*y,
    
    For the first, maybe it's an approximate identity I'd never heard of?

    For the second, it seems like it relies on x being small, but in this code... y might be small enough for the sin == identity approximation, but x isn't, is it?

  • It is possible to pass a polynomial of degree n-1 through n points, and since the sine/cosine curve is so simple, you don't need that many samples to get a good approximation.

    My question is: would it not be faster to just store a lookup table and interpolate? I recall hearing that's how it's done.

  • It's been a while since I've dealt with this; is this something similar to the CORDIC algorithm? A couple of years ago I implemented it in VHDL, a very fun exercise.

    https://en.wikipedia.org/wiki/CORDIC

  • https://github.com/RobTillaart/FastTrig/blob/master/FastTrig...

    If absolute precision is not required, there are many ways to compute cosine!

  • Programming accurate floating point calculations is where you need a maths college/uni degree. 99% of the "other programming" (including "AI") does not require skills higher than those from high school.

  • Memory model experts: if you define the constants in the way this code has, will they be contiguous in memory? Or would it be better to allocate these in an aligned(64) array?

  • What is y, "the tail of x"?

  • Fascinating. But why are cos and sin implemented with different polynomials?

  • Cosine and sine are easy thanks to the Taylor series. They are both one-liners in Haskell. Tangent, on the other hand....

  • sincos is better. Two for one deal. :)

  • double_t hz,z,r,w;

    Wtf?

    *clicks request changes*

  • Same vibes:

      float Q_rsqrt( float number )
      {
          long i;
          float x2, y;
          const float threehalfs = 1.5F;

          x2 = number * 0.5F;
          y  = number;
          i  = * ( long * ) &y;                       // evil floating point bit level hacking
          i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
          y  = * ( float * ) &i;
          y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
          // y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
          return y;
      }
    
    https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...