January 21st, 2023

Angle trisection seems to rear its ugly head from time to time, so here we'll look at a set of approximations of:

\[\begin{align*}
f(x) = \text{cos}\left(\frac{\text{acos}\left(x\right)}{3}\right)
\end{align*}\]

but instead of performing a straight polynomial approximation:

\[\begin{align*}
p(x) = c_0 + c_1 x + c_2 x^2 + \cdots
\end{align*}\]

we'll approximate by a weighted sum of functions:

\[\begin{align*}
a(x) = c_0 + c_1 f_1(x) + c_2 f_2(x) + \cdots
\end{align*}\]

using the tool Sollya to generate the constants $c_n$. The function $f$ is really being used as a fun strawman example for generating a weighted-sum-of-functions approximation. Source code related to this post can be found here.

Inigo Quilez's approximations

IQ wrote a blog post on approximating $f$. The first part is computing:

\[\begin{align*}
h = \sqrt{\frac{1+x}{2}}
\end{align*}\]

which is simply the cosine half-angle identity, and his first approximation g1 simply shifts and scales $h$. The remaining approximations find higher-degree polynomial approximations in terms of $h$:

// cos half-angle
float h(float x)  { return sqrtf(0.5f+0.5f*x); }

float g1(float x) { x = h(x); return x*0.5f+0.5f; }
float g2(float x) { x = h(x); return x*(-0.064913f*x+0.564913f)+0.5f; }
float g3(float x) { x = h(x); return x*(x*(x*0.021338f-0.096562f)+0.575223f)+0.5f; }
float g4(float x) { x = h(x); return x*(x*(x*(x*-0.008978f+0.039075f)-0.107071f)+0.576974f)+0.5f; }

So we have a single expression approximating the entire range. As a motivational example: if we attempted the same thing with a 9th degree polynomial (p9), well, things wouldn't work out so well.

A weighted sum of functions approximation

Let's start by noticing that if we take the degree 1 polynomial we're generating, expand $h$ and simplify:

\[\begin{align*}
p(x) & = c_0 + c_1~h(x) \\
     & = c_0 + c_1~\sqrt{\frac{1+x}{2}} \\
     & = c_0 + \frac{c_1}{\sqrt{2}}~\sqrt{1+x} \\
     & = c_0 + c_1~\sqrt{1+x}
\end{align*}\]

where the final $c_1$ is a different constant than in the preceding lines. So instead of computing:

h = sqrtf(0.5f*x + 0.5f);

we could instead compute:

b = sqrtf(1.f+x);

and find a polynomial in $b$. This yields a new set of constants $c_n$ and transforms that first mul/add (or FMA) into an addition.

Going further, we'll use the powers $b^n$ as the set of functions from which to build approximations. Looking at the first six expansions:

\[\begin{align*}
b   & = \sqrt{1+x} \\
b^2 & = 1+x \\
b^3 & = \sqrt{1+x} + x~\sqrt{1+x} \\
b^4 & = 1+2x+x^2 \\
b^5 & = \sqrt{1+x} + 2x~\sqrt{1+x} + x^2~\sqrt{1+x} \\
b^6 & = 1+3x+3x^2+x^3 \\
    & \cdots
\end{align*}\]

We can reduce these to a simpler set of functions with two rules:

1. Constant multiples can be dropped (they fold into the produced constant, as above).
2. Terms present in earlier functions are dropped, since they're already accounted for.

This gives us:

\[\begin{align*}
b_0 & = 1 \\
b_1 & = \sqrt{1+x} \\
b_2 & = x \\
b_3 & = x~\sqrt{1+x} \\
b_4 & = x^2 \\
b_5 & = x^2~\sqrt{1+x} \\
b_6 & = x^3 \\
    & \cdots
\end{align*}\]

To clarify rule 2:

- $b^2$ has a constant term, and $b_0$ already carries the weight of a constant term, so it reduces to $x$.
- $b^3$ has the term $\sqrt{1+x}$, which is $b_1$, so $b_3$ reduces to $x~\sqrt{1+x}$.

Running this set of functions through the optimization process finds the constant $c_n$ (or weight) of each function $b_n$. The even $b_n$ reduce to a polynomial in $x$ and the odd to a polynomial in $x$ times $\sqrt{1+x}$, so we can break the evaluation into two polynomial evaluations merged by an FMA at the end. A degree 3 approximation then looks like:

\[\begin{align*}
e(x) & = c_0 + c_2~x \\
o(x) & = c_1 + c_3~x \\
a(x) & = \sqrt{1+x}~o(x) + e(x)
\end{align*}\]

This breaks the dependency on the square root computation, allowing it to happen in parallel with $e(x)$ and $o(x)$.
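To make the structure concrete, here's a sketch of the evaluation shape for the larger basis used in the Sollya script below ($\{1,x,x^2\}$ plus $\sqrt{1+x}\cdot\{1,x,x^2,x^3\}$). The function name and parameter layout are mine for illustration; the actual constants would come from fpminimax:

#include <math.h>  // sqrtf, fmaf

// sketch: E[] holds the weights of {1, x, x^2} and O[] the weights of
// sqrt(1+x)*{1, x, x^2, x^3}; both halves are short Horner chains and
// the square root can issue while e(x) and o(x) are being computed.
static float a6_shape(float x, const float E[3], const float O[4])
{
  float t = sqrtf(1.f+x);
  float e = fmaf(x, fmaf(x, E[2], E[1]), E[0]);
  float o = fmaf(x, fmaf(x, fmaf(x, O[3], O[2]), O[1]), O[0]);
  return fmaf(t, o, e);   // sqrt(1+x)*o(x) + e(x)
}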
Back to $f$

This is going to be brief since it's really a "toy" example. Here's a minimal-ish Sollya script to approximate $f$ (actual script) using the previous ideas:

T  = floating;        // floating-point (not fixed-point)
E  = absolute;        // minimize abs error (codomain is on a POT interval)
B  = [|24...|];       // generate single precision constants
F  = cos(acos(x)/3);  // the function we're approximating
c0 = -1+2^-128;       // tweaked end-point
R  = [c0,1];          // the range of the approximation
hc = sqrt(1+x);       // reduced half-angle like thingy

// the set of functions we're finding weights for (a6)
basis = [|1,x,x^2, hc,hc*x,hc*x^2,hc*x^3|];

// create the approximation
P = fpminimax(F,basis,B,R,T,E);

// get a guess at the error
e = dirtyinfnorm(P-F, R);

// dump out the info
print(" p(x) = ", P);
display = hexadecimal!;
print(" ", P);
display = decimal!;
write(" approx abs error: ", e);
display = hexadecimal!;
print("(", e, ")");

Taking a shorter basis [|1,x,hc,hc*x|] gives:

float a3(float x)
{
  static const float A[] = { 0x1.ec4dc6p-8f, 0x1.a7e32p-2f};
  static const float B[] = {-0x1.8961dp-5f, 0x1.cee61ap-2f};

  float t = sqrtf(1.f+x);
  float a = fmaf(x, A[0], A[1]);
  float b = fmaf(x, B[0], B[1]);

  return fmaf(t,a,b);
}

Approximating $f$ across the entire range is rather risky, but adding range reduction would make things branchy. If we can keep the same expression with a different set of constants for (say) two sub-ranges, then we can select the set based on a cut point of the input. This point can be found by a binary search: iteratively generate the two approximations and search for the value where their errors are approximately equal.

float c3(float x)
{
  // constants as rationals for mean languages w/o hex floats.
  // easy in sollya using 'display=powers!' instead of hexadecimal.
  // found by a hacky binary search (see sollya file)
  const float cut = -2674669.f/4194304.f;

  const float A2[2][2] = {
    { 11019427.f/2147483648.f,  6809093.f/16777216.f},
    { 12876879.f/1073741824.f,  3523121.f/ 8388608.f}};

  const float B2[2][2] = {
    {-11111149.f/ 268435456.f,  7720477.f/16777216.f},
    {-14304849.f/ 268435456.f, 14989259.f/33554432.f}};

  float t = sqrtf(1.f+x);

  const int    i = (x >= cut) ? 0 : 1;
  const float* A = A2[i];
  const float* B = B2[i];

  float a = fmaf(x, A[0], A[1]);
  float b = fmaf(x, B[0], B[1]);

  return fmaf(t,a,b);
}

A rough visual of the abs error (reference and approximations are computed in JavaScript in double precision) of the example set of approximations in the "toy code":

[interactive abs-error plot; on the original page, click a function name in the legend to toggle it]

and some numbers on each:

function   µOps      ULP    ~abs error
naive         —        2    1.192093e-07
cr3           —        2    1.192093e-07
g2            4   270973    1.615125e-02
g3            8    17184    1.024187e-03
g4           10     1629    9.709597e-05
g5           12      185    1.102686e-05
a3           10     1174    6.997585e-05
a4           12      257    1.531839e-05
a5           14       19    1.132488e-06
a6           16        5    2.980232e-07
c3           23       76    4.529953e-06
c4           17       11    6.556511e-07
c5           20        2    1.192093e-07
fr           24        1    5.960464e-08

naive = cosf(acosf(x)/3.f) using compiler libm functions
cr3   = like naive but using correctly rounded versions of cosf and acosf
gn    = IQ's approximations with FMA
fr    = the code of c5 with the input promoted to double on input and demoted at the end (2 µops each); a lazy hack out of curiosity
µOps  = number of micro-ops produced by GCC 12.2

The hiccups in the cn functions are due to the unfortunate array ordering I chose. Too lazy to correct.
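Out of curiosity, numbers in the spirit of the abs-error column can be estimated with a brute-force scan like the sketch below (my own quick harness, not from the post's source; it samples a uniform grid rather than every float and uses the double-precision libm as the reference):

#include <math.h>   // cos, acos, fabs, sqrtf, fmaf (link with -lm)
#include <stdio.h>

// a3 as defined above
static float a3(float x)
{
  static const float A[] = { 0x1.ec4dc6p-8f, 0x1.a7e32p-2f};
  static const float B[] = {-0x1.8961dp-5f, 0x1.cee61ap-2f};
  float t = sqrtf(1.f+x);
  float a = fmaf(x, A[0], A[1]);
  float b = fmaf(x, B[0], B[1]);
  return fmaf(t,a,b);
}

int main(void)
{
  const int n = 1<<24;            // uniform grid on [-1,1]: coarse but telling
  double max_err = 0.0;

  for (int i = 0; i <= n; i++) {
    float  x = -1.f + 2.f*((float)i/(float)n);
    double r = cos(acos((double)x)/3.0);      // double-precision reference
    double e = fabs((double)a3(x) - r);
    if (e > max_err) max_err = e;
  }

  printf("a3: max abs error ~ %e\n", max_err);
  return 0;
}

Walking every representable float in the range (or measuring ULP against a correctly rounded reference) would tighten the numbers.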
Summary

We've taken a very quick peek at a couple of out-of-the-norm approximation techniques:

- generating approximations (in Sollya) for a set of functions instead of a polynomial
- selecting between two sets of constants (for the same expression) as an alternative to adding terms to meet an error target

An advantage in our example case is the ability to break up the dependency chains in the computation.