June 15th, 2023 In this post I’m going to sketch out some stable methods of computing and storing the parameters of the so-called look-at rotation. Everything here is more complicated than standard methods, so for it to be of any interest one or more of the following would need to be desirable: Directly generate a quaternion, yaw-pitch-roll or swing-twist decomposition Filtering of the result (say temporal smoothing) Dealing with regions of instability Branch-free friendly implementations This list makes it a a pretty narrow thing to want to do and this is partially intended to show some ways to attack moving an ill conditioned problem to a backward stable one without using any fancy floating-point tricks and attempting to avoid a mess of random branching to work around. Backward stable can be roughly defined as “the solution is exact for a nearby input” but here since we have a multi-component input (a unit vector) and multi-component output(s) so it’s more: “in the very small neighboorhood of the output is an exact solution for a nearby input”. First let’s define the convention that will be used: A right-handed coordinate system with $\mathbf{x}$ is to the right $\mathbf{y}$ is forward and $\mathbf{z}$ is up. There are various standard constructions. We might start with a pair of positions for the “entity” in question. Call $\left(p_0\right)$ the entity’s position and $\left(p_1\right)$ a target position for it to face toward. Then the facing unit vector $\left(\mathbf{v}\right)$ is: \[\mathbf{v} = \| p_1-p_0 \|\] where $\| \cdot \|$ denotes normaling the contained expression. Or the unit or arbitrary magnitude vector might be directly provided. Most variants will also provide another unit vector parameter $\left(\mathbf{w}\right)$ which specifies the direction the entity considers to be up. Given the convention and inputs we can compute the local right unit vector $\left(\mathbf{r} = \mathbf{x}’\right)$ by: \[\mathbf{r} = \| \mathbf{v} \times \mathbf{w} \|\] from which we can compute the local up unit vector $\left(\mathbf{u} = \mathbf{z}’\right)$: \[\mathbf{u} = \mathbf{r} \times \mathbf{v}\] Reduced standard method Let’s look at a standard construction with a reduced problem statement. First we’ll limit the input forward vector $\left(\mathbf{v}\right)$ to unit magnitude: \[\mathbf{v} = \left( x,~y,~z \right)\] Since we can just normalize prior to calling our routine this is minor. The next limitation is bit bigger. We’re going to fix the parent up to $+\mathbf{z}$. This drastically simplifies the math and the end goal here is to produce a rotation (or decomposed rotation). Expanding with these limitations gives: \[\begin{align} \mathbf{r} & = \mathbf{x}' & = \| \mathbf{v} \times \mathbf{z} \| & = \left\{\frac{y}{\sqrt{x^2+y^2}},~\frac{-x}{\sqrt{x^2+y^2}},~0\right\} \\ \mathbf{u} & = \mathbf{z}' & = \mathbf{x}' \times \mathbf{y}' & = \left\{\frac{-x z}{\sqrt{x^2+y^2}}, ~\frac{-y z}{\sqrt{x^2+y^2}}, ~\frac{x^2+y^2}{\sqrt{x^2+y^2}}\right\} \end{align}\] Let’s note the key properties: we’re producing an orthonormal basis so: $\mathbf{r} \cdot \mathbf{v} = \mathbf{r} \cdot \mathbf{u} = \mathbf{v} \cdot \mathbf{u} = 0$ $\mathbf{r} \cdot \mathbf{r} = \mathbf{v} \cdot \mathbf{v} = \mathbf{u} \cdot \mathbf{u} = 1$ the $z$ component of $\mathbf{r}$ is zero so it’s bound to the $\mathbf{xy}$-plane and can be expressed as $\mathbf{x}$ rotated about $\mathbf{z}$ The above expressions are written to reflect how the computation is going to be performed. Now a tempting change would be to replace $x^2+y^2$ with $1-z^2$ since $\mathbf{v}$ is unit but Danger, Will Robinson! we know that our input is only approximately unit and (even if carefully computed) each component will (generally) have small errors. Let’s simplify and assume that: \[\mathbf{v} \cdot \mathbf{v} = 1\] exactly and that $y=0$ and $z=1$ then: \[\mathbf{v} \cdot \mathbf{v} = x^2 + y^2 + z^2 = x^2 + 1 = 1\] This implies that $x^2$ is too small to contribute to an addition with one so: \[x^2 \le u\] where $u$ is the size of the rounding unit $ \left( \frac{1}{2}\text{ulp}(1) \right) $ which is $2^{-24}$ in single precision so: \[|x| \le 2^{-12} \approx 0.000244\] which is a very different result that zero. Notice that this isn’t a single precision problem and we’re in the same boat in any precision. As an example since doubles have a wider dynamic range (can represent a wider range of exponents) the issue is more pronounced. Below is a floating point number line on $\left[0,~1\right]$ where the “units” are ULP (so each “step” is the next floating point number we can represent). The top (blue) range shows the range of values that $\sqrt{x^2+y^2}$ might be when $|z|$ is approaching one and the bottom (purple) for doubles. All of this might seems cursed if it’s unfamiliar but it allows us to represent unit vectors very nearly pointing in cardinal directions and performing the “correct” computation is only very important if the result is highly sensitive to the sub-result. Aside: a previous blog post on quaternion/matrix conversion shows that making these kinds of transforms can drastically increase relative error even in well conditioned regions. Ahem. Anyway. Another issue is when $x$ and $y$ are both zero we have an infinite number of solution (and we’re ill conditioned on the approach) so we need to add decision making for that case. Let’s assume that we start in the default orientation (forward = $\mathbf{y}$ and up = $\mathbf{z}$) and are rotating toward the input giving us a $\pm \frac{\pi}{2}$ rotation about $\mathbf{x}$ where the sign that of the input $z$: \[\begin{align*} \mathbf{r} & = \left\{1,\hphantom{-}0,~0\right\} \\ \mathbf{u} & = \left\{0,-z,~0\right\} \end{align*}\] For the approach let’s consider a biased version of the above expressions (with $n = x^2+y^2 $): \[\begin{align*} \mathbf{r} & = \left\{\frac{y+b}{\sqrt{n+b}},~\frac{-x}{\sqrt{n+b}},~0\right\} \\ \mathbf{u} & = \left\{\frac{-x z}{\sqrt{n+b}}, ~\frac{-\left(y+b\right) z}{\sqrt{n+b}}, ~\frac{n}{\sqrt{n+b}}\right\} \end{align*}\] If $b$ is zero then they reduce to the previous set and if $b=1$: $\left(n+b\right) = 1$ if $ n \le u $ $\left(y+b\right) = 1$ if $ y \le u $ so if we conditionally set $b$ to zero or one based on some threshold value of $n$ we create deadzones in the tiny spherical caps around $+\mathbf{z}$ and $-\mathbf{z}$. The threshold should be no smaller than the square root of the smallest normal number to avoid hard to compute cases and no larger than $\sqrt{u}$ so than the magnitude of $z$ should be one and both biased terms flush to one. Finally some noteworthy implementation details. We don’t need to carefully compute $n$ as it won’t help much alone and we’ll be pretty tight keeping everything simple. The example code below does use fma but this is just for performance reasons (under the assumption that floating point contractions might be disable). Also we don’t want to perform five divisions (or even corrected multiply by reciprocal) so I’ll use the scale factor s = sqrt(1/(n+b) which has the lowest average error of the simple choices. typedef struct { float x,y,z; } vec3_t; typedef struct { vec3_t f,r,u; } basis_t; // Given unit vector 'v' compute local forward (f), right (r) // and up (u) unit vectors (an orthonormal basis set) with // the convention of up=Z, right=X & forward=Y. A look-at // with the input up fixed to Z. void lookat_std(basis_t* basis, vec3_t v) { // choice of the "small" cap around |z| = 1 // to treat as a deadzone to handle exactly n=0 // case as well as hard to compute cases. This // is about as small as it can get without // needing to move to careful computations. // |z| should be 1 all the way up to // sqrt(ulp(1)/2) = 0x1.0p-12 in binary32. // So practical choices can be somewhat larger // than this. However it should be no larger // than ulp(1)/2 = 0x1.0p-24 in binary32 to // keep both (y+b) and (n+b) resulting in 1 // when b=1. static const float cap = 0x1.0p-63f; // sqrt(min_normal) vec3_t u,r; // n doesn't need to be carefully computed and could // be performed as x*x + y*y. the 'fma' is for performance // and assumption that fp contraction could be disabled. float n = fmaf(v.x,v.x, v.y*v.y); float b = n > cap ? 0.f : 1.f; // proceed as normal (plus add the bias) float y = v.y + b; float s = sqrtf(1.f/(n+b)); float sx = -v.x*s; float sy = y*s; // std value : deadzone value r.x = sy; // y/sqrt(n) : 1 r.y = sx; // -x/sqrt(n) : -x {~0} r.z = 0.f; // 0 : 0 u.x = sx*v.z; // -xz/sqrt(n) : -x {~0} u.y = -sy*v.z; // -yz/sqrt(n) : -1 u.z = s*n; // n/sqrt(n) : n {~0} basis->f = v; // copy just to provide a std structure basis->u = u; basis->r = r; } Euler angles Yaw-Pitch (but no Roll) We can parameterize a unit vector using a spherical coordinate system where the radius is fixed to one and using a mathematical style convention of $\theta$ being the angle from $\mathbf{y}$ in the ${\mathbf{xy}}$ plane and $\phi$ the angle from ${\mathbf{xy}}$. \[\begin{align*} \theta & = \text{asin}\left(z\right) = \text{atan}\left(z,~\sqrt{x^2+y^2}\right) \\ \phi & = \text{atan}\left(-x,~y\right) \end{align*}\] where the arctangent is the two parameter form (atan2). We want to compute $\theta$ using atan2 to avoid NaNs if $|z|$ has creeped above one during computations and to get a more accurate measurement (same reasoning as above for computing $n$ as $x^2+y^2$). If we treat these angles as the yaw $\left(\phi\right)$ and pitch as $\left(\theta\right)$ of the Euler angle (really should be called Tait-Bryan) parameterization commonly used in aviation then the resulting rotation produces our desired local right and up vectors. So the pitch is a rotation about $\mathbf{x}$ which as a matrix is: \[P = \left( \begin{array}{rrr} 1 & 0 & 0 \\ 0 & \cos (\theta ) & -\sin (\theta ) \\ 0 & \sin (\theta ) & \cos (\theta ) \\ \end{array} \right)\] the yaw is a rotation about $\mathbf{z}$: \[Y = \left( \begin{array}{rrr} \cos (\phi ) & -\sin (\phi ) & 0 \\ \sin (\phi ) & \cos (\phi ) & 0 \\ 0 & 0 & 1 \\ \end{array} \right)\] composing the two rotations gives: \[YP = \left( \begin{array}{rrr} \cos (\phi ) & -\cos (\theta ) \sin (\phi ) & \sin (\theta ) \sin (\phi ) \\ \sin (\phi ) & \cos (\theta ) \cos (\phi ) & -\sin (\theta ) \cos (\phi ) \\ 0 & \sin (\theta ) & \cos (\theta ) \\ \end{array} \right)\] and yanking out the columns give the local basis: \[\begin{array}{rrr} \mathbf{r} = \mathbf{x}' = & \left(\hphantom{-\sin\left( \theta \right)} \cos\left(\phi\right),~\hphantom{-\cos\left( \theta \right)}\sin\left(\phi\right), ~\hphantom{\cos\left(\right)}0 \right) \\ \mathbf{f} = \mathbf{y}' = & \left( -\cos\left( \theta \right) \sin \left( \phi \right),~\hphantom{-}\cos\left( \theta \right) \cos \left( \phi \right),~ \sin\left( \theta \right) \right) \\ \mathbf{u} = \mathbf{z}' = & \left(\hphantom{-}\sin\left(\theta\right)\sin\left(\phi\right),~-\sin\left(\theta\right)\cos\left(\phi\right),~ \cos\left(\theta\right) \right) \end{array}\] slapping this into toy code gives: typedef struct { float yaw, pitch; } lookat_yp_t; lookat_yp_t lookat_yp(vec3_t v) { lookat_yp_t r; r.pitch = atan2f( v.z, sqrtf(fmaf(v.x,v.x,v.y*v.y))); r.yaw = atan2f(0.f-v.x, v.y+0.f); return r; } basis_t yp_to_basis(lookat_yp_t yp) { float cp = cosf(yp.pitch), sp = sinf(yp.pitch); float cy = cosf(yp.yaw), sy = sinf(yp.yaw); // 'f' reconstruction can be skipped if we have original 'v' and we // keep the restriction that 'v' must be (approximately) unit. vec3_t r = (vec3_t){.x= cy, .y= sy, .z=0.f }; vec3_t f = (vec3_t){.x=-sy*cp, .y= cy*cp, .z=sp }; vec3_t u = (vec3_t){.x= sy*sp, .y=-cy*sp, .z=cp }; return (basis_t){.r=r, .f=f, .u=u }; } I haven’t bothered with introducing the small cap region since atan2 will manange to yield reasonable results. But notice the funky inclusion of zeroes in both inputs of the yaw atan2. This is to normalize the result to zero when both $x$ and $y$ are zeroes where otherwise it can any of the following four depending on how the signs shake out: $\pm 0$ or $\pm \pi$. Note that this isn’t branch-free friendly because of the arctangent and the unit vector input requirement isn’t really needed here and if that requirement is dropped then the computation of f in yp_to_basis is cheaper than performing it elsewhere. We’ve already performed the bulk of the work. Oh! And as an example of how easy it is to get $|z| = 1$ and $|n| \ne 0$ let’s peek at reconstructing $r$ with $\phi=\pi$: phi = RN(pi) // 0x1.921fb6p1 r = {RN(cos(phi)), RN(sin(phi)), 0} // {-1, -0x1.777a5cp-24, 0} where RN is the default floating-point rounding (round to nearest, ties to even). Two complex decomposition not sorry for the bad pun We can take the previous and carry through the inverse trig followed by forward trig operations: \[\begin{align*} \cos(\theta) & = \sqrt{x^2+y^2} = \frac{x^2+y^2}{\sqrt{x^2+y^2}} \\ \sin(\theta) & = z \\ \\ \cos(\phi) & = \frac{y}{\sqrt{x^2+y^2}} \\ \sin(\phi) & = \frac{-x}{\sqrt{x^2+y^2}} \end{align*}\] Of course we have all the terms we needed in the vector construction and get identical expressions if we perform the reconstruction. Vector expressions (without bias) repeated for convenience: \[\begin{align*} \mathbf{r} & = \left\{\frac{y}{\sqrt{x^2+y^2}},~\frac{-x}{\sqrt{x^2+y^2}},~0\right\} \\ \mathbf{u} & = \left\{\frac{-x z}{\sqrt{x^2+y^2}}, ~\frac{-y z}{\sqrt{x^2+y^2}}, ~\frac{x^2+y^2}{\sqrt{x^2+y^2}}\right\} \end{align*}\] Other than showing the equivalence of the Yaw-Pitch and vector constructions it seems interesting to show the “same thing” in a different way in case anybody has an “ah ha!” moment. Plus it’s a natural stepping stone to the next section. The section name comes from we can think of this as a pair of independent unit complex numbers. One for the $\mathbf{xy}$ plane for yaw and the other in the $\mathbf{yz}$ plane for pitch. But anyway coded up explictly this way gives: typedef struct { float cp,sp; // cos(pitch), sin(pitch) float cy,sy; // cos(yaw), sin(yaw) } lookat_tc_t; lookat_tc_t lookat_tc(vec3_t v) { lookat_tc_t r; // SEE: lookat_std static const float cap = 0x1.0p-63f; float n = fmaf(v.x,v.x, v.y*v.y); float b = n > cap ? 0.f : 1.f; float s = sqrtf(1.f/(n+b)); float y = v.y + b; r.cp = n * s; r.sp = v.z; r.cy = y * s; r.sy = -v.x * s; return r; } basis_t tc_to_basis(lookat_tc_t tc) { // 'f' reconstruction can be skipped if we have original 'v' vec3_t r = (vec3_t){.x= tc.cy, .y= tc.sy, .z=0.f }; vec3_t f = (vec3_t){.x=-tc.sy*tc.cp, .y= tc.cy*tc.cp, .z=tc.sp }; vec3_t u = (vec3_t){.x= tc.sy*tc.sp, .y=-tc.cy*tc.sp, .z=tc.cp }; return (basis_t){.r=r, .f=f, .u=u }; } Swing-twist decomposition The short version of a swing-twist decomposition decomposition (click through for more) is that we factor a rotation into a pair of quaternions. The first is a rotation about a fixed axis (here $\mathbf{z}$) and this component is termed the twist $\left(T\right)$. The second rotation, the swing $\left(S\right)$, is in the orthogonal complement (so some rotation axis in the $\mathbf{xy}$ plane). Here we only have two degrees of freedom and the twist is limited to about $\mathbf{x}$. So the Euler and swing-twist decompositions directly map to one other in our problem. In general swing contains both yaw and roll information. Directly plugging yaw-pitch info gives: \[\begin{align*} S^2 & = \cos\left(\theta\right) + \left(\sin\left(\theta\right),~0,~0\right) \\ T^2 & = \cos\left(\phi\right) + \left(0,~0,~\sin\left(\phi\right)\right) \end{align*}\] Recall that in quaternions we need half the angle of the desired rotation so the terms are squared. So what we want is this: \[\begin{align*} S & = \cos\left(\frac{\theta}{2}\right) + \left(\sin\left(\frac{\theta}{2}\right),~0,~0\right) \\ T & = \cos\left(\frac{\phi}{2}\right) + \left(0,~0,~\sin\left(\frac{\phi}{2}\right)\right) \end{align*}\] Replacing with the direct terms from the previous section: \[\begin{align*} S^2 & = \frac{x^2+y^2}{\sqrt{x^2+y^2}} + \left(z,~0,~0 \right) \\ T^2 & = \frac{y}{\sqrt{x^2+y^2}} + \left(0,~0,~\frac{-x}{\sqrt{x^2+y^2}} \right) \end{align*}\] with some substitutions: \[\begin{align*} n & = x^2+y^2 \\ c & = \sqrt{n} \\ \\ S^2 & = \frac{n}{c} + \left(z,~0,~0 \right) = c + \left(z,~0,~0 \right) \\ T^2 & = \frac{y}{c} + \left(0,~0,~\frac{-x}{c}\right) \\ \end{align*}\] and we need to apply the half-angle indentities. Starting with swing: \[\begin{align} S & = \sqrt{\frac{1+c}{2}} + \left(\text{sgn}\left(z\right)\sqrt{\frac{1-c}{2}},~0,~0\right) \label{s0} \\ & = \frac{1+c}{\sqrt{2\left(1+c\right)}} + \left(\frac{z}{\sqrt{2\left(1+c\right)}},~0,~0\right) \label{s1} \end{align}\] Equation $\eqref{s0}$ is cheaper at two square roots where $\eqref{s1}$ is one and a divide. Here we’ll use the former. Neither have stability problems and in the limit approach: \[\begin{align*} S & = \frac{1}{\sqrt{2}} + \left(\frac{\text{sgn}\left(z\right)}{\sqrt{2}},~0,~0\right) \end{align*}\] The twist expands as: \[\begin{align} T & = \frac{c+y}{\sqrt{2c\left(c+y\right)}} + \left(0,~0,~\frac{-x}{\sqrt{2c\left(c+y\right)}}\right) \label{t0} \\ & = \frac{c+y}{\sqrt{2\left(n+cy\right)}} + \left(0,~0,~\frac{-x}{\sqrt{2\left(n+cy\right)}}\right) \label{t1} \end{align}\] where $\eqref{t1}$ is the generally more interesting form since we can use an FMA to compute $\left(n+cy\right)$ for both a performance and accuracy win. Note we now have two ill conditioned regions instead of one. Like before when both $x$ and $y$ are approaching zero and additionally when $y$ is approaching $-1$. The $\left|y\right|=1$ cases are: \[\require{mathtools} \begin{align*} T = \begin{cases} 1 + \left(0,~0,~0\right) & y = \hphantom{-}1 \\[2ex] 0 + \left(0,~0,~1\right) & y = -1 \end{cases} \end{align*}\] as a foreshadowing of how I’ll attack negative $y$ which is by argument reduction. The twist is a function of a point $p=\left(x,y\right)$ in the unit disc and we can map that point into the first quadrant $p’ = \left(\left|x\right|,~\left|y\right|\right)$, transform $p’$ and reverse the reduction. A visualization of the quadrants: and after halving the angle with respect to $\mathbf{y}$: If the input $x$ was negative then we reflected about $\mathbf{y}$ and if $y$ was negative then reflected about $\mathbf{x}$. To restore a negative $y$ input we need to reflect about the diagonal (angle has been halved): \[\begin{align*} \text{cos}\left(\frac{\pi}{2}-\alpha\right) & = \text{sin}\left(\alpha\right) \\ \text{sin}\left(\frac{\pi}{2}-\alpha\right) & = \text{cos}\left(\alpha\right) \end{align*}\] So the first step can be to compute pair values $t_w$ and $t_b$ and swap if $y$ is negative: \[\begin{align} \left(t_w,~t_b\right) = \begin{dcases} \left(c+\left|y\right|,~\left|x\right|\right) & y \ge 0 \\[2ex] \left(\left|x\right|, ~c+\left|y\right|\right) & y \lt 0 \end{dcases} \label{ts} \end{align}\] If the input $x$ was negative we simply need to reintroduce the sign and since needed $-x$ in $\eqref{t1}$ we get (this is an assignment and using an arrow looked ugly to me): \[\begin{align} t_b = -\text{sgn}(x)~t_b = \text{sgn}(-x)~t_b \label{ts2} \end{align}\] After performing the steps $\eqref{ts}$ and $\eqref{ts2}$ the twist becomes: \[\begin{align} T = \frac{t_w}{\sqrt{2\left(n+c\left|y\right|\right)}} + \left(0,~0,~\frac{t_b}{\sqrt{2\left(n+c\left|y\right|\right)}}\right) \label{ts3} \end{align}\] Finally we need to tweak the bias application. To get a one in the denominator we need to bias $n$ by $\frac{1}{2}$ when in the deadzone and the $y$ in $t_w$ by one. // swing-twist decomposition of look-at // specialized version for limited swing/twist // components typedef struct { float sb,sw; // sw + sb(1,0,0) float tb,tw; // tw + tb(0,0,1) } st_lookat_t; st_lookat_t lookat_st(vec3_t v) { static const float cap = 0x1.0p-63f; // sqrt(min_normal) st_lookat_t st; float n = fmaf(v.y,v.y, v.x*v.x); float c = sqrtf(n); // swing: #if 1 // two square-root version st.sw = sqrtf(0.5f*(1.f+c)); st.sb = copysignf(sqrtf(0.5f*(1.f-c)),v.z); #else // divide & square-root version float rs = sqrtf(0.5f/(1.f+c)); st.sw = fmaf(rs,c,rs); st.sb = rs * v.z; #endif // twist: reduce to first quandrant float b = n > cap ? 0.f : 0.5f; // bias if inside deadzone float y = fabsf(v.y) + b + b; float x = fabsf(v.x); float sx = f32_xor(-v.x,x); // -sgn(x) = sgn(-x) { for copysignf removal } float t2 = fmaf(c,y,n+b); // n+cy float ts = sqrtf(0.5f/t2); // 1/sqrt(2(n+cy)) st.tw = ts * (c+y); // (c+y)/sqrt(2(n+cy)) st.tb = ts * x; // x/sqrt(2(n+cy)) // reflect about diagonal if y is negative if (v.y < 0.f) { float t = st.tb; st.tb = st.tw; st.tw = t; } // reintroduce sign of x (reflect about y) //st.tb = copysignf(st.tb, -v.x); // this or following line st.tb = f32_xor(st.tb, sx); // see below return st; } // specialized swing/twist to quat quat_t st_to_quat(st_lookat_t st) { quat_t q; q.x = st.sb * st.tw; q.y = st.sb * st.tb; q.z = st.sw * st.tb; q.w = st.sw * st.tw; return q; } Quaternion We can get the direct-to-quaternion formulation from the swing-twist by performing the composition: \[\begin{align*} Q = ST \end{align*}\] Taking $\eqref{s1}$ and $\eqref{ts3}$, replacing the denominators by $a$ and $b$ respectively and performing the product gives: \[Q = \frac{(c+1)t_w}{a b} + \left(\frac{z~t_w}{a b},~\frac{z~t_b}{a b},~\frac{(c+1)t_b}{a b}\right)\] and expanding $ab$: \[\begin{align*} ab & = \sqrt{2(1+c)}\sqrt{2\left(n + c \left|y\right|\right)} \\ & = 2 \sqrt{(1+c)\left(n + c \left|y\right|\right)} \\ \end{align*}\] The term inside the root could be expanded but I’ll just leave as is and compute it and the other two $\left(1+c\right)t$ terms as $t+ct$ using FMA. quat_t lookat_quat(vec3_t v) { static const float cap = 0x1.0p-63f; // sqrt(min_normal) // for twist we need point in disc: p = (x,y) float n = fmaf(v.y,v.y, v.x*v.x); // n = dot(p,p) float c = sqrtf(n); float b = n > cap ? 0.f : 0.5f; // twist: reduce to first quandrant float x = fabsf(v.x); float y = fabsf(v.y) + b + b; //float sx = f32_xor(-v.x,x); // -sgn(x) = sgn(-x) float t = fmaf(c,y,n+b); // n+cy float s = sqrtf(0.25f / fmaf(c,t,t)); // 1/(2 sqrt((1+c)(n+cy))) float tw = fmaf(s,y,s*c); float tb = s*x; // reflect about diagonal if y is negative if (v.y < 0.f) { float t = tb; tb = tw; tw = t; } // reintroduce sign of x (reflect about y) // use copysignf version this time //tb = f32_xor(tb, sx); tb = copysignf(tb, -v.x); // compose the result quat_t q; q.x = v.z * tw; // z tw q.y = v.z * tb; // z tb q.z = fmaf(c,tb,tb); // (1+c)tb q.w = fmaf(c,tw,tw); // (1+c)tw return q; } Comments math (32) animation (1) , camera (1) , rotation (2)