Toy code can be found: HERE. All reported values are using clang without enabling floatingpoint contractions.
Prelim the dead horse beating
Skimming the math. Starting with a quaternion $Q=w+\left(x,y,z\right)$ then we can rotate $\mathbf{p}$ by:
and if $Q$ is unit magnitude this reduces to:
If $Q$ with magitude of $s$ is transformed by $\eqref{eq:rsxform}$ then the result is the composition of the rotation and uniform scaling of $s^2$.
To create a matrix we need to apply the rotation to the basis set to form our three equations:
\[\begin{eqnarray*} \mathbf{x} = \left(1,0,0\right) \\ \mathbf{y} = \left(0,1,0\right) \\ \mathbf{z} = \left(0,0,1\right) \\ \end{eqnarray*}\]which expanded and reduced gives:
\[\begin{eqnarray} \mathbf{x}' & = & Q\mathbf{x}Q^{*} \nonumber \\ & = & \left(w^2 + x^2  y^2  z^2, ~2\left(xy+wz\right), ~2\left(xzwy\right) \right) \label{xp} \\ & = & \left(1  2 \left(y^2+z^2\right), ~2\left(xy+wz\right), ~2\left(xzwy\right) \right) \label{xpr} \\ \nonumber \\ \mathbf{y}' & = & Q\mathbf{y}Q^{*} \nonumber \\ & = & \left(2 \left(xywz\right), ~w^2  x^2 + y^2  z^2, ~2\left(wx+yz\right) \right) \label{yp} \\ & = & \left(2 \left(xywz\right), ~1  2\left(x^2+z^2\right), ~2\left(wx+yz\right) \right) \label{ypr} \\ \nonumber \\ \mathbf{z}' & = & Q\mathbf{z}Q^{*} \nonumber \\ & = & \left(2 \left(wy+xz\right), ~2\left(yzwx\right), ~w^2  x^2  y^2 + z^2\right) \label{zp} \\ & = & \left(2 \left(wy+xz\right), ~2\left(yzwx\right), ~1  2\left(x^2+y^2\right) \right) \label{zpr} \end{eqnarray}\]Quaternion to rotation matrix
Sticking to the math convention of column vectors, then we can shove the (not reduced) equations $\eqref{xp} \eqref{yp} \eqref{zp}$ into the columns giving:
and we can reduce the diagonal elements since $Q\cdot Q = 1$ as per $\eqref{xpr} \eqref{ypr} \eqref{zpr}$:
Translating both into C gives:
The reason for this aside that because I've sometime seen code that extracts two of the directions as above and builds the third using a cross product. For the cross product method to be a performance win requires that it ends up being faster than three additions.
Another aside: In cases where input quaternions are moving away from unity due to compounding errors it is possible to fold the corrective step into the transform instead of renormalizing as a prestep.
Rotation matrix to quaternion
Let’s rename the elements of $\eqref{2matndr}$ as:
then the sum and difference of off diagonal pairs give:
and the permutations of trace like functions:
which can be reworked to:
If we know or determine an element has a sufficiently large magnitude we can choose the equation for that element from the second set of equations and three equations from the first set which are products of the that and the remaining elements.
As an example take a special case version where we know that the magnitude of the angle is not greater than $\frac{\pi}{2}$ which in turn ensures that the scalar $(w)$ is the largest element. Then we need $\eqref{4w2}$ from the second set and $\eqref{4wx} \eqref{4wy} \eqref{4wz}$ from the first.
Standard construction
The standard construction for the general case is based on the observation that since we’re producing a unit quaternion, then the smallest that the largest magnitude component can be is $\frac{1}{2}$. Given that we can simply walk through the components and choose the first that’s at least that large.
Given uniform random input then the (in code order) probabilities are: $\frac{2}{3},\frac{1}{9},\frac{1}{9},\frac{1}{9} $ The first from $\frac{2}{\pi}\text{acos}\left(\frac{1}{2}\right) = \frac{2}{3}$.
Mike Day’s construction
Insomniac Games (Mike Day) presented a paper^{1} in 2015 which makes the following observations. The sign of diagonal components imply $\eqref{2matstd}$:
\[\begin{eqnarray*} m_{00} < 0 \implies y^2+z^2 > \frac{1}{2} \\ m_{11} < 0 \implies x^2+z^2 > \frac{1}{2} \\ m_{22} < 0 \implies x^2+y^2 > \frac{1}{2} \end{eqnarray*}\]
and the permutations of sum/difference of pairs of diagonal elements imply:
They use the above to form a method that insures to choose a sufficiently large element. The following is a slightly massaged version of Day:
Given uniform random input then the outer and both inner probabilities are: $\frac{1}{2}$.
Angle errors
Let’s examine roundtrip errors. We generate a uniform quaternion $A$, convert to a matrix, back to a quaternion $B$ and then find the relative quaternion $X$:
then measure the represented angle, multiple by two (to get 3D rotation measure instead of in $\mathbb{H}$) and convert to degrees:
This choice of measure ignores the magnitudes of $A$ and $B$. The trace names are tomatrixmethod/toquatmethod. The xaxis is normalized.
There’s a jump in trace behavior at $\frac{2}{3}$ because our xaxis is a function of $w$ and this the point where $w=\frac{1}{2}$. The forward transform without diagonal reduction has lower peak error, but potentially more interesting is the fact that this method is better at anglepreserving (conformal) than the standard.
Variants you probably don’t care
Branch free Day
The reworking of Day above is to point out the we can formulate a branchfree version. Let’s call a function to isolate the sign bit of a float $sb$ and denote an integer XOR as $\oplus$. The initial branch is selected based on the sign of $m_{22}$. In both cases we’re computing three temp variables $a,b,c$ which can be expressed in terms of that sign: $s_0 = \text{sb}\left(m_{22}\right)$. Then inside the inner four cases we’re computing $e_0$ (the chosen sufficiently large element) which can be expressed in terms of the sign of $a$: $s_1 = \text{sb}\left(a\right)$. ($c$ and $e_0$ can use abs instead). The remaining two elements (unnamed temp expressions) can be formed using $s_1$ and the parity of two sign values: $s_p = s_0 \oplus s_1$. If we rename $b$ to $e_1$, then so far we have:
To complete the component computation we simply need to scale all four by $\frac{1}{2 \sqrt{e_0}}$. This leave placing them in memory, if we has storage order of $\left(x,~y,~z,~w\right)$ and think of our elements as the ordered list: $\left(e_3,~e_2,~e_1,~e_0 \right) $, then we apply the follow two rules (input to the second is renaming the output of the first)
 if $s_0$ is set $\left(e_3,~e_2,~e_1,~e_0 \right) \rightarrow \left( e_1,~e_0,~e_3,~e_2 \right) $, so flip the first and last two.
 if $s_p$ is set $\left(e_3,~e_2,~e_1,~e_0 \right) \rightarrow \left( e_2,~e_3,~e_0,~e_1 \right) $, so flip the first and second pairs.
Alternately we can simply compute a value to XOR as below:
Branch free higher error
Another branchfree possibility is to use the second set of equations to compute the magnitudes of all the elements, choose the scalar to be positive and compute the sign of the bivector components by the first set of equations.
which as promised results in much higher errors:
Although there are cases where these errors are low enough to be usable the performace is worse that some better methods…
Almost branch free
Yet another option is to notice that special case small angle version performs rather well. These plots again are:
So we can formulate a version that has a highly predictable branch that handles the common case and otherwise fallback.
A lazy table of empirical numbers: some adhoc cut points, probability and roundtrip errors using the standard method of quattomatrix:
BF1_CUT  probability  rterror 

0.00001  0.998002  0.042664 
0.0001  0.993634  0.014877 
0.0002  0.990994  0.010233 
0.0004  0.987265  0.007769 
0.0008  0.981993  0.004991 
0.0016  0.974531  0.003587 
0.0032  0.963969  0.002935 
0.0064  0.949045  0.001946 
0.0128  0.927966  0.001367 
0.0256  0.898184  0.000975 
0.0512  0.856216  0.000753 
0.1024  0.797129  0.000528 
Overkill but lower error
Let’s run through a method which includes every component of the rotation matrix in each component of the resulting quaternion. The point of that exercise is a light filtering of noise present in the input. First we’ll rewrite the rotation matrix as the result of converting an axisangle conversion with unit vector $(x,~y,~z)$ and angle $\theta \in [0,\pi]$:
NOTE: everywhere else I’m using $x,y,z$ for quaternion components and here for the unit vector of the axis angle.
Expanding and naming off diagonal differences:
\[\begin{eqnarray*} s_1 = m_{21}  m_{12} = 2x \sin(\theta) \\ s_2 = m_{02}  m_{20} = 2y \sin(\theta) \\ s_3 = m_{01}  m_{10} = 2z \sin(\theta) \end{eqnarray*}\]
and sums:
and (what I’m calling) trace like:
We can combine these as follows:
a simple rewrite to make the halfangle identities jump out:
So we can compute the magnitudes of the quaternion components as follows:
The remaining issue is then determining the signs of the components and the $s_i$ terms have that information. Plot the error of that implementation (vs. Day) would give us:
So it all falls apart when approaching the maximal rotation angle. The problem is $\sin \theta$ is approaching zero. We can correct this by noting the most important information is in the largest magnitude component $q_i$ and we could choose that to be the one that’s positive (or equivalently known sign). Specifically the $a_i$ term which includes the known and desired sign components. With this correction we have:
Let’s do some points:
 dp is promoting input to doubles, computing and demoting to single. For single precision this is less work than being fancy. If working precision is doubles you’d have to go with extended precision methods (TwoProduct, compensated summations, etc.) to squeeze out the additonal error reduction.
 Although it’s structure helps with noise filtering, performing the standard quaternion to matrix method (reducing the diagaonals) is still a major poke in the eye. Just hide all the std methods now..thanks!
 All of this added computational complexity does buy us anything for small rotation angles (vs. Day & “Almost branchfree”..provided NDR quattomat)
Since NDR is the most effective error reduction, let’s do a spitball of it promoted to double computation. (reminder: the toy code will let you play with whatever combo you want)
Now we’re not really attempting to filter a noisy input we can relax to ensuring $w$ is sufficiently large to not increase the error in the case we’re interested it to not have a branch misprediction nightmare. (Going branchfree is possible and would be the “thing” todo in a SIMD implementation. For scalar it doesn’t seem worthwhile). Making the simplifing assumption that rotations are uniformly random distributed then the normalized cumulative volume (cumulative/total) of a rotation is:
for a 3D rotation angle of $\theta$. Plugging in how the quaternion scalar $w$ is related to $\theta = 2 ~\text{acos} (w)$ yields:
which is pretty close to a linear decreasing function:
So if the first test is w > CUT
then the probablity of that block is $w$ fed into the above expression (assuming uniform input and better if angles are skewed toward small). An implementation:
References and Footnotes

“Converting a Rotation Matrix to a Quaternion”, Mike Day, 2015 (download page) ↩