Toy code can be found: HERE
Prelim the dead horse beating
Skimming the math. Starting with a quaternion $Q=w+\left(x,y,z\right)$ then we can rotate $\mathbf{p}$ by:
and if $Q$ is unit magnitude this reduces to:
If $Q$ with magitude of $s$ is transformed by $\eqref{eq:rsxform}$ then the result is the composition of the rotation and uniform scaling of $s^2$.
To create a matrix we need to apply the rotation to the basis set to form our three equations:
which expanded and reduced gives:
Quaternion to rotation matrix
Sticking to the math convention of column vectors, then we can shove the (not reduced) equations $\eqref{xp} \eqref{yp} \eqref{zp}$ into the columns giving:
and we can reduce the diagonal elements since $Q\cdot Q = 1$ as per $\eqref{xpr} \eqref{ypr} \eqref{zpr}$:
Translating both into C gives:
The reason for this aside that because I've sometime seen code that extracts two of the directions as above and builds the third using a cross product. For the cross product method to be a performance win requires that it ends up being faster than three additions.
Another aside: In cases where input quaternions are moving away from unity due to compounding errors it is possible to fold the corrective step into the transform instead of renormalizing as a prestep. As an example:
Rotation matrix to quaternion
Let’s rename the elements of $\eqref{2matndr}$ as:
then the sum and difference of off diagonal pairs give:
and the permutations of trace like functions:
which can be reworked to:
If we know or determine an element has a sufficiently large magnitude we can choose the equation for that element from the second set of equations and three equations from the first set which are products of the that and the remaining elements.
As an example take a special case version where we know that the magnitude of the angle is not greater than $\frac{\pi}{2}$ which in turn ensures that the scalar $(w)$ is the largest element. Then we need $\eqref{4w2}$ from the second set and $\eqref{4wx} \eqref{4wy} \eqref{4wz}$ from the first.
Standard construction
The standard construction for the general case is based on the observation that since we’re producing a unit quaternion, then the smallest that the largest magnitude component can be is $\frac{1}{2}$. Given that we can simply walk through the components and choose the first that’s at least that large.
Given uniform random input then the (in code order) probabilities are: $\frac{2}{3},\frac{1}{9},\frac{1}{9},\frac{1}{9} $ The first from $\frac{2}{\pi}\text{acos}\left(\frac{1}{2}\right) = \frac{2}{3}$.
Mike Day’s construction
Insomniac Games (Mike Day) presented a paper^{1} in 2015 which makes the following observations. The sign of diagonal components imply $\eqref{2matstd}$:
and the permutations of sum/difference of pairs of diagonal elements imply:
They use the above to form a method that insures to choose a sufficiently large element. The following is a slightly massaged version of Day:
Given uniform random input then the outer and both inner probabilities are: $\frac{1}{2}$.
Angle errors
Let’s examine roundtrip errors. We generate a uniform quaternion $A$, convert to a matrix, back to a quaternion $B$ and then find the relative quaternion $X$:
then measure the represented angle, multiple by two (to get 3D rotation measure instead of in $\mathbb{H}$) and convert to degrees:
This choice of measure ignores the magnitudes of $A$ and $B$. The trace names are tomatrixmethod/toquatmethod. The xaxis is normalized.
There’s a jump in trace behavior at $\frac{2}{3}$ because our xaxis is a function of $w$ and this the point where $w=\frac{1}{2}$. The forward transform without diagonal reduction has lower peak error, but potentially more interesting is the fact that this method is better at anglepreserving (conformal) than the standard.
Variants you probably don’t care
Branch free Day
The reworking of Day above is to point out the we can formulate a branchfree version. Let’s call a function to isolate the sign bit of a float $sb$ and denote an integer XOR as $\oplus$. The initial branch is selected based on the sign of $m_{22}$. In both cases we’re computing three temp variables $a,b,c$ which can be expressed in terms of that sign: $s_0 = \text{sb}\left(m_{22}\right)$. Then inside the inner four cases we’re computing $e_0$ (the chosen sufficiently large element) which can be expressed in terms of the sign of $a$: $s_1 = \text{sb}\left(a\right)$. ($c$ and $e_0$ can use abs instead). The remaining two elements (unnamed temp expressions) can be formed using $s_1$ and the parity of two sign values: $s_p = s_0 \oplus s_1$. If we rename $b$ to $e_1$, then so far we have:
To complete the component computation we simply need to scale all four by $\frac{1}{2 \sqrt{e_0}}$. This leave placing them in memory, if we has storage order of $\left(x,~y,~z,~w\right)$ and think of our elements as the ordered list: $\left(e_3,~e_2,~e_1,~e_0 \right) $, then we apply the follow two rules (input to the second is renaming the output of the first)
 if $s_0$ is set $\left(e_3,~e_2,~e_1,~e_0 \right) \rightarrow \left( e_1,~e_0,~e_3,~e_2 \right) $, so flip the first and last two.
 if $s_p$ is set $\left(e_3,~e_2,~e_1,~e_0 \right) \rightarrow \left( e_2,~e_3,~e_0,~e_1 \right) $, so flip the first and second pairs.
Alternately we can simply compute a value to XOR as below:
Branch free higher error
Another branchfree possibility is to use the second set of equations to compute the magnitudes of all the elements, choose the scalar to be positive and compute the sign of the bivector components by the first set of equations.
which as promised results in much higher errors:
Almost branch free
Yet another option is to notice that special case small angle version performs rather well. These plots again are:
So we can formulate a version that has a highly predictable branch that handles the common case and otherwise fallback.
A lazy table of empirical numbers: some adhoc cut points, probability and roundtrip errors using the standard method of quattomatrix:
BF1_CUT  probability  rterror 

0.00001  0.998002  0.042664 
0.0001  0.993634  0.014877 
0.0002  0.990994  0.010233 
0.0004  0.987265  0.007769 
0.0008  0.981993  0.004991 
0.0016  0.974531  0.003587 
0.0032  0.963969  0.002935 
0.0064  0.949045  0.001946 
0.0128  0.927966  0.001367 
0.0256  0.898184  0.000975 
0.0512  0.856216  0.000753 
0.1024  0.797129  0.000528 
References and Footnotes

“Converting a Rotation Matrix to a Quaternion”, Mike Day, 2015 (download page) ↩