This is a massively chopped down version of a post I haven’t been completing, so maybe breaking into parts will work better. The goals are as follows:
 Derive slerp, attempt to make it less blackbox like and show it is a 2D problem
 Show that it is equivalent to the commonly seen function
 Form a reference version that only requires one forward trig operation
 Present analysis tools which require only reals
 Give basic analysis for lerp
The following shadertoy is a viz for three interpolations formulated directly in 2D:
 slerp (blue)
 nlerp (magenta)
 halfangle+nlerp (red).
Clicking down sets the initial point and the final point is the current mouse position. On button release the initial position resets to zero and the final remains. As a 2D slice of quaternions the maximum angle between the endpoints is $\frac{\pi}{2}$.
From complex real powers to the algebraic form of slerp
Since quaternions are an extension of complex numbers let’s first consider the complex case. Given a unit complex number $z$ both in explict and polar form:
The real (principle) power can be expressed in terms of the polar form as $\left( t\in \mathbb{R} \right) $:
This linearly parameterizes the angle between one and $z$. If $z$ respresents a rotation then this produces a continuous interpolation between the identity and the rotation represented by $z$ following the minimal magnitude path.
To generalize to an arbitrary starting point $A$ and ending point $B$ we can simply transform the problem:
\[\begin{equation} \label{zslerp} s \left( t \right) = \left(BA^*\right)^tA \end{equation}\]Which is the algebraic form of slerp in twodimensions.
SLERP the dreaded small angle circular arc
Replacing $z$ with a unit quaternion:
changes hardly anything. I’m assuming familarity with the contents of half/double angle and Cayley: preliminaries, the bullet points of which are:
 effectively working with the complex number: $z = a + b~\mathbf{i} $
 unit magnitude: $ a^2 + b^2 =1 $
 positive scalar: $ a $ on $\left[0,1\right] $
 $ b $ on $\left[0,1\right] $, since $b=\sqrt{x^2+y^2+z^2}$. Any sign of $b$ is contained by the implied $\mathbf{u}$
 the implied angle $\theta$ is therefore on $\left[0,~\frac{\pi}{2}\right]$
 the domain is the arc of unit circle in the first quadrant, axes included.
Since the quaternion product doesn’t commute we can express slerp as:
\[\begin{eqnarray} s\left( t \right) & = & \left(BA^*\right)^tA \label{slerp} \\ & = & A\left(A^*B\right)^t \label{slerp1} \\ & = & \left(AB^*\right)^{1t}B \label{slerp2} \\ & = & B\left(B^*A\right)^{1t} \label{slerp3} \end{eqnarray}\]
Where $\eqref{slerp2}$ and $\eqref{slerp3}$ are simply reversing the points and parameterization (holds in complex as well). Using $\eqref{slerp}$ we have $Q=BA^*$ where $A \cdot B \geq 0$ and the action of slerp is $Q^t$, which maps a coordinate in the {1,$Q$}plane to a coordinate in the {1,$Q$}plane.
Basic analysis tools
The action of slerp is defined by the principle real power $\eqref{power}$ and is bound to a complex plane. So we can reason about it as walking a path^{1} from $\left(1,0\right) $ to $\left(a,b\right) = \left(\cos \theta,\sin \theta\right)$. If the path in the plane is described by:
Then the angle at $t$ is:
From $\phi\left(t\right)$ we can compute other quantites such as angular velocity $\left(\omega=\frac{d\phi}{dt}\right)$ and acceleration $\left(\alpha=\frac{d\omega}{dt}\right)$.
Taking the values for slerp $\eqref{power}$ and pluging them in we get the expected:
\[\begin{eqnarray*} p\left(t\right) & = & \left( \cos\left(t\theta\right), ~\sin\left(t\theta\right)\right) \\ \phi\left(t\right) & = & \text{atan2}\left( \sin\left(t\theta\right) , ~\cos\left(t\theta\right)\right) = t \theta \\ \omega\left(t\right) & = & \theta \\ \alpha\left(t\right) & = & 0 \end{eqnarray*}\]
This gives a simple framework for computing errors (excluding those from floatingpoint computation).
Given some approximation of slerp we can compute the function $T$ to correct to constant angular velocity:
\[\begin{eqnarray} \text{atan}\left(\frac{p_y\left(T(t)\right)}{p_x\left(T(t)\right)}\right) = t \theta \nonumber \\ \frac{p_y\left(T(t)\right)}{p_x\left(T(t)\right)} = \tan\left(t \theta\right) \label{reparam} \\ \end{eqnarray}\]
by solving for $T$.
Algebraic to geometric form the version you almost always see
Rather than grind the math let’s directly place our 2D coordinate $\eqref{power}$ back into the original space:
\[\begin{equation} \label{slerpxy} s(t) = \cos\left(t\theta\right)X + \sin\left(t\theta\right)Y \end{equation}\]
Note that this holds for any number of dimensions.
Using $\eqref{slerp}$ or $\eqref{slerp1}$ then the ${X}$ direction is $A$ and ${Y}$ is the direction orthogonal to $A$ that contains $B$ (normalized). Since the inputs are normalized we have:
\[Y = \left\Vert B  (A \cdot B)A\right\Vert\]
which brings us to:
If we were to swap $A$ and $B$ and replace $t$ by $1t$ to instead use $\eqref{slerp2}$ or $\eqref{slerp3}$.
Making some trig replacements:
\[\begin{eqnarray*} \cos\left(\theta\right) & = & A\cdot B \\ \sin\left(\theta\right) & = & \sqrt{1 \left(A\cdot B\right)^{2}} \end{eqnarray*}\]
then we have:
Where $\eqref{gdavis}$ can be directly formulated from geometric reasoning and has somehow has be taken to be the true defination of slerp.
Ken Shoemake’s 1985 paper^{2} gives $\eqref{slerp1}$ and $\eqref{gdavis}$. This equation has be covered very well, as an example J.M.P van Waveren^{3} 2005.
Reference formulation of slerp one forward trig op is enough
Starting from $\eqref{srecon}$, replace $A\cdot B$ by $d$ and collecting terms gives:
If we examine the terms in the limit:
which yields linear interpolation as $\theta$ approaches zero ($d$ approaching one):
Or more simply the coord and arc approach being the same for small angles.
The two forward trig ops can be converted into one and given the range of both are positive^{4}. Showing both options as a pair of scale values where $s_t=\sin\left(t\theta\right)$ and $c_t=\cos\left(t\theta\right)$:
\[\begin{eqnarray*} s(t) & = & \left\{\sqrt{1s_t^2}d \frac{s_t}{\sqrt{1d^2}},~\frac{s_t}{\sqrt{1d^2}}\right\} \\ & = & \left\{c_td \sqrt{\frac{1c_t^2}{1d^2}},~\sqrt{\frac{1c_t^2}{1d^2}}\right\} \end{eqnarray*}\]
This leaves the computation of theta. I’ll paritially dismiss (directly) using $\text{acos}$ and $\text{asin}$ as being nearly hopeless functions^{5} and given the range of inputs we don’t need to yet distinguish between one and two parameter $\text{atan}$. There are a fair number of ways we can express $\theta$ via $\text{atan}$ and here are four of them:
where $\eqref{acos}$ and $\eqref{asin}$ are simply $\text{acos}$ and $\text{asin}$ in $\text{atan}$ form and $\eqref{hacos}$ $\eqref{hacos2}$ are applying halfangle to $\eqref{acos}$ in two forms.
Now for some practical considerations. By reference we can go two ways: close enough to detect problems in actually used methods or minimal error with respect to representable so we can accurately measure errors. I’m going with the former. Also we have to decide the meaning of a negative dot product.
 It is an error so assert or spew a message.
 Can I haz minimal anyway plz?
 Go the long way around.
I like memes as much as the next person so will go with option 2.
Using C standard functions we could make a concise^{6} implemention:
This leaves a fair number of thing we can know to the compiler to deduce and hopefully do something about. Let’s convert the “slerp weights” to single parameter $\text{atan}$ with input on $\left[ 0,1 \right]$ using $\eqref{hacos}$:
Showing the use of approximation plus one step of newtonraphson for $x^{\frac{1}{2}}$ since it has minimal impact on accuracy. Good idea or not is a different story. Likewise for removing either of the forward trig ops.
Assumming no hardware sin/cos then we could replace with approximations on $\left[0,\frac{\pi}{2}\right]$. Another option is move the angle doubling to after the approximation(s):
Although the range of arctan is narrow enough to directly approximate it requires a fair number of terms to achieve small error. From the sumoftangents identity:
To halve our angle range set $b=1$ since $\text{atan}\left(1\right)=\frac{\pi}{4}$. Using this substitution allows narrowing the range to $\left[0, \sqrt{2}1\right]$
Totally ignoring all the range reduction comments and toss in some lowest order
The reason for walking through some possible reductions of “reference form” are lost since I’ve ripped out the almost all the sections, but felt it was better to leave in place. Also losing forward trig operations is more a CPU side issues than GPU.
LERP and nlerp
Going back to our reference formula written with trig operations:
Approximating with either lerp or nlerp produce the same angular error values. Errors introduced by using a nonunit quaternion depend on how the consumer of the output is written (the simplifications made using unit magnitude as a given). Plugging in the approximation we get:
\[\begin{eqnarray*} p_l\left(t\right) & = & \left( 1+t\left(\cos\left(\theta\right)1\right), ~t\sin\left(\theta\right)\right) \\ \phi_l\left(t\right) & = & \text{atan}\left(~t\sin\left(\theta\right), ~1+t\left(\cos\left(\theta\right)1\right)\right) \\ \omega_l\left(t\right) & = & \frac{\sin\left(\theta\right)}{12t\left(\cos\left(\theta\right) \left(t1\right)  \left(t1\right) \right)} \\ \alpha_l\left(t\right) & = & \frac{2\sin\left(\theta\right)\left(\cos\left(\theta\right)1\right)\left(2t1\right)} {\left(1+2t\left(\cos\left(\theta\right) \left(t1\right)  \left(t1\right) \right)\right)^2} \end{eqnarray*}\]
To get a quick feel let’s examine the angle and angular velocity errors for some sample angles: $ \frac{\pi}{2}, \frac{\pi}{4}, \frac{\pi}{8} $ (Rotations of: 180⁰, 90⁰, 45⁰). First the absolute error of the angle at ‘t’ (normalized to range X axis):
As expected the result is exact at the end points and at $t=\frac{1}{2}$. The latter since a ray that bisects a coord bisects the arc. We can see the error decreases fast and by progressively turning off traces (click the legend) eyeball it around a factor of 10 each time we halve the angle. Also the ‘t’ of the peak error shifts to smaller values and appears to be slowing toward some final point.
Now the angular velocity absolute error. I’ve flipped the ordering so positive is too fast and negative is too slow:
\[\omega_l \left(t \right)  \theta\]
We can form an expression for the $t$ value of the maximum angle error as a function of $\theta$ by solving for $\omega_l\left(t\right)=\theta$:
which is a very boring function. Here are some sample values:
So the $t_{max}$ is rapidly approaching the fixed point $t_{max}(0)$. The max error for angular velocity is always at $t=\frac{1}{2}$. Choosing to examine the maximal angle range we have worst case error values of:
Normalizing the abs error by these and varying theta (also normalized) we can examine max error vs angle:
And the same plot as a log/log plot:
Since these are approximately linear in loglog implies the error terms are approximately monomials.
References and Footnotes

Actually any scalar multiple of the endpoints is mathematically correct. ↩

“Animating Rotation with Quaternion Curves”, Ken Shoemake, 1985. (PDF) ↩

“Slerping Clock Cycles”, van Waveren, 2005. ↩

no sign fixup needed. ↩

arcos can be approximated by: $\sqrt{1x}P\left(x\right)$ ↩

The abs and sgn computations aren’t needed with atan2. ↩