My previous post was a quick walkthrough of how to compute the ULP distance between two finite floating point numbers and the logical next thing to consider is performing approximate floating point comparisons.

**TL;DR version:** Go read Bruce Dawson’s post on this topic and then Christer Ericson’s post. I’ll get around to making some very minor comments that you probably don’t care about. Note that although these posts date back to when x87 had to be considered they are still structured very well for modern ISAs.

The general case version of computing ULP distance looked like this:

We can simply extend the above code to produce a comparison function which returns *true* if they are within some specified distance. One possible way would be:

A couple of comments on how I structured the above. First in the same sign case we drop $\text{abs}$ and replace it with adding the distance $d$ and comparing with $2d$. This is the classic trick of testing if a value is within some distance of zero using unsigned integers and a single comparison. Secondly the different sign case does this as well even though the computed distance is known to be positive. Structured like this produces branchfree code on recent-ish compilers.

Sadly the above function is very unlikely to be what you really want. It is too strict about relative error for the vast majority of cases where approximately equal comparisons are needed. Bruce Dawson’s post covers the issues so there’s no need to repeat them.

Let’s take a look at Bruce’s hybrid (absolute and relative distance) test:

Which starts by computing the absolute difference `diff`

(which I’m calling distance) and then accepts (returns `true`

) if smaller or equal to the specified value `maxDiff`

. Let’s just assuming that `maxDiff`

is at least some some number of ULPs greater than zero since otherwise we wouldn’t be meanfully performing a hybrid test (and obviously this handles the case where the inputs are equal).

Let’s note a simplified version of a “well known” floating point property:

**Sterbenz’s lemma:** The subtraction $a-b$ is exact if $ \frac{b}{2} \le a \le 2b $. And for what it’s worth we can widen the condition to cover more cases by changing it to:

So for any partially reasonable notion of approximately equal then computation of `diff`

is exact (when they are approximately equal). Now let’s think about when $~a > b$, express $b$ in terms of $a$ and examine the absolute distance expression:

where $n$ is the ULP distance (a positive integer) and $\varepsilon$ is the *relative* size of a ULP. If we repeat for the $~b > a~$ case then we’d get:

For equation usage let’s use: $m$ for `largest`

, $d$ for `diff`

and $t$ for `maxRelDiff`

. Then we have:

and the relative distance comparison becomes:

I’m going to jump back and fill in some potential blanks. For finite floating point values which are *normal* (not *denormal* a.k.a *subnormal*) then the relative size of a ULP is fixed which for single precision is $\varepsilon = 2^{-23}$ and C standard headers exposes this value as the macro `FLT_EPSILON`

(this value is also the absolute ULP size at one). The magnitude of the smallest normal single is $2^{-126}$.

If we choose a `maxRelDiff`

value that’s a power of two and a `maxDiff`

value that’s sufficiently large then we can ensure that the product `largest*maxRelDiff`

$\left(m~t\right)$ is exact. Let’s break down what `maxDiff`

needs to be:

- The smallest normal value is $2^{-126} $
- The smallest
`maxRelDiff`

that allows for an exact product is $\varepsilon $ so we need a factor of $ 1/\varepsilon=2^{23} $ - The smallest value
`largest`

can take on is when $a=-b$ (the values are centered around zero) so we need an addition factor of $ 2 $

Composing these gives a minimum `maxDiff`

of $2^{-126} \cdot 2^{23} \cdot 2 = 2^{-102}~$ $~(\approx 1.97215 \times 10^{-31} ) $. If we changed `maxRelDiff`

to $2\varepsilon$ then this value would be halved..but this is probably already a much smaller number than you’d want to use in a hybrid test.

One thing to note is that although the above configuration produces exact comparisons it will not produce *exactly* the same comparision results if we were to wrap the above integer method with an absolute distance test. This is because we are treating distance measures slightly differently. The integer comparison measures how many successor/predecessor steps there are between the two values. The floating point version measures the ULP distance between the two where size of the ULP is taken to be that of the larger magnitude. These only differ when the two values cross a power-of-two interval boundary.

Example: $a=1$ with relative error of 1 ULP then integer version accepts $b=\left\{1-2^{-24},~1,~1+2^{-23}\right\}$ and the floating point version will additionally accept $1-2^{-23}$.

Now if we yet again rewrite `maxRelDiff`

as $\left(i~\varepsilon\right)$ then (as noted) the product will be always exact if integer $i$ is a power of two and otherwise rounding will sometimes occur. If the product rounds then the actually performed comparion will be one of: $ \left\{i-1,~i,~i+1\right\} $ ULPs.

For quick eye-balling: godbolt