Some comments on approximately equal FP comparisons

May 14th, 2019

My previous post was a quick walkthrough of how to compute the ULP distance between two finite floating point numbers and the logical next thing to consider is performing approximate floating point comparisons.

TL;DR version: Go read Bruce Dawson’s post on this topic and then Christer Ericson’s post. I’ll get around to making some very minor comments that you probably don’t care about. Note that although these posts date back to when x87 had to be considered they are still structured very well for modern ISAs.

The general case version of computing ULP distance looked like this:

inline uint32_t f32_to_bits(float x)
{
  uint32_t u; memcpy(&u, &x, 4); return u;
}

inline uint32_t u32_abs(uint32_t x)
{
  return (int32_t)x >= 0 ? x : -x;
}

// returns the ULP distance between finite a & b
uint32_t f32_ulp_dist(float a, float b)
{
  uint32_t ua = f32_to_bits(a);
  uint32_t ub = f32_to_bits(b);
  uint32_t s  = ub^ua;

  if ((int32_t)s >= 0)
    return u32_abs(ua-ub);   // same sign case
  
  return ua+ub+0x80000000;   // different sign case
}

We can simply extend the above code to produce a comparison function which returns true if they are within some specified distance. One possible way would be:

// returns true if finite a & b are within d ULPs
uint32_t f32_within_ulp_dist(float a, float b, uint32_t d)
{
  uint32_t ua = f32_to_bits(a);
  uint32_t ub = f32_to_bits(b);
  uint32_t s  = ub^ua;

  if ((int32_t)s >= 0)
    return ua-ub+d <= d+d;
  
  return ua+ub+0x80000000+d <= d+d;
}

A couple of comments on how I structured the above. First in the same sign case we drop $\text{abs}$ and replace it with adding the distance $d$ and comparing with $2d$. This is the classic trick of testing if a value is within some distance of zero using unsigned integers and a single comparison. Secondly the different sign case does this as well even though the computed distance is known to be positive. Structured like this produces branchfree code on recent-ish compilers.

Sadly the above function is very unlikely to be what you really want. It is too strict about relative error for the vast majority of cases where approximately equal comparisons are needed. Bruce Dawson’s post covers the issues so there’s no need to repeat them.

Let’s take a look at Bruce’s hybrid (absolute and relative distance) test:

bool AlmostEqualRelativeAndAbs(float A, float B,
            float maxDiff, float maxRelDiff = FLT_EPSILON)
{
  // Check if the numbers are really close -- needed
  // when comparing numbers near zero.
  float diff = fabsf(A - B);
  
  if (diff <= maxDiff)
    return true;
 
  A = fabsf(A);
  B = fabsf(B);
  float largest = (B > A) ? B : A;
 
  if (diff <= largest * maxRelDiff)
    return true;
	
  return false;
}

Which starts by computing the absolute difference diff (which I’m calling distance) and then accepts (returns true) if smaller or equal to the specified value maxDiff. Let’s just assuming that maxDiff is at least some some number of ULPs greater than zero since otherwise we wouldn’t be meanfully performing a hybrid test (and obviously this handles the case where the inputs are equal).

Let’s note a simplified version of a “well known” floating point property:

Sterbenz’s lemma: The subtraction $a-b$ is exact if $ \frac{b}{2} \le a \le 2b $. And for what it’s worth we can widen the condition to cover more cases by changing it to:

\[\text{exponent}\left(a-b\right) \le \text{min}\left( \text{exponent}\left(a\right),~\text{exponent}\left(b\right)\right)\]

So for any partially reasonable notion of approximately equal then computation of diff is exact (when they are approximately equal). Now let’s think about when $~a > b$, express $b$ in terms of $a$ and examine the absolute distance expression:

\[\begin{align*} b & = a\left(1 - n~\varepsilon \right) \\ \left|a-b\right| & = a~n~\varepsilon \end{align*}\]

where $n$ is the ULP distance (a positive integer) and $\varepsilon$ is the relative size of a ULP. If we repeat for the $~b > a~$ case then we’d get:

\[\begin{align*} a & = b\left(1 - n~\varepsilon \right) \\ \left|a-b\right| & = b~n~\varepsilon \end{align*}\]

For equation usage let’s use: $m$ for largest, $d$ for diff and $t$ for maxRelDiff. Then we have:

\[d = m~n~\varepsilon\]

and the relative distance comparison becomes:

\[\begin{align*} d \le m~t \\ m~n~\varepsilon \le m~t \\ n~\varepsilon \le t \end{align*}\]

I’m going to jump back and fill in some potential blanks. For finite floating point values which are normal (not denormal a.k.a subnormal) then the relative size of a ULP is fixed which for single precision is $\varepsilon = 2^{-23}$ and C standard headers exposes this value as the macro FLT_EPSILON (this value is also the absolute ULP size at one). The magnitude of the smallest normal single is $2^{-126}$.

If we choose a maxRelDiff value that’s a power of two and a maxDiff value that’s sufficiently large then we can ensure that the product largest*maxRelDiff $\left(m~t\right)$ is exact. Let’s break down what maxDiff needs to be:

The smallest normal value is $2^{-126} $
The smallest maxRelDiff that allows for an exact product is $\varepsilon $ so we need a factor of $ 1/\varepsilon=2^{23} $
The smallest value largest can take on is when $a=-b$ (the values are centered around zero) so we need an addition factor of $ 2 $

Composing these gives a minimum maxDiff of $2^{-126} \cdot 2^{23} \cdot 2 = 2^{-102}~$ $~(\approx 1.97215 \times 10^{-31} ) $. If we changed maxRelDiff to $2\varepsilon$ then this value would be halved..but this is probably already a much smaller number than you’d want to use in a hybrid test.

One thing to note is that although the above configuration produces exact comparisons it will not produce exactly the same comparision results if we were to wrap the above integer method with an absolute distance test. This is because we are treating distance measures slightly differently. The integer comparison measures how many successor/predecessor steps there are between the two values. The floating point version measures the ULP distance between the two where size of the ULP is taken to be that of the larger magnitude. These only differ when the two values cross a power-of-two interval boundary.

Example: $a=1$ with relative error of 1 ULP then integer version accepts $b=\left\{1-2^{-24},~1,~1+2^{-23}\right\}$ and the floating point version will additionally accept $1-2^{-23}$.

Now if we yet again rewrite maxRelDiff as $\left(i~\varepsilon\right)$ then (as noted) the product will be always exact if integer $i$ is a power of two and otherwise rounding will sometimes occur. If the product rounds then the actually performed comparion will be one of: $ \left\{i-1,~i,~i+1\right\} $ ULPs.

For quick eye-balling: godbolt

Comments

math (34)