This is a mostly-code post with some supporting comments. I’ll assume you’re familiar with the IEEE binary formats and will use binary32 as the working example. The quick skim version: we have a sign bit, followed by a biased exponent, and then the explicit bits of the significand (the fractional part). Throughout, we take the inputs to be finite (denormals included).
We’ll start with the fact that the bit pattern of a finite IEEE value is equivalent to a signed magnitude integer. So to compute the ULP distance between two floats we can simply:
- get the bit patterns for both
- convert both from signed magnitude to two’s complement
- subtract to get the difference
- compute the absolute value to get the distance
Tossing together some helper functions and pasting them together gives:
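A minimal sketch of those four steps in C. The helper names (`f32_to_bits`, `sm_to_tc`, `f32_ulp_dist`) are placeholders of mine, and widening to 64 bits for the subtraction is just the simplest way I know to sidestep overflow, not necessarily the cheapest:

```c
#include <stdint.h>
#include <string.h>

// NOTE: all names here are illustrative placeholders.

// bit pattern of a binary32 value (memcpy avoids aliasing problems)
static inline uint32_t f32_to_bits(float x)
{
  uint32_t u;
  memcpy(&u, &x, sizeof u);
  return u;
}

// signed magnitude to two's complement. -0 and +0 both map to 0.
static inline int32_t sm_to_tc(uint32_t u)
{
  int32_t m = (int32_t)(u & 0x7fffffffu);   // magnitude
  return (u >> 31) ? -m : m;                // negate if the sign bit is set
}

// ULP distance between two finite floats
static inline uint32_t f32_ulp_dist(float a, float b)
{
  // widen so the difference of two 32-bit integers can't overflow
  int64_t ia = sm_to_tc(f32_to_bits(a));
  int64_t ib = sm_to_tc(f32_to_bits(b));
  int64_t d  = ia - ib;

  // finite inputs: the distance always fits in 32 bits
  return (uint32_t)(d < 0 ? -d : d);
}
```

Since both zeros convert to the integer zero, `f32_ulp_dist(0.0f, -0.0f)` is zero and the smallest positive and negative denormals are two apart.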
The resulting function is relatively expensive for what it does. Let’s create a special case version which only properly handles inputs with the same sign. If both are positive then the subtraction can be directly performed on the signed magnitude integers and likewise for both negative (the sign bits cancel). This eliminates the integer representation conversion which is the bulk of our original version.
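A sketch of that same-sign version, again with made-up names (`f32_ulp_dist_ss` is my own label). I'm taking the absolute difference with an unsigned compare, which is one of several ways to do it:

```c
#include <stdint.h>
#include <string.h>

// NOTE: all names here are illustrative placeholders.

// bit pattern of a binary32 value
static inline uint32_t f32_to_bits(float x)
{
  uint32_t u;
  memcpy(&u, &x, sizeof u);
  return u;
}

// ULP distance, only correct when a and b are finite and have the same sign
static inline uint32_t f32_ulp_dist_ss(float a, float b)
{
  uint32_t ua = f32_to_bits(a);
  uint32_t ub = f32_to_bits(b);

  // the sign bits match so they cancel: this is the difference of magnitudes
  return (ua > ub) ? ua - ub : ub - ua;
}
```

When the signs differ the sign bit does not cancel, so the result is at least 2^31 no matter how close the two inputs actually are.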
The previous is far from useless since it’s common to only care about small distances, and an incorrectly computed large distance wouldn’t matter. So it’s usable if we know that at least one value isn’t in the neighborhood of zero (it is at least as far from zero as the maximum distance we’re interested in).
Let’s now think about when the signs are different. If we ignore the sign bit then the rest of the signed magnitude integer is the distance from zero to the float, so we can simply add the two integers together and cancel the one sign bit that was set to get the distance. Mashing that up with the same-sign special case takes us back to a general case function:
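Here is one way that could look, as a sketch with placeholder names. One detail is my own choice: instead of masking the top bit of the sum I add `0x80000000`, which removes the single set sign bit modulo 2^32. The two agree while the magnitudes sum to less than 2^31, and the add stays correct past that (say, for a pair of large values with opposite signs):

```c
#include <stdint.h>
#include <string.h>

// NOTE: all names here are illustrative placeholders.

// bit pattern of a binary32 value
static inline uint32_t f32_to_bits(float x)
{
  uint32_t u;
  memcpy(&u, &x, sizeof u);
  return u;
}

// ULP distance between two finite floats
static inline uint32_t f32_ulp_dist(float a, float b)
{
  uint32_t ua = f32_to_bits(a);
  uint32_t ub = f32_to_bits(b);

  // same sign: top bit of the XOR is clear, so take the
  // absolute difference of the bit patterns
  if (((ua ^ ub) >> 31) == 0)
    return (ua > ub) ? ua - ub : ub - ua;

  // different signs: exactly one sign bit is set. The sum of the
  // magnitudes is (ua + ub - 2^31), and subtracting 2^31 is the same
  // as adding it when working modulo 2^32.
  return ua + ub + 0x80000000u;
}
```

The only extra cost over the same-sign version is the sign test, and the integer representation conversion is gone entirely.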