Computing floating point additive recurrence

January 24th, 2020

Toy code can be found here.

This is an extract of a blog post that I haven’t been able to motive myself to complete. The topic of which is some high level brushstrokes of rank-1 lattices and how they are related to irrational generating vectors (n-dimensional Weyl sequences or additive recurrence) . Examples of irrational generating vectors are various flavors of Fibonacci/Golden ratio sets and Martin Robert’s $R_n$-sequences presented in the blog post: The Unreasonable Effectiveness of Quasirandom Sequences. Here I’m going to ignore n-dimensional and focus on computing a 1D sequence. This is all gonna be quick and dirty…let’s go:

The $i^\text{th}$ element of the sequence generated by an irrational constant $\alpha$ can be expressed in closed form:

\[\begin{equation*} \begin{array}{l} s_i & = i~\alpha \bmod 1 \\ & = \left\{i~\alpha\right\} \end{array} \end{equation*}\]

where $\left\{ \cdot \right\}$ is just shorthand for the modulus (or the fractional part) and we can progressively walk elements with the recurrence:

\[s_{i+1} = \left\{s_i + \alpha\right\}\]

A couple quick comments on $\alpha$. First we can require it to be on $\left(0,1\right)$ because:

\[\left\{k + \alpha\right\} = \alpha\]

for all integers $k$. Also 0 and 1 are obviously useless. Like the previous we also have:

\[\left\{1 - \alpha\right\} = -\alpha\]

So negating $\alpha$ produces the same sequence “mirrored” about $1/2$ and we can choose to limit ourselves to either $\left(0,1/2\right)$ or $\left(1/2,1\right)$. (Really interesting values are never near 0 or 1..but that doesn’t matter here)

Now let’s spitball how addition/products of computer primitive types work:

float point gives us the $p$ most significant bits of the result (and there’s that whole rounding thing)
integers (are modulo width) and gives us the $p$ least significant bits of the result.

So integers automatically perform the operations we’re really interested in. Onward. As an example irrational let’s use the perennial favorite, the golden ratio (well, really its inverse..or the fractional part of it…they’re the same):

\[\alpha = \frac{1}{\varphi} = \frac{\sqrt{5}-1}{2} \approx 0.618034\]

To work with integers we need to convert $\alpha$ into a $0.b$ fixed point value $A$:

\[A = \text{RO}\left( 2^b~\alpha \right)\]

where $\text{RO}\left(\cdot\right)$ denotes round-to-odd and $b$ is the bit-width of the integer. The rounding to odd is so $A$ is coprime to $2^b$ which ensures our sequence has $2^b$ elements (the best we can do with $b$ bits). A code sketch:

// 32-bit version:  RO(2^32 (1/phi))
static const uint32_t A = 2654435769;

void u_do_something_from_i(uint32_t i)
{
  uint32_t u = A*i;               // i'th element of integer sequence
  float    si;                    // FP version
  
  do {
    si = (float)u * 0x1.0p-32f;   // scale to float
    u += A;                       // next element of integer sequence

    // do stuff with 'si'

  } while(...);
}

Now let’s consider what happens if we follow the same recipe for floating point:

\[\text{alpha} = \text{RO}\left(\alpha \right)\]

where $\text{RO}\left(\cdot\right) $ will give us a significand with the smallest bit set. If we compute fmodf(i*alpha,1) (or an equivalent) what’s going to happen is we’re going to lose information as a function of $\log_2\left(i\right)$. Think of how long multiplication works (the number of digits needed to represent the result). If we compute v += alpha things will be a little better but it’s still a function of $\log_2$ the number of elements we’ve walked. The toy code computes these values and there’s a table at the end with the output. These numbers probably don’t look very impressive, but remember we care about a collection of values and some measurement between this point set (and most likely in a dimension greater than one). Actually, just forget about that and let’s fix it instead.

Since truncated fixed-point does exactly what we want the question becomes can we mimic that in floating-point? The good news is yeah and the bad news is it’s in all ways inferior to using integers if we try to use all available bits of precision. However with a little TLC we can make it work. If we have a $b$ bit unsigned integer then the maximum value we can represent is $\left(2^b-1\right)$ so the largest value we can get by adding two of them is:

\[\left(2^b-1\right) + \left(2^b-1\right) = 2^{b+1} - 2\]

bringing us the to well known fact that we need one extra bit to be exact. So for additive recurrence in floating point if we can ensure that the smallest bit is always clear then we can prevent rounding and we’ll be exact. Using single precision we have 24-bits and if we limit ourselves to 23 then the only time our sum will have the full 24 will be when it has become greater than one which we will then subtract away taking us back to 23. The previous statement is only true if our additive constant is on $\left(1/2,1\right)$ which places the “decimal” point at the required fixed position (left as exercise to reader..I’ve always wanted to type that).

Let’s step through this in painful detail. Say we want to use the plastic number $(\rho)$:

\[\begin{align*} \rho & =\sqrt[3]{\frac{9+\sqrt{69}}{18}}+\sqrt[3]{\frac{9-\sqrt{69}}{18}} \approx 1.324717957244746 \\ \left\{\rho\right\} & \approx 0.324717957244746 \\ 1-\left\{\rho\right\} & \approx 0.675282042755254 \\ 2^{23}\left(1-\left\{\rho\right\}\right) & \approx 5664676.346113066 \\ \text{RO}\left(2^{23}\left(1-\left\{\rho\right\}\right)\right) & = 5664675 \end{align*}\]

The value has an integer part, kill it, on wrong range so we mirror it about $1/2$, scale it by the number of bits and round to odd. Here we have no particular reason to choose one odd over the other..so pick one.

Here’s a toy sketch. To get the initial value it simply uses the integer method and then walks the sequence using floating point.

// RO(2^23 (2-plastic))
static const uint32_t K23     = 5664675u << (32-23);
static const float    alpha23 = K23*0x1.p-32f;

void f_do_something_from_i(uint32_t i)
{
  // compute the first using integers
  float si = (float)(i*K23) * 0x1.0p-32f;
  
  do {
    // do stuff with 'si'
	
	// update to next element
    si += alpha23;
    if (si >= 1.f) si -= 1.f;
  } while(..);
}

Summary

A quick recap.

The pure integer method:

no special rules to remember other than “make it odd”. This isn’t a hard rule: for every trailing zero we divide the period by two.
it’s period is $2^b$
downside is it requires: int-to-float and scale

The float-point method:

if constant doesn’t fit the listed requirements then things go abit wrong
it’s period is $2^{p-1}$, where $p$ is the number of normal significand bits of the format.
downside is it requires: manually handling the “wrap” (subtract one when greater than)

Toy code output

This is section is simply the output of the toy code formatted into tables.

First computing si = fmodf(i*alpha,1) with odd alpha compared to the exact result. Each line of the table is when we’ve lost a bit of precision.

$i$	$\log_2\left(i\right) $	$s_i$ exact	$s_i$ float	exact	float
3	1.584962	0.854102	0.854102	daa66b00	daa66c00
6	2.584963	0.708204	0.708204	b54cd600	b54cd800
12	3.584963	0.416407	0.416408	6a99ac00	6a99b000
24	4.584962	0.832815	0.832815	d5335800	d5336000
48	5.584962	0.665629	0.665630	aa66b000	aa66c000
96	6.584962	0.331259	0.331261	54cd6000	54cd8000
192	7.584962	0.662518	0.662521	a99ac000	a99b0000
384	8.584963	0.325035	0.325043	53358000	53360000
768	9.584963	0.650070	0.650085	a66b0000	a66c0000
1536	10.584963	0.300140	0.300171	4cd60000	4cd80000
3072	11.584963	0.600281	0.600342	99ac0000	99b00000
6144	12.584963	0.200562	0.200684	33580000	33600000
12288	13.584963	0.401123	0.401367	66b00000	66c00000
24576	14.584963	0.802246	0.802734	cd600000	cd800000
49152	15.584963	0.604492	0.605469	9ac00000	9b000000
98304	16.584963	0.208984	0.210938	35800000	36000000
196608	17.584963	0.417969	0.421875	6b000000	6c000000
393216	18.584963	0.835938	0.843750	d6000000	d8000000
786432	19.584963	0.671875	0.687500	ac000000	b0000000
1572864	20.584963	0.343750	0.375000	58000000	60000000
3145728	21.584963	0.687500	0.750000	b0000000	c0000000
6291456	22.584963	0.375000	0.500000	60000000	80000000
12582912	23.584963	0.750000	0.000000	c0000000	00000000
25165824	24.584963	0.500000	0.000000	80000000	00000000

and the same for computing si += alpha; if (si >= 1.f) si -= 1.f; with odd alpha compared to the exact result.

$i$	$\log_2\left(i\right) $	$s_i$ exact	$s_i$ float	exact	float
4	2.000000	0.090170	0.090170	17155d00	17155c00
9	3.169925	0.180339	0.180339	2e2aba00	2e2ab800
43	5.426265	0.193494	0.193493	3188cc00	3188c800
111	6.794416	0.219802	0.219801	3844f000	3844e800
255	7.994353	0.216690	0.216689	37790000	3778f000
543	9.084808	0.210466	0.210464	35e12000	35e10000
1119	10.127995	0.198019	0.198015	32b16000	32b12000
2263	11.144021	0.228853	0.228845	3a961800	3a959800
4559	12.154502	0.234792	0.234776	3c1b5000	3c1a5000
9159	13.160975	0.190941	0.190911	30e18800	30df8800
18343	14.162942	0.214697	0.214636	36f66800	36f26800
36719	15.164239	0.206481	0.206359	34dbf000	34d3f000
73471	16.164886	0.190048	0.189804	30a70000	30970000
146967	17.165133	0.212911	0.212423	36815800	36615800
293967	18.165295	0.202909	0.201932	33f1d000	33b1d000
587967	19.165375	0.182903	0.180950	2ed2c000	2e52c000
1175959	20.165407	0.198621	0.194715	32d8d800	31d8d800
2351943	21.165422	0.230057	0.222245	3ae50800	38e50800
4703919	22.165432	0.237201	0.221576	3cb93000	38b93000
9407871	23.165438	0.251488	0.220238	40618000	38618000
18815775	24.165440	0.280062	0.217562	47b22000	37b22000
37631583	25.165442	0.337210	0.212210	56536000	36536000
75263199	26.165442	0.451506	0.201506	7395e000	3395e000
150526423	27.165442	0.735826	0.235826	bc5f1800	3c5f1800

Comments

distribution (7)